Shuffling an Array, Not as Trivial as It Sounds
Do you trust the web? (OK, that’s click bait… but let me explain my experience.) Some time ago, I got a peculiar requirement for a change in an API I had developed. Basically, the involved endpoint collected and collated information from services and databases, to produce a certain set of results. The request was that the result set had to be in random order, meaning that repeated calls to the endpoint would produce the same data (assuming the other services and databases didn’t change) but every time in a random sequence. Furthermore, it was desired that all possible shuffled sequences should be equally probable, because of some statistics-related needs. So, I already knew how to produce the data set, but I now needed to be able to put it in disorder before sending it back to the client.
In this article we’ll see how I implemented my “un-sorting” shuffling function, and how I tested it (by applying higher-order functions) to be reasonably certain that I hadn’t messed up. And, to get back to my comment about trusting the web, we’ll see how the most often quoted (and widely available on the web) shuffling algorithm, is totally wrong!
Testing my Shuffling Algorithm
How can you test a shuffling algorithm? If you apply it to a given array, you can check if the output array has the same values as the input array: that’s a basic requirement. But how do you test if it produces multiple random orderings, and with equal probabilities? (By the way, this is called a uniform distribution implying that all possible results have the same uniform probability.) The real way of checking it would be applying a Chi-squared goodness of fit test: basically, you’d have to generate many sequences and do some math to see whether the produced distribution is uniform or not.
I could have done that, but I opted for a more empirical way of working. I decided to implement a higher-order function that would take a shuffling function as an input, call it a good number of times, record the results, and then produce a chart to easily see how uniform were the results. For instance, doing 24,000 runs to shuffle a 4-element array should produce something like the following.
A-B-C-D: 983 ###############################################
A-B-D-C: 1026 #################################################
A-C-B-D: 977 ##############################################
A-C-D-B: 993 ###############################################
A-D-B-C: 983 ###############################################
A-D-C-B: 984 ###############################################
B-A-C-D: 1028 #################################################
B-A-D-C: 986 ###############################################
B-C-A-D: 1033 #################################################
B-C-D-A: 1016 ################################################
B-D-A-C: 957 ##############################################
B-D-C-A: 1022 #################################################
C-A-B-D: 989 ###############################################
C-A-D-B: 962 ##############################################
C-B-A-D: 993 ###############################################
C-B-D-A: 1028 #################################################
C-D-A-B: 955 #############################################
C-D-B-A: 1004 ################################################
D-A-B-C: 1023 #################################################
D-A-C-B: 1051 ##################################################
D-B-A-C: 985 ###############################################
D-B-C-A: 1020 #################################################
D-C-A-B: 990 ###############################################
D-C-B-A: 1012 ################################################
COUNT= 24
There are 24 possible outputs and we did 24,000 runs, so each output should be close to the expected result, 1,000. It seems that the algorithm that produced this output is good; all counts are very similar to each other. Checking algorithms empirically was good enough for my case: I could verify that the published algorithm was quite bad (as we’ll see below) and it was useful to test that my implementations weren’t buggy.
The following code implements the idea we’ve described: it does a given number of shuffles, records how many times each output appeared, and produces a bar chart with the results.
const makeBar = (len, val, max) =>
"#".repeat(Math.round((len * val) / max)); /* 1 */
const testShuffle = (fn, n = 5, times = 4000) => { /* 2 */
const result = {}; /* 3 */
let max = 0; /* 4 */
for (let i = 1; i <= times; i++) { /* 5 */
const arr = Array(n) /* 6 */
.fill(0)
.map((v, i) => String.fromCharCode(65 + i));
const x = fn(arr).join("-"); /* 7 */
result[x] = x in result ? result[x] + 1 : 1; /* 8 */
max = Math.max(max, result[x]); /* 9 */
}
let count = 0; /* 10 */
for (const [key, val] of Object.entries(result).sort((a, b) =>
a < b ? -1 : 1
)) {
count++; /* 11 */
console.log(`${key}: ${String(val).padStart(5)} ${makeBar(50, val, max)}`);
}
console.log("COUNT=", count); /* 12 */
};
An auxiliary makeBar()
function will be used to produce a horizontal bar [1] with which we’ll produce our chart. The testShuffle()
function receives a shuffling function as a parameter, the size of the array to use for tests (defaulting to 5), and how many times to do the test [2]. An array will store the results [3]; a map would be an alternative. The max
variable [4] will track the highest count we find. We’ll do a loop [5] in which we’ll initialize an array with the first letters of the alphabet [6], we’ll shuffle it [7] and make a string (such as “B-C-A-D”) to use as a key when tracking counts [8]. After getting each new count [9] we’ll see if we have to update max
. After the loop is done we’ll count how many distinct permutations were produced [10]. We’ll sort all the produced sequences alphabetically, and loop over them updating the count [11] and printing out bars. Finally, we’ll log the count [12] which is an extra check.
The really bad (but popular?) shuffle
OK, now we have a way to check if any given shuffling array is any good or not. Let’s consider how to do the shuffle itself. The truly bad algorithm that has been making the rounds has the advantage of simplicity itself: a one-liner! What’s the idea? The key concept needs remembering how JavaScript’s own .sort()
method work. To sort an array you have to provide a function that will compare two values and return a negative value if the first is smaller than the second, a positive value if it is greater, and zero if values are equal. The algorithm you may find at many places randomly returns a negative or positive result (instead of actually comparing values) so it’s expected that the resulting array may be in some disorder. An implementation follows.
const poorShuffle = (arr) => arr.sort(() => Math.random() - 0.5);
Calculating a random number produces a result between 0 and 1; subtracting 0.5 makes a result between -0.5 and +0.5, so half of the times the comparison will go one way, and the other half of the times it will go the other way. If you were shuffling a two-element array, this would indeed work. But what happens with larger arrays? To test this, I tried doing testShuffle(poorShuffle, 3, 10_000)
but got very bad results. For example, doing 10,000 experiments with three elements, the results are lopsided.
A-B-C: 3777 ##################################################
A-C-B: 632 ########
B-A-C: 1258 #################
B-C-A: 643 #########
C-A-B: 605 ########
C-B-A: 3085 #########################################
COUNT= 6
Instead of getting six similar totals, there’s a very wide variation. The lowest count is less than one-sixth of the highest count, and that doesn’t seem really random. I tried larger arrays, and the results were similar; for instance, with 4 elements I got the following.
A-B-C-D: 1886 ##################################################
A-B-D-C: 671 ##################
A-C-B-D: 165 ####
A-C-D-B: 150 ####
A-D-B-C: 627 #################
A-D-C-B: 169 ####
B-A-C-D: 306 ########
B-A-D-C: 320 ########
B-C-A-D: 162 ####
B-C-D-A: 157 ####
B-D-A-C: 311 ########
B-D-C-A: 149 ####
C-A-B-D: 162 ####
C-A-D-B: 132 ###
C-B-A-D: 429 ###########
C-B-D-A: 452 ############
C-D-A-B: 156 ####
C-D-B-A: 474 #############
D-A-B-C: 653 #################
D-A-C-B: 161 ####
D-B-A-C: 338 #########
D-B-C-A: 152 ####
D-C-A-B: 141 ####
D-C-B-A: 1677 ############################################
COUNT= 24
Even if all 24 possible shuffles were produced, the results are even more lopsided. The longer the array, the worse the results; with five elements the ratio between the lowest and highest counts was around 60! For sure, this algorithm, despite its simplicity, cannot be used. This was unexpected, and meant that I had to look for a better option; let’s get to that!
Shuffling by sorting
The first algorithm I’ll describe is also based on the idea of sorting (like the popular and wrong function that we discarded!) but works on a different basis. The key concept is simple. To get a random sequence, assign to each array element a random number, sort the array by that number, and you’ll produce a totally random ordering. This can be implemented as a one-liner — though OK, a not too short one…
const sortingShuffle = (arr) =>
arr
.map((v) => ({ val: v, key: Math.random() })) /* 1 */
.sort((a, b) => a.key - b.key) /* 2 */
.map((o) => o.val); /* 3 */
Let’s see how this works. First [1] we take the array to be shuffled and produce a new array of objects with the original value at val
and a random number at key
. Then [2] we sort the array by comparing the key values. Finally [3] we undo the mapping by just extracting the val
values. We’ve got a randomly shuffled array, with practically no code! I did some test runs, and got good results like the following, in which I shuffled a three-element array 60 thousand times! (It took about a tenth of a second at my machine.) I also tested longer arrays — but I always got good results, so I had good reasons to believe that my implementation was correct.
A-B-C: 10003 #################################################
A-C-B: 9983 #################################################
B-A-C: 9915 #################################################
B-C-A: 10001 #################################################
C-A-B: 9976 #################################################
C-B-A: 10122 ##################################################
COUNT= 6
This method is quite efficient, and for smallish arrays is probably best. For larger arrays sorting may take a bit too long, and some basic Computer Science results say that this behavior will grow worse. (For example, if you double the array size, time will more than double.) I checked the literature and found a better algorithm, whose performance is proportional to the size of the array, as you’d desire. Curiously, the way the algorithm works is somehow related to Bogosort, also known as “the worst sorting algorithm ever”! Let’s first consider this spectacularly bad sort algorithm, and then turn it into a fine shuffling one.
Sorting by chance
If you study Computer Science, you’ll be taught several sorting methods — and each one tries to be better than the others, and perform at top speed. There exists, however, that is totally inefficient. To describe it, let’s imagine how you could shuffle a deck of cards. Bogosort does it very simply:
- throw the deck into the air, and let all cards fall on the floor.
- pick up the cards in any order, as randomly as you wish
- check if (by pure chance!) the cards are in order.
- If so, you’re done!
- But if not (most likely…) go back to step 1 and try again!
This is truly bad — and I hope you won’t be tempted to try it out! However, the notion of picking elements in random order does lead to an interesting way of doing shuffling, and that’s what we’ll describe.
Open Source Session Replay
OpenReplay is an open-source, session replay suite that lets you see what users do on your web app, helping you troubleshoot issues faster. OpenReplay is self-hosted for full control over your data.
Start enjoying your debugging experience - start using OpenReplay for free.
Fisher-Yates shuffling method
The Fisher-Yates method is a perfect (albeit a bit too complex to do by hand…) way of shuffling a deck of cards. How does it work? Basically, pick the whole deck. While there are cards left in it, randomly pick any card, put it away, and repeat. When the deck is exhausted, you’ll have put away the cards in a totally random way. As we are shuffling an array, we can do better, by not even requiring an extra array. To shuffle an array of N elements, we may simply do:
for I starting at N-1, going down to 1:
let J be a random number between 0 and I inclusive
exchange the elements at positions I and J
We need a function that given a number K will generate a number between 0 and K itself. We can manage that by using Math.random()
and Math.floor()
as follows.
const randomInt = (k) => Math.floor((k + 1) * Math.random());
Why does this work? First, remember that Math.random()
produces a number greater than or equal to zero, but less than 1. If you multiply this by (K+1) the result is greater than or equal to zero, but less than (K+1). If you take the integer part of it with Math.floor()
you’re left with an integer number greater than or equal to zero, and less than (K+1) — that is to say, less than or equal to K, as desired. With this function, we can complete our shuffling code.
const fisherYatesShuffle = (arr) => {
for (let i = arr.length - 1; i > 0; i--) {
const j = randomInt(i);
[arr[i], arr[j]] = [arr[j], arr[i]];
}
return arr;
};
The function works as described above, and produces uniform random shuffles. A sample run with our testing function from the previous article shows its good behavior.
A-B-C-D: 1028 ################################################
A-B-D-C: 1005 ###############################################
A-C-B-D: 946 ############################################
A-C-D-B: 1017 ################################################
A-D-B-C: 988 ##############################################
A-D-C-B: 1049 #################################################
B-A-C-D: 1013 ################################################
B-A-D-C: 952 #############################################
B-C-A-D: 1043 #################################################
B-C-D-A: 935 ############################################
B-D-A-C: 1066 ##################################################
B-D-C-A: 1041 #################################################
C-A-B-D: 940 ############################################
C-A-D-B: 1007 ###############################################
C-B-A-D: 982 ##############################################
C-B-D-A: 1029 ################################################
C-D-A-B: 994 ###############################################
C-D-B-A: 992 ###############################################
D-A-B-C: 972 ##############################################
D-A-C-B: 1004 ###############################################
D-B-A-C: 995 ###############################################
D-B-C-A: 971 ##############################################
D-C-A-B: 986 ##############################################
D-C-B-A: 1045 #################################################
COUNT= 24
We’re done! If you have to process larger arrays and the sorting-based shuffling algorithm takes too long, the Fisher-Yates algorithm will do better.
Summary
In this article, we’ve seen a special requirement I got, which implied sorting the output of a service call in a random way, and we discussed how to test any possible algorithm, and ended by applying our analysis to a very commonly mentioned shuffling algorithm — which proved to be quite bad! We also considered a pair of shuffling algorithms:
- The first one had the plus of being extra-simple and short but could behave not so well for longish arrays.
- The second algorithm was a tad more complex, but its performance was guaranteed to be good.
While there are more algorithms to shuffle data, these were enough for my requirements (I started by using the sorting shuffling algorithm, but then changed to Fisher-Yates to be safe in case an API result got very big). Server-side coding doesn’t usually make you get involved with this kind of problem, so the requirement that my users gave me was quite interesting.
And, it also provided a reminder to not blindly trust code you can pull out of the web — “you get what you pay for” also applies to code!