Interview Questions: Shuffling an Array

by Jesse Farmer on Tuesday, April 15, 2008

This is part of my interview question series. It's about shuffling arrays.

The Question

You have an array A of size N. Write a routine that shuffles the array in-place. The only restrictions are that all possible permutations of A must be possible and equally likely.

This interview question serves as a test for basic algorithm construction. There's a canonical solution that's not too difficult to arrive at if you've never seen it before, so it's a good combination of "what do you know?" and "what can you do?"

Workin' it out

I'm going to create my solution in Ruby because that's the language the company that asked me this question used.

The first solution most people arrive at is subtly wrong. Jeff Atwood made the mistake in his blog post. The algorithm, in words, goes like this: iterate through each item in the array, pick another element at random, and swap the two.

In Ruby the above algorithm would look like this.

class Array
  def shuffle_naive!
    n = size
    until n == 0
      k = rand(size) #This is the line which proves our undoing
      n = n -1
      self[n], self[k] = self[k], self[n]
    end
  end
end

This solution seems correct if not optimal, but there's a subtle problem: not all outcomes are equally likely.

The root cause of this is because this algorithm is drawing from a sample space of size NN, while the sample space of all permutations on an N-element array is only N!.

That is, for the naive shuffle, for each of the N steps in the iteration we make one of N decisions for a total of NN possible outcomes.

But NN > N! for all N > 1 and, more importantly, N! is not a divisor of NN. This means we're going to prefer at least one of the permutations more than the others, so the algorithm doesn't select among the possible permutations uniformly.

KFC, KFY

The "best" solution is the Knuth-Fischer-Yates shuffle. Here it is in Ruby

class Array
  def shuffle!
    n = size
    until n == 0
      k = rand(n) #You can see I'm doing rand(n) rather than rand(size)
      n = n - 1
      self[n], self[k] = self[k], self[n]
    end
    self
  end
end

This works because it's an iterative version of an essentially recursive algorithm. If we know how to shuffle an array of size N-1 then shuffling an array of size N is easy — first shuffle the sub-array consisting of the first N-1 elements and then randomly swap in the last element to any of the N slots.

There's a proper inductive proof in there if you're so inclined, but it's not particularly illuminating.

Good Questions, Bad Questions

My next article is going to be more about the interview process rather than specific questions. One key thing to understand in an interview is what information the interviewer is looking for in asking their question. Hint: it's not always the answer.

Among other things they want to suss out the limits of your knowledge, how you solve problems, how quickly you resort to help, and a whole assortment of other, behavioral things, that they get because you're right there (hopefully) engaging in a dialogue.