Facebook job puzzles: Korn Shell

by Jesse Farmer on Tuesday, April 15, 2008

Welcome to the first installment of 20bits' Facebook job puzzles solutions manual. My ultimate goal is to solve every interesting puzzle in the aforelinked list and make a public post with the solution code and an explanation. Why? Because I'm a mathematician by training and no good puzzle should go unsolved.

I intended to start with the Evil Gambling Monster puzzle because I thought it would be the best learning experience given my lack of formal CS training. I did learn some new algorithms in solving it (e.g., the A* search algorithm), but that's when the Korn Shell puzzle caught my eye. My inner mathematician couldn't resist and I believe I have solved the puzzle mathematically. The solution involves my favorite area of mathematics (group theory), so I'm going to attempt to explain it in a way that is understandable to a layperson.

The Rules of the Game

You are sitting at a computer terminal whose 26 alphabetic keys have been randomly rearranged (permuted) and you enter your name, say, "Mike Korn." Since the keys are scrambled what appears on the screen is also scrambled. The game consists of you typing the characters you see on the screen until your name appears. The length of the game is the number of times you must type a the string on the screen before your name appears, at which point the game terminates.

A game therefore consists of two pieces of information, viz., an input string and a permutation of the 26 alphabetic keys. Everything else is completely determined by these two pieces of information. The puzzle is this: if I give you a name can you give me the length of the longest possible game which uses that name as its first input?

Questions to Ask

The first step in reasoning mathematically is to isolate your assumptions. The biggest assumption above is that every possible game is guaranteed to terminate in a finite number of steps. This isn't particularly obvious on the outset. This leads us to the following questions.

  1. Is every possible game guaranteed to terminate in a finite number of steps?
  2. How does the length of game vary with respect to the two inputs, i.e., the name and the permutation?
  3. If every game does terminate in a finite number of steps, what is the longest game?

If we can answer these questions well enough then we are done. I'm claiming that each of these questions has an explicit answer and that no algorithm is necessary to determine any of them. So, let's get answering.

Helpful Notation

I will introduce more notation as it becomes necessary but I'd like to introduce a few preliminaries. Let's say the A and B keys have been swapped on the keyboards and no other keys have been touched. We can write this as A → B → A, since, if you type 'A' the screen prints 'B,' and then if you type 'B' the screen again prints 'A.' Similarly, if we took the letters Q,W,E and R and moved each to the right (on a QWERTY keyboard), with R going to Q's place, we could write Q → W → E → R → Q.

Additionally, if I want to refer to a specific permutation I will use the symbol σ or τ to represent it. Those are the lower-case Greek letters sigma and tau, respectively. This is only in the case where I am talking about a permutation in the abstract — if I want to talk specifically about what the permutation has done to the keyboard keys then I will use the above notation to describe its action.

The single-letter case

Is every possible game guaranteed to terminate in a finite number of steps? When in doubt, start simple. What happens if we have a single letter as an input, say, 'A?' We enter A and another letter, say, V, appears. We then enter V and yet another letter appears. Now, at any point, the next letter that appears could be A, in which case we're done.

But can this go on forever? Well, let's say we've seen every letter from A to Z exactly once and the last letter on the screen was an X. What happens when we hit the X key? The original letter, A in this case, must appear on the screen. If another letter appears, say B, then the key we pressed earlier which produced B must have been X. But this means that we have seen X twice, an impossibility!

In fact, we just solved the puzzle for the single-letter input case. If we are given a single letter as an input the permutation (i.e., the rearrangement of the 26 alphabetic keys) we want is represented by A → B → C → ... → Y → Z → A. For any single-letter input this produces a game of length 26. As we showed above, no game with a single-letter input can last longer than 26 turns.

On Cycles and Sufjan Stevens (or, the multi-character case)

What happens when we are given a two-character input string? When I explained this to my girlfriend I use a music analogy. Let's say we have two musicians, one playing in 4/4 time and the other playing in 5/4 time (maybe he's Dave Brubeck or Sufjan Stevens). If both musicians start playing on the same beat how many beats does it take for their measures to come back into alignment? The answer is 20, the least common multiple of 4 and 5.

Remember that if A and B are swapped then we write A → B → A. This is called a "cycle" because the path a letter takes through it always eventually comes back to the beginning. A more common notation is (A B) instead of A → B → A, with the understanding that the right side wraps around to the left. Similarly we write (Q W E R) instead of Q → W → E → R → Q.

Given an arbitrary permutation σ of the 26 alphabetic keys every letter belongs to exactly one cycle. We can find all the other letters in this cycle by following the bouncing ball. For example, if we want to find the cycle of C then we type in C. 'R' appears on screen, so we type in 'R.' We then see 'H' followed by 'L' followed by 'V' followed by 'C' again. The whole cycle is therefore (C R H L V). If, in addition to having C, R, H, L, and V permuted in that fashion, the A and B keys have been swapped, we write (A B)(C R H L V). By convention if we write down a permutation using this notation and leave off letters that means the corresponding keys haven't been moved.

Let's take the permutation (A L)(E M R), i.e., the A and L keys have been swapped and the E key has been moved to M's position, M to R's, and R to E's. What happens when we enter the name "Mel Farb?" Remember that if a letter isn't present in any cycles it means it isn't affected by the permutation (mathematicians would say that the permutation fixes that letter).

Mel Farb
Rma Fleb
Erl Famb
Mea Flrb
Rml Faeb
Era Flmb
Mel Farb

Going back to the musical analogy, imagine we have a whole orchestra. Every letter is a musician. Musicians playing the same part all belong to the same cycle. The length of a cycle is the time signature in which each part is playing. The length of the game is the number of beats it takes for the measures in the various parts (i.e., cycles) to realign, which happens to be the least common multiple of the lengths of the cycles.

We have therefore answered our first question and much of the second. Yes, every game is guaranteed to terminate in a finite number of steps. If every part is being played by one musician, i.e., every cycle contains at least one letter in our input string, then the length of the game is determined wholly by the lengths of the cycles of the permutation.

The Computation

So every permutation can be represented uniquelyUp to the ordering of the cycles using the above cycle notation. This means that if we construct a cycle representation then we also get a permutation. To construct really long games, then, we need to maximize the size of the least common multiple of the cycle lengths under the constraint that the sum of their lengths is no greater than 26.

For now let's work in the case where our input string contains enough distinct letters, i.e., every cycle of our permutation contains at least one letter in the input string. Let's look at some permutations. Take (A B)(C D E)(F G H I J). If the input string is something like "ACF" this permutation leads to a game of length 30, since the least common multiple of 2, 3, and 5 is 5*3*2 = 30. But we could do better by adding another cycle of length 7, giving us a game length of 210. Every number between 7 and 11 is a multiple of 2, 3, or 5, so no cycles of any of these lengths will increase the length of our game, but we can't add a cycle of length 11 because we only have 26 letters and 2+3+5+7+11 = 28.

If every cycle in a permutation of the 26 letters contains at least one letter of the input string then we know that the length of the game is wholly determined by the cycle structure of the permutation. The length is the least common multiple of the size of the cycles. We therefore want to find the cycle structure which maximizes the length of the game. We can do that as follows.

Mathematically, writing a number n as sum of positive integers is called a partition of n. So we're interested in partitions of 26. We want to find the maximum least common multiple of the sizes of the parts of all possible partitions of 26. Whew. For example, we can write 26 = 25+1. The least common multiple of 25 and 1 is 25. We can also write 26 = 1+2+3+5+7+8. The least common multiple of these parts is 210. It turns out there are 2436 partitions of 26, which is more than we want to check by hand.

I have written some Python code below which calculates this number and it happens to be 1260, corresponding to cycles of length 4, 5, 7, and 9. This means that the longest possible game, period, is 1260. No matter what the user inputs we will never be able to beat this number. But can we always match it?

From one permutation to many

So, we know the "ideal" permutation has four cycles. As I've said before, if the input string contains fewer than four distinct letters then one of the cycles will be empty and won't contribute to the length of the game. This means that we have four cases for the number of distinct letters in the input string: one, two, three, and four or more. Let's go case-by-case, focusing on the last case first.

  1. Four or more

    We know the ideal cycle structure in this situation (cycles of length 4,5,7, and 9), but the input string can still vary. If the input string contains four or more distinct characters can we always find a permutation which makes the game last 1260 rounds? Yes, because we are free to choose what letters go in what cycle.

    If our input string is "ABCDE" then we might put A in the first cycle, B in the second, C in the third, D in the fourth, and E in any of the cycles. The remaining spots in the cycles can then be filled with whatever letters we want. Because the length of the game is determined only by the cycle structure we are guaranteed to get a game that lasts 1260 rounds.

  2. Three characters

    If our input string contains three characters we're back to square one. We might think to try using the three longest cycles from our ideal 4-5-7-9 permutation above, but that doesn't quite work. This permutation makes the game last 315 rounds, but what is to say that there isn't a permutation with exactly three cycles that lasts longer than 315 rounds?

    As it turns out there is a permutation which yields a game that lasts 630 rounds. This permutation has cycles of length 7, 9, and 10. Again, there is Python code below to calculate this number exactly.

  3. The last two cases

    This case is essentially the same as the three-character case, except that we wish to find a permutation with two cycles rather than three. Using the code below we find a permutation which leads to a game of length 165 and has cycles of length 15 and 11.

    As for the one-letter case, well, we've already done it! If you recall a single letter cannot go through a game longer than 26 rounds before appearing again. What's more, the permutation which produces this game can be named explicitly: (A B C ... X Y Z).

  4. The Code

    As promised, here is the code which produces the above numbers. landau implements Landau's function, which produces the longest possible permutation on n letters — that means landau(26) = 1260. part2 handles the two-character case and part3 the three-character case. With slight modifications you can make them print out the cycle structure rather than the length of the game.

    At the request of Facebook portions of the code have been redacted.

    #!/usr/bin/python
    import sys
    
    def gcd(a,b):
    	if b == 0: return a
    	return gcd(b, a % b)
    
    def lcm(a,b):
    	return (a*b)/gcd(a,b)
    
    def lcm2(a,b,c):
    	return lcm(lcm(a,b),c)
    
    def landau(n):
    	ans = [1]*(n+1)
    	
    	for i in range(1,n+1):
    		for k in range (1,i+1):
    			test = lcm(k, ans[i-k])
    			if (test > ans[i]):
    				ans[i] = test
    			
    	return ans[n]
    
    print [landau(int(n)) for n in sys.argv[1:]]

    Of course the Facebook puzzle (remember that?) stipulates that, given an input string on the command line, we are to output the longest possible time it could take to produce that input string again. We've solved the problem mathematically which means the "solution" program is stupidly simple. Here it is in Ruby, just because I can:

    #!/usr/bin/ruby
    case ARGV.inject{|sum, i| sum + i}.downcase.split('').uniq.size
    when 1
      puts 26
    when 2
      puts 165
    when 3
      puts 630
    else
      puts 1260
    end

    The Math

    Some of you might be interested in the math, so here it is in a hyper-condensed form. We start with what is called a group. The set of all permutations on n letters forms a group called the symmetric group, denoted Sn. We are interested in S26, i.e., the set of all permutations of 26 letters. For two elements σ, τ in Sn we write στ to mean their composition under the group operation. We write σ2 to mean σσ, i.e., the application of sigma twice.

    Permutations can be written using cycle notation. Let's say σ(A) = B and σ(B) = A. That is, σ swaps the letters A and B. This corresponds to the cycle (A B). Similarly, we might have σ(Q) = W, σ(W) = E, σ(E) = R, and σ(R) = Q, which corresponds to the cycle (Q W E R). As in the explanation of the puzzle we can do the opposite: by writing down cycles we are actually choosing a permutation which contains those cycles.

    For every permutation σ in Sn there is a smallest positive integer n such that σn is the identity element, i.e., the permutation which fixes every letter. This n is called the order of σ, denoted |σ|. This is equivalent to the statement that every game as described initially will stop in a finite number of steps. The order of σ is the least common multiple of the lengths of its cycles when written as a product of disjoint cycles.

    Since every element has a well-defined order the function g(n) = max{|σ| : σ in Sn} is also well-defined. In fact, this function is called Landau's function. I didn't learn of this function until I showed my solution to some math friends and they pointed me to the definition. It turns out that g(26) = 1260. You can see the values of g(n) for 0 ≤ n ≤ 47 at the Online Encyclopedia of Integer Sequences (yes, such a thing exists). Or you can use my Python program to calculate it — it runs quite quickly!

    Das Ende

    The statement of the original puzzle is as follows:

    Given a particular name (e.g. "Mike Korn"), what is the maximum number of times the user might have to try typing in his name (or whatever has appeared on the screen) until his real name appears, if the manner in which the keys have been mixed up is unknown?

    From the above arguments we have learned that the answer depends only on the number of distinct characters in the input string. The Ruby code above outputs that number given the input. Furthermore, it is not difficult to generate the permutation in addition by filling in the empty spots in the cycle structure of our ideal permutation in each case. So, in that regard, we have actually solved a harder puzzle than the Facebook puzzle.

    I hope you all enjoyed reading this as much as I enjoyed writing it. My daily life doesn't afford me many opportunities to return to my old math haunts. Cheers, and happy coding!

    PS

    Here is some C code to print off the silly Facebook email address.

    #include 
    
    int main(char **argv, int argc) {
        printf("%d\n", 0xFACEB00C>>2);
        return 0;
    }