<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>20bits &#187; interview</title>
	<atom:link href="http://20bits.com/tag/interview/feed/" rel="self" type="application/rss+xml" />
	<link>http://20bits.com</link>
	<description>Driven by Data</description>
	<lastBuildDate>Wed, 07 Oct 2009 06:07:48 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Interview Questions: Database Indexes</title>
		<link>http://20bits.com/articles/interview-questions-database-indexes/</link>
		<comments>http://20bits.com/articles/interview-questions-database-indexes/#comments</comments>
		<pubDate>Tue, 13 May 2008 18:56:53 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://20bits.com/?p=134</guid>
		<description><![CDATA[
Continuing my series on interview questions, I&#8217;m going to spend some time covering ops and sysadmin questions.  We&#8217;ll start by writing up an introduction to database indexes and their structure.


The Question

Most consumer-facing web startups these days use one of the major open source databases, either MySQL or PostgreSQL, to some degree.  If you [...]]]></description>
			<content:encoded><![CDATA[<p>
Continuing my series on <a href="http://20bits.com/tag/interview">interview questions</a>, I&#8217;m going to spend some time covering ops and sysadmin questions.  We&#8217;ll start by writing up an introduction to database indexes and their structure.
</p>

<h3>The Question</h3>
<p>
Most consumer-facing web startups these days use one of the major open source databases, either MySQL or PostgreSQL, to some degree.  If you want to prove your worth it&#8217;s a good idea to get down to the nitty gritty and gain some understanding about these databases&#8217; internals.
</p>

<p>
So, the question: &#8220;Explain to me what database indexes are and how they work.&#8221;
</p>

<h3>The Answer</h3>
<p>
In a nutshell, a database index is an auxiliary data structure which allows for faster retrieval of data stored in the database.  An index is keyed off of a specific column so that queries like &#8220;Give me all people with a last name of &#8216;Smith&#8217;&#8221; are fast.
</p>

<h3>The Theory</h3>
<p>
Database tables, at least conceptually, look something like this: <pre>id	age	last_name	hometown
--	--	--		--
1	10	Johnson		San Francisco, CA
2	27	Smith		San Jose, CA
3	15	Rose		Palo Alto, CA
4	64	Farmer		Mill Valley, CA
5	55	Pauling		San Francisco, CA
6	17	Smith		Oakland, CA
...	...	...		...
100	49	Meyer		Berkeley, CA
101	30	Wayne		Monterey, CA
102	18	Schwartz	San Francisco, CA
104	6	Johnson		San Francisco, CA
...	...	...		...
10000	41	Fetterman	Mountain View, CA
10001	25	Breyer		Redwood City, CA</pre>
</p>

<p>
That is, a table is a collection of <a href="http://en.wikipedia.org/wiki/Tuple">tuples</a><sup>1</sup>.  If we have a file like this sitting on disk how do we get all records that have a last name of &#8216;Smith?&#8217;
</p>

<p>
The code would wind up looking something like this: <div class="dean_ch" style="white-space: wrap;">results = <span class="br0">&#91;</span><span class="br0">&#93;</span><br />
<span class="kw1">for</span> row <span class="kw1">in</span> rows:<br />
&nbsp; &nbsp; <span class="kw1">if</span> row<span class="br0">&#91;</span><span class="nu0">2</span><span class="br0">&#93;</span> == <span class="st0">&#39;Smith&#39;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; results.<span class="me1">append</span><span class="br0">&#40;</span>row<span class="br0">&#41;</span></div>
</p>

<p>
Finding the appropriate records requires checking the conditions (here, having a last name of &#8216;Smith&#8217;) for each row.  This is linear in the number of rows which, for many databases, could be millions or billions of rows.  Bad news.
</p>

<p>
How can we make it faster?
</p>

<h3>Database Indexes</h3>
<p>
Any type of data structure that allows for (potentially) faster access can be considered an index.  Let&#8217;s look at some.
</p>

<h4>Hash Indexes</h4>
<p>
Take the same example from above, finding all people with a last name of &#8216;Smith.&#8217;  One solution would be to create a <a href="http://en.wikipedia.org/wiki/Hash_function#Hash_tables">hash table</a>.  The keys of the hash would be based off of the <tt>last_name</tt> field and the values would be pointers to the database row.
</p>
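<p>
To make that concrete, here&#8217;s a minimal sketch of the idea in Python.  The <tt>rows</tt> list and the <tt>build_hash_index</tt> helper are made up for illustration, and a plain <tt>dict</tt> stands in for the hash table; a real database would store row locations on disk rather than list positions.
</p>

```python
from collections import defaultdict

# Hypothetical in-memory "table": each row is (id, age, last_name, hometown).
rows = [
    (1, 10, "Johnson", "San Francisco, CA"),
    (2, 27, "Smith", "San Jose, CA"),
    (3, 15, "Rose", "Palo Alto, CA"),
    (6, 17, "Smith", "Oakland, CA"),
]


def build_hash_index(rows, column):
    """Map each value in `column` to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


last_name_index = build_hash_index(rows, column=2)

# Equality lookup: one hash probe instead of a scan over every row.
smiths = [rows[pos] for pos in last_name_index.get("Smith", [])]
```

<p>
The cost of the lookup now depends on how many rows match, not on how many rows the table has.
</p>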

<p>
This type of index is called, unsurprisingly, a &#8220;hash index.&#8221;  Most databases support them but they&#8217;re generally not the default type.  Why?
</p>

<p>
Well, consider a query like this: &#8220;Find all people who are younger than 45.&#8221;  Hashes can deal with equality but not inequality.  That is, given the hashes of two fields, there&#8217;s just no way for me to tell which is greater than the other, only whether they&#8217;re equal or not.
</p>

<h4>B-tree Indexes</h4>

<p>
The data structure most commonly used for database indexes is the <a href="http://en.wikipedia.org/wiki/B-tree">B-tree</a>, a specific kind of self-balancing tree.  A picture&#8217;s worth a thousand words, so here&#8217;s an example.
<img src="http://20bits.com/wp-content/uploads/2008/05/b-tree.png" alt="B-tree" title="b-tree" width="494" height="206" class="math size-full wp-image-135" />
</p>

<p>
The main benefit of a B-tree is that it allows selections, insertions, and deletions in logarithmic time, even in the worst case.  And unlike a hash index it stores the data in order, allowing for faster row retrieval when the selection conditions include things like inequalities or prefixes.
</p>

<p>
For example, using the tree above, to get the records for all people younger than 13 requires looking at only the left branch of the tree root.
</p>
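<p>
Here&#8217;s a rough sketch of why ordering matters.  This is not a B-tree: a sorted Python list plus the standard <tt>bisect</tt> module stands in for one, since both keep the keys in order.  The <tt>rows</tt> data and the <tt>younger_than</tt> helper are assumptions for illustration only.
</p>

```python
import bisect

# Hypothetical rows: (id, age, last_name).
rows = [
    (1, 10, "Johnson"),
    (2, 27, "Smith"),
    (3, 15, "Rose"),
    (4, 64, "Farmer"),
    (104, 6, "Johnson"),
]

# The "index": (age, row position) pairs kept in sorted order, which is the
# property a B-tree maintains. A real B-tree spreads these pairs across nodes.
age_index = sorted((row[1], pos) for pos, row in enumerate(rows))


def younger_than(limit):
    """Return rows whose age is below `limit` without scanning the table."""
    # Binary search for the first index entry whose age is at least `limit`.
    cut = bisect.bisect_left(age_index, (limit, -1))
    return [rows[pos] for _, pos in age_index[:cut]]
```

<p>
Everything before the cut point matches and everything after it doesn&#8217;t, which is the same reason the tree above only needs its left branch.
</p>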

<h4>Other Indexes</h4>
<p>
Hash indexes and B-tree indexes are the most common types of database indexes, but there are others, too.  MySQL supports <a href="http://en.wikipedia.org/wiki/R-tree">R-tree</a> indexes, which are used to query spatial data, e.g., &#8220;Show me all cities within ten miles of San Francisco, CA.&#8221;
</p>

<p>
There are also <a href="http://en.wikipedia.org/wiki/Bitmap_index">bitmap indexes</a>, which allow for almost instantaneous read operations but are expensive to change and take up a lot of space.  They are best for columns which have only a few possible values.
</p>
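<p>
As a sketch of the bitmap idea, assume one bitmap per distinct value, with bit <tt>i</tt> set when row <tt>i</tt> holds that value.  The <tt>states</tt> column and both helper functions are hypothetical, and a Python integer stands in for the bitmap.
</p>

```python
# Hypothetical column with only a few distinct values.
states = ["CA", "NY", "CA", "TX", "NY", "CA"]


def build_bitmap_index(values):
    """Return {value: bitmap}, where bit i is set if row i holds that value."""
    bitmaps = {}
    for position, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


def matching_rows(bitmap):
    """Expand a bitmap back into a list of row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


index = build_bitmap_index(states)

# "state is CA or TX" becomes a single bitwise OR of two bitmaps.
ca_or_tx = matching_rows(index["CA"] | index["TX"])
```

<p>
Reads are a handful of bitwise operations, but every insert or update has to touch the bitmaps, and there is one bitmap per distinct value, which is why a column with few possible values is the good case.
</p>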

<h3>Subtleties</h3>
<h4>Performance</h4>
<p>
Indexes don&#8217;t come for free.  What you gain in retrieval speed you lose in insertion and deletion speed because every time you alter a table the indexes must be updated accordingly.  If your table is updating frequently it&#8217;s possible that having indexes will cause overall performance of your database to suffer.
</p>

<p>
There is also a space penalty, as the indexes take up space in memory or on disk.  A single index is smaller than the table because it doesn&#8217;t contain all the data, only pointers to the data, but in general the larger the table the larger the index<sup>2</sup>.
</p>

<h4>Design</h4>
<p>
Nodes in a B-tree contain a value and a number of pointers to children nodes.  For database indexes the &#8220;value&#8221; is really a pair of values: the indexed field and a pointer to a database row. That is, rather than storing the row data right in the index, you store a pointer to the row on disk.
</p>

<p>
For example, if we have an index on an <tt>age</tt> column, the value in the B-tree might be something like (34, 0x875900).  34 is the age and 0x875900 is a reference to the location of the data, rather than the data itself.
</p>

<p>
This often allows indexes to be stored in memory even for tables that are so large they can only be stored on disk.
</p>

<p>
Furthermore, B-tree indexes are typically designed so that each node takes up <a href="http://en.wikipedia.org/wiki/Block_(data_storage)">one disk block</a>.  This allows each node to be read in with a single disk operation.
</p>

<p>
Also, for the pedants among us, many databases use <a href="http://en.wikipedia.org/wiki/B%2B_tree">B+ trees</a> rather than classic B-trees for generic database indexes.  InnoDB&#8217;s <tt>BTREE</tt> index type is closer to a B+ tree than a B-tree, for example.
</p>

<h3>Summary</h3>
<p>
Database indexes are auxiliary data structures that allow for quicker retrieval of data.  The most common type of index is a B-tree index because it has very good general performance characteristics and allows a wide range of comparisons, including both equality and inequalities.
</p>

<p>
The penalty for having a database index is the cost required to update the index, which must happen any time the table is altered.  There is also a certain amount of space overhead, although indexes will be smaller than the table they index.
</p>

<p>
For specific data types different indexes might be better suited than a B-tree.  R-trees, for example, allow for quicker retrieval of spatial data.  For fields with only a few possible values bitmap indexes might be appropriate.
</p>

<h3>Good Question, Bad Question</h3>
<p>
I like this question because it shows whether the interviewee is curious enough to dive into these details.  For certain higher-level engineering positions knowing this should be second-nature, but even for a generic web development position knowing how your database works will only help you improve the performance of your web application.
</p>

<p>
Also, it&#8217;s just arcane enough that you can go through the motions without knowing it, but not so arcane that it&#8217;s inaccessible to someone without an advanced education.  Any decent programmer should be able to understand it &mdash; the exceptional ones will go out of their way to learn it.
</p><ol class="footnotes"><li id="footnote_0_134" class="footnote">For bonus points, the &#8220;relational&#8221; in &#8220;relational database&#8221; comes from this fact, not from the idea that there are &#8220;relations&#8221; between tables.</li><li id="footnote_1_134" class="footnote">Technically the size of an index is going to be proportional to the cardinality of the column being indexed.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://20bits.com/articles/interview-questions-database-indexes/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Interview Questions: Counting Bits</title>
		<link>http://20bits.com/articles/interview-questions-counting-bits/</link>
		<comments>http://20bits.com/articles/interview-questions-counting-bits/#comments</comments>
		<pubDate>Wed, 30 Apr 2008 06:00:59 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[bit count]]></category>
		<category><![CDATA[c]]></category>
		<category><![CDATA[interview]]></category>

		<guid isPermaLink="false">http://20bits.com/?p=108</guid>
		<description><![CDATA[
Continuing my series of interview questions, today I bring you the classic bit-counting problem.



The setup usually goes something like this.  We&#8217;re receiving gigabytes of data per second.  Each chunk of data comes with a header that contains an unsigned 32-bit integer.  Let&#8217;s call that integer the routing number.  We choose the [...]]]></description>
			<content:encoded><![CDATA[<p>
Continuing my series of <a href="/tag/interview">interview questions</a>, today I bring you the classic bit-counting problem.
</p>

<p>
The setup usually goes something like this.  We&#8217;re receiving gigabytes of data per second.  Each chunk of data comes with a header that contains an unsigned 32-bit integer.  Let&#8217;s call that integer the routing number.  We choose the routing destination based on the number of on bits in the binary representation of the routing number.
</p>

<p>
Write a routine that returns the number of on bits in the binary representation of an unsigned 32-bit integer in C.
</p>

<h3>The Naive Solution</h3>
<p>
As usual there&#8217;s a naive solution.  In this case you could loop through each bit at a time, counting the number of ones. <div class="dean_ch" style="white-space: wrap;"><span class="kw4">int</span> bitcount<span class="br0">&#40;</span><span class="kw4">unsigned</span> <span class="kw4">int</span> n<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw4">int</span> count = <span class="nu0">0</span>; &nbsp; &nbsp;<br />
&nbsp; &nbsp; <span class="kw1">while</span> <span class="br0">&#40;</span>n<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; count += n &amp; 0x1u;<br />
&nbsp; &nbsp; &nbsp; &nbsp; n &gt;&gt;= <span class="nu0">1</span>;<br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> count;<br />
<span class="br0">&#125;</span></div>
</p>

<p>
<tt>&gt;&gt;</tt> is the right bit-shift operator.  It drops the right-most bit from the binary representation of an integer.  So, in binary, <tt>1001 &gt;&gt; 1</tt> is equal to <tt>0100</tt> (in C, <tt>0x9 &gt;&gt; 1</tt> is <tt>0x4</tt>).
</p>

<p>
The above has a few issues.  First, it takes O(n) time, where n is the length of the binary representation of the integer.  Can we do better? Second, it doesn&#8217;t take into account the fact that n is a 32-bit integer<sup>1</sup>.
</p>

<h3>Pre-computation</h3>
<p>
Since speed was a requirement something that takes linear time is probably a bad idea.  The key idea is to realize that a deterministic function, like <tt>bitcount</tt>, is no different than a hash where the keys are the inputs to the function and the values are the output of the function.
</p>

<p>
This is the principle behind memoization, for example, but here we&#8217;re sitting pretty.  Since both the input and output are unsigned integers we can create a regular array, call it <tt>bit_table</tt>, where <tt>bit_table[i]</tt> is the number of on bits in the binary representation of <tt>i</tt>.
</p>

<p>
Furthermore since we have the constraint that the integer is 32-bits we can, in theory, pre-compute the entirety of <tt>bit_table</tt> and include it in a header.  It&#8217;d work like this: <div class="dean_ch" style="white-space: wrap;"><span class="co1">// Pre-compute this elsewhere and put it here.</span><br />
<span class="kw4">static</span> <span class="kw4">unsigned</span> <span class="kw4">int</span> bit_table32<span class="br0">&#91;</span>0x1ull &lt;&lt; <span class="nu0">32</span><span class="br0">&#93;</span>;<br />
<br />
<span class="kw4">int</span> bitcount_32<span class="br0">&#40;</span><span class="kw4">unsigned</span> <span class="kw4">int</span> n<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> bit_table32<span class="br0">&#91;</span>n &amp; 0xFFFFFFFFu<span class="br0">&#93;</span>;<br />
<span class="br0">&#125;</span></div>
</p>

<h3>Size Constraints</h3>
<p>
<tt>bit_table32</tt> is going to contain 4,294,967,296 integers.  With a typical 4-byte <tt>int</tt> that comes to 16 GiB of memory.  If we want a constant-time algorithm that takes up significantly less memory we can create a 16-bit table and use bit arithmetic.
<div class="dean_ch" style="white-space: wrap;"><span class="co1">// Pre-compute this elsewhere and put it here.</span><br />
<span class="kw4">static</span> <span class="kw4">unsigned</span> <span class="kw4">int</span> bit_table16<span class="br0">&#91;</span>0x1u &lt;&lt; <span class="nu0">16</span><span class="br0">&#93;</span>;<br />
<br />
<span class="co1">// This only works for 32-bit integers but takes constant time.</span><br />
<span class="kw4">int</span> bitcount_32<span class="br0">&#40;</span><span class="kw4">unsigned</span> <span class="kw4">int</span> n<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> bit_table16<span class="br0">&#91;</span>n &amp; 0xFFFFu<span class="br0">&#93;</span> + bit_table16<span class="br0">&#91;</span><span class="br0">&#40;</span>n &gt;&gt; <span class="nu0">16</span><span class="br0">&#41;</span> &amp; 0xFFFFu<span class="br0">&#93;</span>;<br />
<span class="br0">&#125;</span></div>
</p>
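<p>
The snippets above leave the pre-computation itself as an exercise.  Here&#8217;s one possible way to fill the 16-bit table, sketched in Python rather than C to keep it short.  The name <tt>BIT_TABLE16</tt> and the recurrence used to build it are my own choices for illustration, not the only way to do it.
</p>

```python
# Build the 16-bit lookup table once. Each entry reuses an earlier one:
# the count for i is the count for i with its lowest bit dropped, plus that bit.
BIT_TABLE16 = [0] * (1 << 16)
for i in range(1, 1 << 16):
    BIT_TABLE16[i] = BIT_TABLE16[i >> 1] + (i & 1)


def bitcount_32(n):
    """Count the on bits of a 32-bit value with two table lookups."""
    return BIT_TABLE16[n & 0xFFFF] + BIT_TABLE16[(n >> 16) & 0xFFFF]
```

<p>
The table has 65,536 entries instead of four billion, and each call still does a fixed amount of work: two masks, one shift, two lookups, and an add.
</p>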

<h3>The Unrestricted Case</h3>
<p>
If we don&#8217;t know how many bits the integer will contain (say we moved from a 32-bit to a 64-bit platform) then we can iterate over the binary representation 16 bits at a time, using the pre-computed table at each step<sup>2</sup>.
<div class="dean_ch" style="white-space: wrap;"><span class="co1">// Pre-compute this elsewhere and put it here.</span><br />
<span class="kw4">static</span> <span class="kw4">unsigned</span> <span class="kw4">int</span> bit_table16<span class="br0">&#91;</span>0x1u &lt;&lt; <span class="nu0">16</span><span class="br0">&#93;</span>;<br />
<br />
<span class="co1">// This works for any sized integer but no longer takes constant time.</span><br />
<span class="kw4">int</span> bitcount<span class="br0">&#40;</span><span class="kw4">unsigned</span> <span class="kw4">int</span> n<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; <span class="kw4">int</span> count = <span class="nu0">0</span>;<br />
&nbsp; &nbsp; <span class="kw1">while</span> <span class="br0">&#40;</span>n<span class="br0">&#41;</span> <span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; count += bit_table16<span class="br0">&#91;</span>n &amp; 0xFFFFu<span class="br0">&#93;</span>;<br />
&nbsp; &nbsp; &nbsp; &nbsp; n &gt;&gt;= <span class="nu0">16</span>;<br />
&nbsp; &nbsp; <span class="br0">&#125;</span><br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; <span class="kw1">return</span> count;<br />
<span class="br0">&#125;</span></div>
</p>

<h3>Good or Bad Question?</h3>
<p>
This question suffers from the same problem as the <a href="http://20bits.com/2008/04/17/interview-questions-loops-in-linked-lists/">loops in linked lists</a> question in that you probably either know the solution or you don&#8217;t.
</p>

<p>
That said, the solution here (pre-computing a list of values to save CPU time) is much more common than the tortoise and hare solution in the previous question, so the likelihood of it dawning on you during the interview is that much greater.  Plus I&#8217;ve been asked this question so many times that it&#8217;s one of those must-know exercises, in my opinion, even if the question itself could be better.
</p><ol class="footnotes"><li id="footnote_0_108" class="footnote">Let&#8217;s ignore the subtleties of integer types in C for now, ok?</li><li id="footnote_1_108" class="footnote">For the hard-core bit-counters out there, the C specification requires that integers contain <em>at least</em> 16 bits.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://20bits.com/articles/interview-questions-counting-bits/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Interview Questions: Loops in Linked Lists</title>
		<link>http://20bits.com/articles/interview-questions-loops-in-linked-lists/</link>
		<comments>http://20bits.com/articles/interview-questions-loops-in-linked-lists/#comments</comments>
		<pubDate>Thu, 17 Apr 2008 06:00:15 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[cycle]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[linked list]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://20bits.com/?p=99</guid>
		<description><![CDATA[
This is part of my series on interview questions, so welcome aboard!



This installment deals with a common question about linked lists &#8212; how do we detect when one has a loop?

Linked Lists


Linked lists are one of the most simple data structures and most aspiring programmers learn them early on.  But for completeness&#8217; sake let&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>
This is part of my series on <a href="/tag/interview">interview questions</a>, so welcome aboard!
</p>

<p>
This installment deals with a common question about <a href="http://en.wikipedia.org/wiki/Linked_list">linked lists</a> &mdash; how do we detect when one has a loop?
</p>
<h3>Linked Lists</h3>

<p>
Linked lists are one of the simplest data structures and most aspiring programmers learn them early on.  But for completeness&#8217; sake let&#8217;s cover that ground.
</p>

<p>
A linked list is a sequence of nodes.  Each node contains a piece of data and a reference to the next node in the list.  Graphically it looks like this:
<img src="http://20bits.com/wp-content/uploads/2008/04/linked-list.png" alt="" title="linked-list" width="501" height="99" class="math" />
</p>

<h3>Loopy Linked Lists</h3>
<p>
It&#8217;s possible, though, that a node in a linked list might point to a previous element in the list.  This is bad for many reasons, not the least of which is that any loop which iterates over all the nodes in the list by accessing the next node will never terminate.
</p>
<p>
So, it becomes important to detect when linked lists have loops.  Here&#8217;s what one such errant linked list looks like.
</p>
<p>
<img src="http://20bits.com/wp-content/uploads/2008/04/loopy-linked-list.png" alt="" title="loopy-linked-list" width="501" height="232" class="math" />
</p>

<h3>The Easy Solution</h3>
<p>
The easy solution is to keep track of every node seen so far and check if the current node is in that list.  Here&#8217;s a very simple linked list implementation in Ruby.
<div class="dean_ch" style="white-space: wrap;"><span class="kw1">class</span> Node<br />
&nbsp; attr_accessor <span class="re3">:data</span>, <span class="re3">:next</span><br />
&nbsp; <br />
&nbsp; <span class="kw1">def</span> initialize<span class="br0">&#40;</span>data = <span class="kw2">nil</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="re1">@data</span> = data<br />
&nbsp; &nbsp; <span class="re1">@next</span> = <span class="kw2">nil</span><br />
&nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></div>
</p>

<p>Here is the simple solution for detecting loops using the above implementation <div class="dean_ch" style="white-space: wrap;"><span class="kw1">def</span> has_loop?<span class="br0">&#40;</span>node<span class="br0">&#41;</span><br />
&nbsp; seen = <span class="br0">&#91;</span><span class="br0">&#93;</span><br />
&nbsp; <span class="kw1">until</span> node.<span class="kw2">nil</span>? <span class="kw1">do</span><br />
&nbsp; &nbsp; <span class="kw2">return</span> <span class="kw2">true</span> <span class="kw1">if</span> seen.<span class="kw1">include</span>? node<br />
&nbsp; &nbsp; seen &lt;&lt; node<br />
&nbsp; &nbsp; node = node.<span class="kw1">next</span><br />
&nbsp; <span class="kw1">end</span><br />
&nbsp; <span class="kw2">false</span><br />
<span class="kw1">end</span></div>
</p>

<p>
This solution is workable but sub-optimal (surprise!).  This has O(n<sup>2</sup>) complexity in CPU and O(n) complexity in memory, but a solution with O(n) complexity in CPU and O(1) complexity in memory is possible.  In fact, this question is usually posed to preclude the above solution.
</p>

<h3>The Tortoise and the Hare</h3>
<p>
The better solution involves a bit of mathematical thinking.  If there is a loop then any iterator, no matter how many steps it takes per iteration, must eventually enter the loop and can never leave it.
</p>

<p>
So, if we have two iterators moving at different speeds, both end up circling the loop.  The usual solution is to have one iterator advance one node at a time (&#8220;the tortoise&#8221;) and a second iterator advance two at a time (&#8220;the hare&#8221;).  Once both are in the loop the hare gains exactly one node on the tortoise per step, so they must eventually land on the same node.
</p>

<p>
That algorithm looks like this <div class="dean_ch" style="white-space: wrap;"><span class="kw1">def</span> has_loop?<span class="br0">&#40;</span>node<span class="br0">&#41;</span><br />
&nbsp; slow = node<br />
&nbsp; fast = node<br />
&nbsp; <span class="kw1">until</span> fast.<span class="kw2">nil</span>? <span class="kw1">or</span> fast.<span class="kw1">next</span>.<span class="kw2">nil</span>? <span class="kw1">do</span><br />
&nbsp; &nbsp; slow = slow.<span class="kw1">next</span><br />
&nbsp; &nbsp; fast = fast.<span class="kw1">next</span>.<span class="kw1">next</span><br />
&nbsp; &nbsp; <span class="kw2">return</span> <span class="kw2">true</span> <span class="kw1">if</span> <span class="br0">&#40;</span>slow == fast<span class="br0">&#41;</span><br />
&nbsp; <span class="kw1">end</span><br />
&nbsp; <span class="kw2">false</span><br />
<span class="kw1">end</span></div>
</p>

<h3>More Questions</h3>
<p>
Assuming you answer the above correctly and quickly the interviewer will probably follow up with some related questions.  How do you fix the linked list when you detect a loop?  What if the linked list has multiple loops?  How do you determine the size of the loop?
</p>

<p>
I&#8217;ll leave you to ponder these questions.  Cheers!
</p>]]></content:encoded>
			<wfw:commentRss>http://20bits.com/articles/interview-questions-loops-in-linked-lists/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Interview Questions: When It&#8217;s Your Turn</title>
		<link>http://20bits.com/articles/when-its-your-turn/</link>
		<comments>http://20bits.com/articles/when-its-your-turn/#comments</comments>
		<pubDate>Sat, 12 Apr 2008 00:36:48 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[opinion]]></category>

		<guid isPermaLink="false">http://20bits.com/?p=94</guid>
		<description><![CDATA[
This is part of my series about interview questions.  As promised this is about interview strategy rather than specific technical interview questions.  I&#8217;ll continue with that next week.


Every tech interview I&#8217;ve ever had has four stages:

	Small talk and swapping brief personal bios.
	Questions about your previous employment and projects.
	Technical questions and brain teasers.
	Turning the [...]]]></description>
			<content:encoded><![CDATA[<p>
This is part of my series about <a href="http://20bits.com/tag/interview">interview questions</a>.  As promised this is about interview strategy rather than specific technical interview questions.  I&#8217;ll continue with that next week.
</p>
<p>
Every tech interview I&#8217;ve ever had has four stages:
<ol>
	<li>Small talk and swapping brief personal bios.</li>
	<li>Questions about your previous employment and projects.</li>
	<li>Technical questions and brain teasers.</li>
	<li>Turning the tables: &#8220;Do you have any questions for me?&#8221;</li>
</ol>
</p>

<p>
The meat of the interview is in the second and third parts where you can directly show your knowledge, skill, and passion, but don&#8217;t underestimate the value of the fourth part.
</p>

<h3>Don&#8217;t be Afraid to Ask Hard Questions</h3>
<p>
Most people use the fourth part to ask &#8220;What is it like working here?&#8221;-type questions.  If you think you&#8217;re going to get interesting responses by all means ask those, but most interviewers I know lie to some degree to make their job sound approximately ten times more awesome than it really is.  They probably don&#8217;t want to admit that there are parts of their job they hate to themselves, let alone some interviewee.
</p>

<p>
Besides, if you want to know the bad parts about the job &mdash; and there will be some &mdash; just ask that question directly.  They&#8217;ll either be forthright or they won&#8217;t and it&#8217;s pretty easy to discern between the two cases.
</p>

<h3>Using the Questions to Show Off</h3>
<p>
In Joel Spolsky&#8217;s article <a href="http://www.joelonsoftware.com/articles/GuerrillaInterviewing3.html">The Guerrilla Guide to Interviewing</a> he says that you want to hire people who are two things: one, smart; two, able to get things done.
</p>

<p>
Since <a href="http://thedailywtf.com/Articles/Riddle-Me-An-Interview.aspx">Interview 2.0</a> is the common interview style in most technology companies these days you don&#8217;t always have the chance to show off how smart you are, but the fourth part offers a path to redemption.
</p>

<p>
Let&#8217;s say you&#8217;re interviewing at Amazon and have a background in mathematics.  You should be asking the engineers questions about the interesting mathematical things they do, have done, or have tried to do with their massive data sets.  This shows that you&#8217;re not only engaged with the interviewer and the company, but have knowledge that can be brought to bear.
</p>

<p>
The same applies if you&#8217;re a marketer or whatever.  If you feel like you haven&#8217;t had the chance to show the interviewer all you have to offer then asking intelligent questions that you know something about is a great strategy.
</p>

<h3>A Hard-Learned Lesson</h3>
<p>
I learned this lesson the hard way.  About a month after I left <a href="http://sugarinc.com">Sugar, Inc.</a> and two months after I launched <a href="http://appaholic.com">Appaholic</a> I was interviewing at Facebook.  For most of the interviews the technical/quizzy type questions went well.  I had even sent in solutions to two of their job puzzles before I came in.  
</p>
<p>
I was a little frustrated that most of the CS-type questions were about designing databases (as in, writing one from scratch) since I&#8217;d never had to do that before.  You can never know too much, though, so I only blame myself.
</p>

<p>
When it came time to ask them questions, instead of using the strategy above and showing them I did have a solid grasp of the fundamentals they were looking for, I asked them the following question: &#8220;Facebook is a _____ company.  What would you put in the blank?&#8221;
</p>

<p>
Every single person said &#8220;technology&#8221; and then I probed them about that.  &#8220;But you guys make money by selling attention.  How does that not make you a media company?&#8221;
</p>

<p>
This is a bad question to ask engineers, even high-ranking ones, because most engineers don&#8217;t give a crap &mdash; they just want to create cool products and gizmos and bristle when people interject marketing and business mumbo-jumbo.</p>

<p>And boy did they bristle.  I won&#8217;t name names, but it was clear this wasn&#8217;t a welcome question.  My time would have been better spent asking them technical questions because it would&#8217;ve created a discussion they wanted to take part in.
</p>

<p>
I thought I was being clever but instead I torpedoed my chances of getting an offer there by annoying my interviewers and reinforcing their opinions about my technical skills.
</p>

<p>
Not one month later I sold Appaholic/Adonomics, so it worked out well, but I still view it as a strategic mistake.  Lesson learned!
</p>]]></content:encoded>
			<wfw:commentRss>http://20bits.com/articles/when-its-your-turn/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Interview Questions: Shuffling an Array</title>
		<link>http://20bits.com/articles/interview-questions-shuffling-an-array/</link>
		<comments>http://20bits.com/articles/interview-questions-shuffling-an-array/#comments</comments>
		<pubDate>Mon, 07 Apr 2008 06:00:14 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[array shuffle]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://20bits.com/?p=90</guid>
		<description><![CDATA[
This is part of my interview question series.  It&#8217;s about shuffling arrays.


The Question

You have an array A of size N.  Write a routine that shuffles the array in-place.  The only restrictions are that all possible permutations of A must be possible and equally likely.



This interview question serves as a test for basic [...]]]></description>
			<content:encoded><![CDATA[<p>
This is part of my <a href="http://20bits.com/tag/interview/">interview question series</a>.  It&#8217;s about shuffling arrays.
</p>

<h3>The Question</h3>
<p>
You have an array A of size N.  Write a routine that shuffles the array in-place.  The only restrictions are that all possible permutations of A must be possible and equally likely.
</p>

<p>
This interview question serves as a test for basic algorithm construction.  There&#8217;s a canonical solution that&#8217;s not too difficult to arrive at if you&#8217;ve never seen it before, so it&#8217;s a good combination of &#8220;what do you know?&#8221; and &#8220;what can you do?&#8221;
</p>

<h3>Workin&#8217; it out</h3>
<p>
I&#8217;m going to create my solution in Ruby because that&#8217;s the language the company that asked me this question used.
</p>

<p>
The first solution most people arrive at is subtly wrong.  <a href="http://www.codinghorror.com/blog/archives/001008.html?r=31644">Jeff Atwood</a> made the mistake in his blog post.  The algorithm, in words, goes like this: iterate through each item in the array, pick another element at random, and swap the two.
</p>

<p>
In Ruby the above algorithm would look like this.

<div class="dean_ch" style="white-space: wrap;"><span class="kw1">class</span> <span class="kw3">Array</span><br />
&nbsp; <span class="kw1">def</span> shuffle_naive!<br />
&nbsp; &nbsp; n = size<br />
&nbsp; &nbsp; <span class="kw1">until</span> n == <span class="nu0">0</span><br />
&nbsp; &nbsp; &nbsp; k = <span class="kw3">rand</span><span class="br0">&#40;</span>size<span class="br0">&#41;</span> <span class="co1">#This is the line which proves our undoing</span><br />
&nbsp; &nbsp; &nbsp; n = n <span class="nu0">-1</span><br />
&nbsp; &nbsp; &nbsp; <span class="kw2">self</span><span class="br0">&#91;</span>n<span class="br0">&#93;</span>, <span class="kw2">self</span><span class="br0">&#91;</span>k<span class="br0">&#93;</span> = <span class="kw2">self</span><span class="br0">&#91;</span>k<span class="br0">&#93;</span>, <span class="kw2">self</span><span class="br0">&#91;</span>n<span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></div>
</p>

<p>
This solution seems correct if not optimal, but there&#8217;s a subtle problem: not all outcomes are equally likely. 
</p>

<p>
The root cause is that this algorithm draws from a sample space of size N<sup>N</sup>, while the sample space of all permutations of an N-element array has size only N!.
</p>
<p>
That is, for the naive shuffle, for each of the N steps in the iteration we make one of N decisions for a total of N<sup>N</sup> possible outcomes.
</p>

<p>
But N<sup>N</sup> > N! for all N > 1 and, more importantly, N! is not a divisor of N<sup>N</sup> for any N > 2.  The N<sup>N</sup> equally likely outcomes therefore can&#8217;t be split evenly among the N! permutations: at least one permutation is favored over the others, so the algorithm doesn&#8217;t select among the possible permutations uniformly.
</p>
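<p>
To make the counting argument concrete, here is a small sketch that runs the naive shuffle on a 3-element array once for every possible sequence of random picks.  The helper name <tt>naive_shuffle_with</tt> is mine, not part of the original code; it replaces <tt>rand(size)</tt> with a supplied list of picks so that all 27 outcomes can be tallied.
</p>

```ruby
# Apply the naive shuffle to a copy of `items`, using a fixed list of picks
# in place of the calls to rand(size).
def naive_shuffle_with(items, picks)
  a = items.dup
  n = a.size
  picks.each do |k|
    n -= 1
    a[n], a[k] = a[k], a[n]
  end
  a
end

items = [1, 2, 3]
choices = (0...items.size).to_a

# Every sequence of three picks is equally likely, so tally where each lands.
counts = Hash.new(0)
choices.product(choices, choices).each do |picks|
  counts[naive_shuffle_with(items, picks)] += 1
end

counts.sort.each { |perm, count| puts "#{perm.inspect}: #{count}/27" }
```

<p>
Since 27 isn&#8217;t a multiple of 6, the six permutations can&#8217;t all come up the same number of times; the tally shows which ones are favored.
</p>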

<h3>KFC, KFY</h3>
<p>
The &#8220;best&#8221; solution is the <a href="http://en.wikipedia.org/wiki/Knuth_shuffle">Knuth-Fisher-Yates shuffle</a>.  Here it is in Ruby:
<div class="dean_ch" style="white-space: wrap;"><span class="kw1">class</span> <span class="kw3">Array</span><br />
&nbsp; <span class="kw1">def</span> shuffle!<br />
&nbsp; &nbsp; n = size<br />
&nbsp; &nbsp; <span class="kw1">until</span> n == <span class="nu0">0</span><br />
&nbsp; &nbsp; &nbsp; k = <span class="kw3">rand</span><span class="br0">&#40;</span>n<span class="br0">&#41;</span> <span class="co1">#You can see I&#8217;m doing rand(n) rather than rand(size)</span><br />
&nbsp; &nbsp; &nbsp; n = n - <span class="nu0">1</span><br />
&nbsp; &nbsp; &nbsp; <span class="kw2">self</span><span class="br0">&#91;</span>n<span class="br0">&#93;</span>, <span class="kw2">self</span><span class="br0">&#91;</span>k<span class="br0">&#93;</span> = <span class="kw2">self</span><span class="br0">&#91;</span>k<span class="br0">&#93;</span>, <span class="kw2">self</span><span class="br0">&#91;</span>n<span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">end</span><br />
&nbsp; &nbsp; <span class="kw2">self</span><br />
&nbsp; <span class="kw1">end</span><br />
<span class="kw1">end</span></div>
</p>

<p>
This works because it&#8217;s an iterative version of an essentially recursive algorithm.  If we know how to shuffle an array of size N-1 then shuffling an array of size N is easy &mdash; first shuffle the sub-array consisting of the first N-1 elements and then randomly swap in the last element to any of the N slots.
</p>
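<p>
The recursive description can be written down almost word for word.  This is only a sketch of that idea, not a replacement for the iterative version; the method name <tt>shuffle_recursive!</tt> and its second parameter are my own.
</p>

```ruby
# Shuffle the first n elements of `a` in place.
def shuffle_recursive!(a, n = a.size)
  return a if n <= 1
  shuffle_recursive!(a, n - 1)      # first shuffle the leading n-1 elements
  k = rand(n)                       # then pick one of the n slots at random
  a[n - 1], a[k] = a[k], a[n - 1]   # and swap the last element into it
  a
end

deck = (1..10).to_a
shuffle_recursive!(deck)
puts deck.inspect
```

<p>
Ruby doesn&#8217;t optimize deep recursion, so for large arrays the loop is the practical choice.
</p>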

<p>
There&#8217;s a proper inductive proof in there if you&#8217;re so inclined, but it&#8217;s not particularly illuminating.
</p>
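<p>
Short of a proof, a brute-force check is easy to run.  The sketch below feeds the shuffle every possible sequence of picks for a 4-element array (the helper <tt>shuffle_with</tt> is hypothetical and stands in for the calls to <tt>rand(n)</tt>) and confirms that the 4! = 24 equally likely pick sequences produce 24 different permutations.
</p>

```ruby
# Run the shuffle on a copy of `items`, using a fixed list of picks in place
# of rand(n). The i-th pick must be a valid result of rand(items.size - i).
def shuffle_with(items, picks)
  a = items.dup
  n = a.size
  picks.each do |k|
    n -= 1
    a[n], a[k] = a[k], a[n]
  end
  a
end

items = [1, 2, 3, 4]

# rand(n) returns a value in 0...n, so the pick ranges shrink each step.
ranges = items.size.downto(1).map { |n| (0...n).to_a }
all_picks = ranges.first.product(*ranges.drop(1))

results = all_picks.map { |picks| shuffle_with(items, picks) }
puts "#{all_picks.size} pick sequences, #{results.uniq.size} distinct permutations"
```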

<h3>Good Questions, Bad Questions</h3>
<p>
My next article is going to be more about the interview process than about specific questions.  One key thing to understand in an interview is what information the interviewer is looking for in asking their question.  Hint: it&#8217;s not always the answer.
</p>

<p>
Among other things they want to suss out the limits of your knowledge, how you solve problems, how quickly you resort to help, and a whole assortment of other, behavioral things, that they get because you&#8217;re right there (hopefully) engaging in a dialogue.
</p>]]></content:encoded>
			<wfw:commentRss>http://20bits.com/articles/interview-questions-shuffling-an-array/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Interview Questions: Two Bowling Balls</title>
		<link>http://20bits.com/articles/interview-questions-two-bowling-balls/</link>
		<comments>http://20bits.com/articles/interview-questions-two-bowling-balls/#comments</comments>
		<pubDate>Thu, 03 Apr 2008 06:00:01 +0000</pubDate>
		<dc:creator>Jesse</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[interview]]></category>
		<category><![CDATA[mathematics]]></category>

		<guid isPermaLink="false">http://20bits.com/?p=79</guid>
		<description><![CDATA[
This post is the first in a series I&#8217;m calling &#8220;interview questions,&#8221; where I discuss interview questions I&#8217;ve been handed in my time out here in the Bay Area.  Since I&#8217;m an engineer by trade most of the questions relate directly to technical topics.  I&#8217;ll also cover general interview strategies and advice &#8212; [...]]]></description>
			<content:encoded><![CDATA[<p>
This post is the first in a series I&#8217;m calling &#8220;interview questions,&#8221; where I discuss interview questions I&#8217;ve been handed in my time out here in the Bay Area.  Since I&#8217;m an engineer by trade most of the questions relate directly to technical topics.  I&#8217;ll also cover general interview strategies and advice &mdash; probably by serving myself up as an example of what <em>not</em> to do in an interview.
</p>

<p>
I know people keep a repertoire of interview questions at hand, so I&#8217;m not going to name names when discussing the questions.  Anyhow, let&#8217;s get started!
</p>

<h3>The Question</h3>
<p>
You&#8217;re standing in front of a 100 story building with two identical bowling balls.  You&#8217;ve been tasked with testing the bowling balls&#8217; resilience.  The building has a stairwell with a window at each story from which you can (conveniently) drop bowling balls.  
</p>

<p>
To test the bowling balls you need to find the first floor at which they break.  It might be the 100th floor or it might be the 50th floor, but if it breaks somewhere in the middle you know it will break at every floor above.
</p>

<p>
Devise an algorithm which guarantees you&#8217;ll find the first floor at which one of your bowling balls will break.  You&#8217;re graded on your algorithm&#8217;s worst-case running time.
</p>

<p>
<h3>Warning: Stop reading here if you&#8217;re not interested in seeing any of my solutions!</h3>
</p>

<h3>A Few Preliminaries</h3>
<p>
The original problem stated that the building had 100 floors, but it may as well have N floors.  Using N rather than 100 will make it easier to quantify the performance of the algorithm, so that&#8217;s what I&#8217;m going to do.
</p>
<h3>Solution 1: The Naïve Solution</h3>
<p>
Ok, there&#8217;s one blindingly obvious solution: take one of the bowling balls and drop it from every floor, starting from the first.  At worst this will take N tries, where N is the number of stories on the building.
</p>
<p>
<strong>Interview Advice:</strong> In an actual interview situation don&#8217;t be afraid to say the obvious solution, even if you know there&#8217;s a better one.  Problem solving is iterative and your answer should be, too.
</p>

<h3>Solution 2: Two Bowling Balls</h3>
<p>
We know the first solution is probably sub-optimal because it doesn&#8217;t make use of both bowling balls.  To give us some ideas let&#8217;s just pick a floor, say the 50th floor, and drop one of the balls.  We have nothing to lose since we know we can do it with only one.
</p>

<p>
If our building is 100 floors and we dropped one of the balls from the 50th floor one of two things will happen: the ball will either break or it won&#8217;t.  If it breaks then we know the floor we&#8217;re looking for is somewhere between floors 1-50.  If it doesn&#8217;t then we know it&#8217;s somewhere between floors 51-100.  In either case we&#8217;ve halved the size of the search space and now need at most N/2 (or 50) tries.</p>

<p>
But 50 was arbitrary.  What about other numbers?  What happens if we drop the ball from the third floor?  If it breaks then we can use the second ball to test floors 1-2, taking at most 3 tries.  If it doesn&#8217;t break then we try the same experiment again, dropping the ball from another floor.
</p>

<p>
Here&#8217;s one possible strategy: pick a number S and call it the skip number.  We drop one ball every S floors until it breaks on the k<sup>th</sup> try.  We then use the second ball to try every floor between floors (k-1)*S and k*S.
</p>

<p>
As an example, let N=100 and S=4.  We&#8217;d try floors 4,8,12,16,&#8230; with one bowling ball until it breaks.  Let&#8217;s say it breaks on the 60th floor.  Since it didn&#8217;t break on the 56th floor we know the culprit is floor 57, 58, 59, or 60, and we can use the second ball to test floors 57-59 one at a time using the naïve strategy.
</p>

<p>
What is the best skip size?  Obviously S=100 isn&#8217;t ideal since that is equivalent to the naïve strategy, as is S=1.  But we know both S=50 and S=4 are better, so there must be an optimal strategy somewhere between.  To find this strategy let L(S) be the number of drops required in the worst-case scenario for a skip number of S.  If you work it out you&#8217;ll get <img class="math" src="http://20bits.com/wp-content/uploads/2008/04/latex-4.png" alt="" title="latex" width="163" height="43" />
</p>

<p>
We want to minimize this function.  Bringing back our high school calculus, the derivative of L(S) is
<img src="http://20bits.com/wp-content/uploads/2008/04/latex-1.png" alt="" title="latex-1" class="math" />
</p>

<p>
Setting the derivative equal to zero implies <img src="http://20bits.com/wp-content/uploads/2008/04/latex-2.png" alt="" title="latex-2" width="78" height="23" class="math" />
</p>

<p>
For N=100 this gives an optimal skip of S=10.  If N isn&#8217;t a perfect square you&#8217;ll have to work out which skip gives the &#8220;correct&#8221; solution.
</p>
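<p>
If you&#8217;d rather not trust the calculus, the skip strategy is simple enough to simulate.  The sketch below is my own illustration, with hypothetical names <tt>drops_for</tt> and <tt>worst_case</tt>.  It assumes the balls first break somewhere on floors 1 through N, counts the drops used for each possible breaking floor, and takes the maximum.
</p>

```ruby
# Drops used by the skip strategy when the balls first break at `first_break`.
# Assumes 1 <= first_break <= n.
def drops_for(n, skip, first_break)
  drops = 0
  safe = 0                  # highest floor known not to break a ball
  floor = [skip, n].min

  # First ball: climb `skip` floors at a time until it breaks.
  loop do
    drops += 1
    break if floor >= first_break
    safe = floor
    floor = [floor + skip, n].min
  end

  # Second ball: test the floors in between, one at a time, from the bottom.
  (safe + 1...floor).each do |f|
    drops += 1
    break if f >= first_break
  end
  drops
end

# Worst case over every possible breaking floor.
def worst_case(n, skip)
  (1..n).map { |first_break| drops_for(n, skip, first_break) }.max
end

puts worst_case(100, 10)
puts (1..100).map { |skip| worst_case(100, skip) }.min
```

<p>
For N=100 a skip of 10 needs 19 drops in the worst case, and no other skip does better.
</p>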

<h3>Solution 3: You can do better&#8230;</h3>
<p>
At this point in the interview you&#8217;re probably pretty happy with yourself.  The above took you a few minutes to work out, perhaps with some prodding by the interviewer.  But then you hear that dreaded question, &#8220;Can you do any better?&#8221;
</p>

<p>
The interviewer isn&#8217;t a jackass, though, and gives you a hint.  He points out that it seems like we should be able to find a solution that works equally well irrespective of where the bad floor is.  That is, it should take the same number of turns if it&#8217;s on the 100th floor as it would if it were on a lower floor.
</p>

<p>
We have a baseline for ourselves.  For N=100 and S=10 we know we can do it in at most 19 turns.  This can act as a sort of counter &mdash; if we beat this number at every step we&#8217;ve come up with a strictly better algorithm.  So, at every step, we want to be able to find the floor in question in no more than 18 steps.
</p>
<p>
Let&#8217;s start by dropping the first ball on the 18th floor.  If it breaks we can test floors 1-17 with the second ball, taking at most 18 turns.  If it doesn&#8217;t break, we&#8217;ve used up one of our turns, leaving us with 17 turns left.
</p>
<p>
So, the next floor we should test is 18+17, or the 35th floor.  If it breaks we can test floors 19-34, taking at most 18 turns.  We can continue this way, shrinking the step size by one each time.  Now we know we can do it in at most 18 steps.
</p>
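<p>
Here is a sketch of that shrinking-step schedule, in case it helps to see the floors written out.  The names <tt>first_ball_floors</tt> and <tt>worst_case_drops</tt> are mine.  The worst-case count assumes the schedule reaches the top floor and that the balls break somewhere on floors 1 through N.
</p>

```ruby
# Floors the first ball is dropped from: start at `counter`, then climb by one
# floor fewer each time, never going past the top floor `n`.
def first_ball_floors(n, counter)
  floors = []
  floor = 0
  step = counter
  while step > 0 && floor < n
    floor = [floor + step, n].min
    floors << floor
    step -= 1
  end
  floors
end

# Worst-case total drops: if the first ball breaks on its i-th drop, the second
# ball may have to test every floor between the last safe floor and that one.
def worst_case_drops(floors)
  previous = 0
  floors.each_with_index.map do |floor, i|
    total = (i + 1) + (floor - previous - 1)
    previous = floor
    total
  end.max
end

floors = first_ball_floors(100, 18)
puts floors.inspect            # [18, 35, 51, 66, 80, 93, 100]
puts worst_case_drops(floors)  # 18
```

<p>
Passing a smaller counter shows the same pattern, as long as the schedule still reaches floor 100.
</p>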

<p>
But why not 17?  If we repeat the above steps, starting with a counter of 17 rather than 18, we get an algorithm that takes at most 17 steps.  Then, using 16 as a counter, we get an algorithm that takes at most 16 steps.  We can&#8217;t do this forever, since there&#8217;s no possible algorithm that takes at most one step.  So where is the end of the line?
</p>

<p>
The problem is that for this algorithm to work the first ball needs to be able to skip one fewer each time and still cover all 100 floors.  If we set our counter to C that means we must have 1+2+&#8230;+C > 100.</p>

<p>
Here&#8217;s the math:<img src="http://20bits.com/wp-content/uploads/2008/04/latex-12.png" alt="" title="latex-12" width="357" height="150" class="math" />
</p>

<p>
Using the quadratic formula to find the exact solution and then taking into account the fact that we want an integer solution gives
</p>
<img src="http://20bits.com/wp-content/uploads/2008/04/latex-21.png" alt="" title="latex-21" width="215" height="53" class="math" />
<p>
as the worst possible case for our third strategy.  <tt>L(100) = 14</tt>, which checks out.
</p>
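<p>
The same number falls out of a direct search.  The sketch below looks for the smallest counter C whose steps C, C-1, &#8230;, 1 add up to at least N, and compares it with the closed form you get from applying the quadratic formula to C(C+1)/2 = N.  Treating &#8220;reaches floor N exactly&#8221; as good enough is my assumption, and both function names are hypothetical.
</p>

```ruby
# Smallest counter c with 1 + 2 + ... + c >= n, found by direct search.
def smallest_counter(n)
  c = 1
  c += 1 while c * (c + 1) / 2 < n
  c
end

# The same value from the positive root of c^2 + c - 2n = 0, rounded up.
def smallest_counter_closed_form(n)
  ((Math.sqrt(8 * n + 1) - 1) / 2).ceil
end

puts smallest_counter(100)              # 14
puts smallest_counter_closed_form(100)  # 14
```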

<p>
That&#8217;s the best solution I know, and it was the best solution the interviewer knew, too.  Can you do any better?
</p>

<h3>After The Interview</h3>
<p>
This was one of a few questions I was asked by one of four interviewers.  I worked through the problem above, basically as it was written out, albeit with more digressions.  How did the interview wind up going?  I wasn&#8217;t offered a job.  At least I got a good interview question out of it, though.
</p>]]></content:encoded>
			<wfw:commentRss>http://20bits.com/articles/interview-questions-two-bowling-balls/feed/</wfw:commentRss>
		<slash:comments>83</slash:comments>
		</item>
	</channel>
</rss>
