Interview Questions: Counting Bits

by Jesse Farmer on Wednesday, April 30, 2008

Continuing my series of interview questions, today I bring you the classic bit-counting problem.

The setup usually goes something like this. We're receiving gigabytes of data per second. Each chunk of data comes with a header that contains an unsigned 32-bit integer. Let's call that integer the routing number. We choose the routing destination based on the number of on bits in the binary representation of the routing number.

Write a routine that returns the number of on bits in the binary representation of an unsigned 32-bit integer in C.

The Naive Solution

As usual there's a naive solution. In this case you could loop through each bit at a time, counting the number of ones.

int bitcount(unsigned int n) {
	int count = 0;    
	while (n) {
		count += n & 0x1u;
		n >>= 1;
	}
	return count;
}

>> is the right bit-shift operator. It drops the right-most bit from the binary representation of an integer. So, 0x1001 >> 1 is equal to 0x0100.

The above has a few issues. First, it takes O(n) time, where n is the length of the binary representation of the integer. Can we do better? Second, it doesn't take into account the fact that n is a 32-bit integerLet's ignore the subtleties of integer types in C for now, ok?.

Pre-computation

Since speed was a requirement something that takes linear time is probably a bad idea. The key idea is to realize that a deterministic function, like bitcount, is no different than a hash where the keys are the inputs to the function and the values are the output of the function.

This is principle behind memoization, for example, but here we're sitting pretty. Since both the input and output are unsigned integers we can create a regular array, call it bit_table, where bit_table[i] is the number of on bits in the binary representation of i.

Furthermore since we have the constraint that the integer is 32-bits we can, in theory, pre-compute the entirety of bit_table and include it in a header. It'd work like this:

// Pre-compute this elsewhere and put it here.
static unsigned int bit_table32[0x1u << 32];

int bitcount_32(unsigned int n) {
	return bit_table32[n & 0xFFFFFFFFu];
}

Size Constraints

bit_table32 is going to contain 4,294,967,296 integers. Depending on the size of an integer on your platform this will probably take up several gigabytes of memory. If we want a constant-time algorithm that takes up significantly less memory we can create a 16-bit table and use bit arithmetic.

// Pre-compute this elsewhere and put it here.
static unsigned int bit_table16[0x1u << 16];

// This only works for 32-bit integers but takes constant time.
int bitcount_32(unsigned int n) {
	return bit_table16[n & 0xFFFFu] + bit_table16[(n >> 16) & 0xFFFFu];
}

The Unrestricted Case

If we don't know how many bits the integer will contain (say we moved from a 32-bit to a 64-bit platform) then we can iterate over the binary representation 16 bits at a time, using the pre-computed table at each stepFor the hard-core bit-counters out there, the C specification requires that integers contain at least 16 bits..

// Pre-compute this elsewhere and put it here.
static unsigned int bit_table16[0x1u << 16];

// This works for any sized integer but no longer takes constant time.
int bitcount(unsigned int n) {
	int count = 0;
	while (n) {
		count += bit_table16(n & 0xFFFFu);
		n >>= 16;
	}
	
	return count;
}

Good or Bad Question?

This question suffers from the same problems that the reversing linked lists in that you probably either know the solution or you don't.

That said, the solution here — pre-computing a list of values to CPU time — is much more common than the tortoise and hare solution in the previous question, so the likelihood of it dawning on you during the interview is that much greater. Plus I've been asked this question so many times that it's one of those must-know exercises, in my opinion, even if the question itself could be better.