Random Weighted Elements in PHP
I was writing some code for Adonomics when I came across the need to select a random, weighted element from an array. I looked around the web but all the solutions I found were cumbersome or involved creating an auxiliary array whose size was proportional to the sum of the weights. Unacceptable!
The Math
So, of course, I rolled my own. Here’s the idea:
Given an array of values
and and array of weights
such that
write a function which takes as its input W and produces the ith key with probability wi.
That is, define a function f such that
I’m going to call our little function w_rand, which looks like PHPese to me.
As an example we might have
and
w_rand(W) should return 0 (the first index) with a 15% probability, 1 with a 35% probability, 2 with a 20% probability, etc.
The Code
Let’s get down to bussiness!
$r = mt_rand(1,1000);
$offset = 0;
foreach ($weights as $k => $w) {
$offset += $w*1000;
if ($r <= $offset) {
return $k;
}
}
}
I like this function because it only makes one random call and doesn’t require any additional memory beyond what the array of weights takes.
The Details
How does this function work? Many other such functions work by creating an auxiliary array and filling it with the same elements in proportion to their probability.
For example, if you had an array A = [a, b] and weights W = [0.30, 0.70], it would create an auxiliary array [a, a, a, b, b, b, b, b, b, b]. You then select an element randomly from this array, and that gets you what you’re looking for.
The downside to this is that this requires both more processing power. You also shouldn’t need to care about the values — what if the values are huge strings?
Instead, imagine a number line from 1 to 100. We pick a random number in that range. Using the example W = [0.30, 0.70], we’d want to return 0 if our random number is between 1-30 and 1 if our random number is between 31-100.
See? No extra memory required. The above case, with just two weights, is easy enough to do with a single if-else statement, but what about the case where you have three, four, or more weights?
Take W = [0.30, 0.20, 0.50]. We pick a random integer from 1-100. If the number is between 1-30 we want to return 0, if it between 31-50 then return 1, and if its between 51-100 return 2.
It’s easy to make this into a loop: keep a running total T of the weights so far and return the first key such that N ≤ T.
Why does this work? Slice up the number line, 1-100, according to the weights. If our random number N is in the first slice return that key. If our random number is in the first or second slice (i.e., is less than the sum of the first and second weights) then it must be in the second slice since we already know it’s not in the first. Continue this way until you reach the end.
Ta da! I hope someone finds this useful.