Notification Strategies for Social Networks
You've built a social application and launched a new feature. The number of notifications you can send out is constrained. Which set of users should you notify to maximize the number of people who start using this new feature?
This problem might seem artificial. Why not put up an ad on your product, or send a notification to every single person who might be interested? There are several reasons the number of people you can notify might be constrained.
- There is a technical constraint, e.g., Facebook limits the number of application-to-user notifications at the API level.
- There is a financial constraint, e.g., you're sending notifications over SMS and every message costs you money.
- There is a strategic constraint, e.g., sending notifications too frequently causes fatigue and reduces the effectiveness of future notifications.
So, the situation is not too far-fetched. Let's investigate the issue.
For the rest of the article the "application" is going to be a Facebook application and it can only send 100 application-to-user notifications per week. Which 100 users should we notify?
The Basic Considerations
In the independent cascade model, when a user in a social network adopts a new behavior, there is some probability that each of their neighbors will adopt it as well.
Under this model we probably wouldn't want to notify two people who are friends, and especially not a cluster of friends or a clique. The new feature wouldn't spread very far beyond this group.
Likewise, we wouldn't want to notify people who are very far apart on the social network because a user is more likely to adopt a behavior if more than one of his friends has also adopted it. So there is a balancing act between notifying users who are close together, to achieve density, and notifying users who are far apart, to achieve breadth.
Heuristics and Centrality Measures
The easiest solution is to pick 100 random users to notify, but this is also the most naive, since it takes into account neither the structure of the network nor the likelihood that a person will influence their neighbors.
A better"Better" according to what? As we'll see, randomly selecting seed users performs worse than all the other heuristics. solution to this problem is to develop a heuristic that ranks every user in the network according to some metric. If we can only send 100 notifications then they are sent to the first 100 people on this ranked list.
The idea here is to use centrality measures to come up with heuristics. In graph theory "centrality" is a measure of how important an individual node is.
The simplest measure is called "degree centrality" and is equal to the number of neighbors of a node; on a social network this is the number of friends a given user has. So, if we wanted to send out 100 notifications using this heuristic, we'd send them to the 100 users with the most friends. In practice this heuristic amounts to convincing celebrities to use the new feature.
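As a concrete sketch, here's what the degree-centrality heuristic might look like over a toy adjacency-list graph (the users and friendships below are invented for illustration):

```python
# Toy adjacency-list graph; the users and friendships are invented.
graph = {
    "alice": {"bob", "carol", "dave"},
    "bob":   {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave":  {"alice", "carol"},
    "erin":  {"frank"},
    "frank": {"erin"},
}

def top_k_by_degree(graph, k):
    """Rank users by friend count and keep the k best-connected."""
    return sorted(graph, key=lambda u: len(graph[u]), reverse=True)[:k]

print(top_k_by_degree(graph, 2))  # the two best-connected users
```

In a real system the graph would of course come from your social network's data store, and ties in friend count would need a tie-breaking rule.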
There are other, more complex heuristics. The Wikipedia article linked above has a list of other centrality measures, and I wrote an article about calculating eigenvalue centrality, which is similar to PageRank. Each of these admits a heuristic which can tell us which users to notify.
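For illustration, eigenvalue centrality can be approximated with power iteration. This is a minimal sketch on a toy graph; the self-score term (iterating A + I rather than the adjacency matrix A, which has the same eigenvectors) is a small trick to prevent oscillation on bipartite graphs:

```python
def eigenvalue_centrality(graph, iterations=100):
    """Approximate centrality by power iteration: each round, a
    user's new score is their own score plus the sum of their
    neighbors' scores (iterating A + I instead of the adjacency
    matrix A, which prevents oscillation on bipartite graphs),
    rescaled so the top score is 1."""
    scores = {u: 1.0 for u in graph}
    for _ in range(iterations):
        new = {u: scores[u] + sum(scores[v] for v in graph[u])
               for u in graph}
        top = max(new.values())
        scores = {u: s / top for u, s in new.items()}
    return scores

# On a path a - b - c the middle user is the most central.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(eigenvalue_centrality(path))
```

On the path graph this converges to scores proportional to (1, √2, 1), so the middle user comes out on top, as you'd expect.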
Of course, which strategy works best is hard to know beforehand, as it varies with both time and the content of the notification itself. A/B testing is difficult here because the effects are, by design, not independent: notifying one user changes the behavior of their neighbors. If anyone has a good solution to this that doesn't involve collecting massive amounts of data about user behavior I'd be interested in hearing it!
It should be noted that each of these heuristics takes into account only the underlying structure of the graph, not the probability of "infection." By including the latter we can model the diffusion process itself and search directly for the optimal subset of users to notify.
A Global Solution
Brendan Meeder at CMU pointed me to a great paper that discusses this very topic, Maximizing the Spread of Influence through a Social Network by Kempe, et al.
Rather than take a localized view of the problem by ranking each node individually, we create a statistical model of how the new feature propagates through the network.
First, we start with a finite seed set, A. In our case A is a set of 100 users. Say we convert each of these 100 users.
In our model, if a user u is converted then for each neighbor v there is some probability p(u, v) that v will also be converted.
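A single run of this cascade process can be sketched in a few lines of Python. The representation of the transmission probabilities as a per-edge dictionary is an assumption made for illustration:

```python
import random

def simulate_cascade(graph, probs, seeds, rng):
    """One run of the cascade: every newly converted user u gets a
    single chance to convert each unconverted neighbor v, and
    succeeds with probability probs[(u, v)]."""
    converted = set(seeds)
    frontier = list(seeds)
    while frontier:
        u = frontier.pop()
        for v in graph[u]:
            if v not in converted and rng.random() < probs.get((u, v), 0.0):
                converted.add(v)
                frontier.append(v)
    return converted

# Toy example: a chain a - b - c with a 50% chance of transmission
# along every friendship edge.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
probs = {(u, v): 0.5 for u in graph for v in graph[u]}
print(simulate_cascade(graph, probs, {"a"}, random.Random(0)))
```

Because the coin flips are random, different runs from the same seed set can produce different final sets, which is exactly why the final configuration is treated as a random variable.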
After the process has run its course some set of users has adopted the new feature. Because adoption is probabilistic the size of this final configuration is a random variable. Using the notation from Kempe, et al., for a given seed set A the expected size of the final set of adopters is denoted σ(A).
Our goal is to pick the set A which maximizes the expected value of this random variable.
Formally, we want to find the subset A for which σ(A) is maximized, where σ is called the influence function.
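The influence function has no simple closed form, but it can be estimated by Monte Carlo simulation: run the cascade many times from the seed set and average the number of adopters. A minimal sketch, assuming a single uniform transmission probability p for brevity (a real system would use per-edge probabilities):

```python
import random

def estimate_influence(graph, p, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the influence function: the average
    number of adopters over many simulated cascades. A uniform
    transmission probability p is a simplification; a real system
    would use per-edge probabilities."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        converted, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph[u]:
                if v not in converted and rng.random() < p:
                    converted.add(v)
                    frontier.append(v)
        total += len(converted)
    return total / runs
```

With p = 1 this simply measures the size of the seed set's connected component; with realistic probabilities, the accuracy of the estimate improves with the number of runs.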
The Algorithm and The Results
It turns out that finding the seed set that maximizes the influence function is NP-hard, but there is a greedy algorithm which is guaranteed to get within a factor of (1 - 1/e), about 63%, of the optimum under certain (unrestrictive) conditions.
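The greedy scheme itself is simple: start from an empty seed set and repeatedly add whichever user yields the largest estimated gain in influence. A toy sketch, assuming a uniform transmission probability and a small fixed simulation budget (both simplifications of the paper's setup):

```python
import random

def estimate_influence(graph, p, seeds, rng, runs=200):
    """Average cascade size over `runs` simulations, with a
    uniform transmission probability p (a simplification)."""
    total = 0
    for _ in range(runs):
        converted, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph[u]:
                if v not in converted and rng.random() < p:
                    converted.add(v)
                    frontier.append(v)
        total += len(converted)
    return total / runs

def greedy_seed_set(graph, p, k, rng):
    """Greedily build a seed set of size k: at each step add the
    user whose inclusion gives the largest estimated influence."""
    seeds = set()
    for _ in range(k):
        best, best_score = None, -1.0
        for u in graph:
            if u in seeds:
                continue
            score = estimate_influence(graph, p, seeds | {u}, rng)
            if score > best_score:
                best, best_score = u, score
        seeds.add(best)
    return seeds

# Toy example: a triangle a-b-c plus an isolated user d. With
# certain transmission the greedy picks one triangle member first
# (covering all three) and then the isolated user.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": set()}
print(greedy_seed_set(g, 1.0, 2, random.Random(0)))
```

Note the cost: every greedy step re-estimates the influence of every candidate user, each estimate requiring many simulations, which is why this is so much more expensive than simply sorting users by friend count.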
If you want more details read the paper linked above or the related Influential Nodes in a Diffusion Model for Social Networks by Kempe, et al.
Using Monte Carlo methods Kempe, et al. simulated the diffusion process using this algorithm versus several of the heuristics I described above. The results are fairly striking: their algorithm performs at least 18% better than the best-performing heuristic (degree centrality) and 48% better than if the seed set were randomly selected. I've embedded a graph of their results below.
The "target set" is the initial seed set of users to notify, and the "active set" is the final set of users who actually adopted the new feature or product. The more users who adopt the feature the better the strategy.
Kempe's algorithm is less computationally feasible than the heuristics discussed above; indeed, the best-performing heuristic, degree centrality, is also the easiest to calculate. He also doesn't include eigenvalue centrality in his analysis, which I'd be interested in comparing.
The biggest downside to his algorithm is that it requires both full knowledge of the underlying graph and an accounting of all the user-to-user transmission probabilities. Modeling these probabilities would require a lot of data about users over an extended period of time.
Whether the additional 18% is worth the extra computation and data collection depends a lot on specific circumstances, but personally I'm going to try implementing it in my projects to see how the performance compares first-hand.