May 5th

Notification Strategies for Social Networks

You’ve built a social application and launched a new feature. The number of notifications you can send out is constrained. Which set of users should you notify to guarantee the most people start using this new feature?

This problem might seem artificial. Why not put up an ad on your product, or send a notification to every single person who might be interested? There are several reasons the number of people you can notify might be constrained.

  1. There is a technical constraint, e.g., Facebook limits the number of application-to-user notifications at the API level.
  2. There is a financial constraint, e.g., you’re sending notifications over SMS and every message costs you money.
  3. There is a strategic constraint, e.g., sending notifications too frequently causes fatigue and reduces the effectiveness of future notifications.

So, the situation is not too far fetched. Let’s investigate the issue.

For the rest of the article the “application” is going to be a Facebook application and it can only send 100 application-to-user notifications per week. Which 100 users should we notify?

The Basic Considerations

In the linear cascade model when a user in a social network adopts a new behavior there is a probability that each neighbor in the network will adopt it.

Under this model we probably wouldn’t want to notify two people who are friends, and especially not a cluster of friends or a clique. The new feature wouldn’t spread very far beyond this group.

Likewise, we wouldn’t want to notify people who are very far apart on the social network because a user is more likely to adopt a behavior if more than one of his friends has also adopted it. So there is a balancing act between notifying users who are close together, to achieve density, and notifying users who are far apart, to achieve breadth.

Heuristics and Centrality Measures

The easiest solution is to pick 100 random users to notify, but this is also the most naive since it takes into account neither the structure of the network nor likelihood that a person will influence their neighbors.

A better1 solution to this problem is to develop a heuristic that ranks every user in the network according to some metric. If we can only send 100 notifications then they are sent to the first 100 people on this ranked list.

The idea here is to use centrality measures to come up with heuristics. In graph theory “centrality” is a measure of how important an individual node is.

The simplest measure is called “degree centrality” and is equal to the number of neighbors of a node. On a social network this is the number of friends of a given user. So, if you wanted to send out 100 notifications using this heuristic we’d send notifications to the 100 users with the most friends. This heuristic involves convincing celebrities to use the new feature.

There are other, more complex heuristics. The Wikipedia article linked above has a list of other centrality measures, and I wrote an article about calculating eigenvalue centrality, which is similar to PageRank. Each of these admits a heuristic which can tell us which users to notify.

Of course, which strategy works best is hard to know beforehand, as it varies with respect to both time and the underlying notification. A/B testing this is difficult because the effects are intentionally dependent. If anyone has a good solution to this that doesn’t involve collecting massive amounts of data about user behavior I’d be interested in hearing it!

It should be noted that each of these heuristics only takes into account the underlying structure of the graph and not the probability of “infection.” By including the latter we can come up with a nearly exact model of the optimal subset of users to notify.

A Global Solution

Brendan Meeder at CMU pointed me to a great paper that discuss this very topic, Maximizing the Spread of Influence through a Social Network by Kempe, et al.

Rather than take a localized view of the problem by ranking each node individually, we create a statistical model of how the new feature propagates through the network.

First, we start with a finite seed set, A. In our case A is a set of 100 users. Say we convert each of these 100 users.

In our model if a user u is converted then for each neighbor v there is some probability

p_{u,v}

that v will also be converted.

After the process has run its course some set of users has adopted the new feature. Because adoption is probabilistic the size of this final configuration is a random variable. Using the notation from Kempe, et al., for a given seed set A the size of the final set of adopters is a random variable denoted by

\displaystyle{\varphi\left(A\right)}

Our goal is to pick the set A which maximizes the expected value of this random variable.

Formally, we want to find the subset A such that

\displaystyle{\sigma\left(A\right) = E\left[\varphi\left(A\right)\right]}

is maximized, where

\displaystyle{\sigma\left(\cdot\right)}

is called the influence function.

The Algorithm and The Results

It turns out that calculating the influence function exactly is NP-hard, but there is a greedy algorithm which approximates the value under certain (unrestrictive) conditions.

If you want more details read the paper linked above or the related Influential Nodes in a Diffusion Model for Social Networks by Kempe, et al.

Using Monte Carlo methods Kempe, et al. simulated the diffusion process using this algorithm versus several of the heuristics I described above. The results are fairly striking: their algorithm performs at least 18% beter than the best-performing heuristic (degree centrality) and 48% better than if the seed set were randomly selected. I’ve embedded a graph of their results below.

kempe-graph

The “target set” is the initial seed set of users to notify, and the “active set” is the final set of users who actually adopted the new feature or product. The more users who adopt the feature the better the strategy.

Feasibility

Kempe’s algorithm is more feasible than many of the heuristics discussed above, although the best performing heuristic — degree centrality — is also the easiest to calculate. He also doesn’t include eigenvalue centrality in his analysis, which I’d be interested in comparing.

The biggest downside to his algorithm is that it requires both full knowledge of the underlying graph and an accounting of all the user-to-user transmission probabilities. Modeling these probabilities would require a lot of data about users over an extended period of time.

Whether the additional 18% is worth the extra computation and data collection depends on a lot on specific circumstances, but personally I’m going to try to implement it in my projects and see how the performance compares first-hand.

  1. “Better” according to what? As we’ll see, randomly selecting seed users performs worse than all the other heuristics. []
  • As far as alternative heuristics to determining individual user's influence, I'm reminded of Malcolm Gladwell's "The Tipping Point", when he's describing the three archetypes of users: connectors, mavens, and salesmen. "Degree centrality" would seem to describe the main asset of connectors; I'd be interested to see if there are any good heuristics to find the other two types of influencers—the people with less friends, but bigger influence. After all, we all have app whores in our network, and we probably tend to tune them out over time.

    Great read, by the way :)
  • Hey Matt,

    Glad you liked it and thanks so much for commenting.

    A more general notion of influence is captured by eigenvalue centrality, which is similar to PageRank — every person's influence is proportional to the sum of their follower's influence. This sets up an eigenvalue problem, whose principal eigenvector contains the "influence" of each of the nodes.

    The nice thing about Kempe's algorithm is that it sidesteps all these issues and just asks the question directly: if we influence this user, how many other people can we expect to be influenced as a result of the cascade? That seems like a pretty objective measure of influence, although it's computationally difficult.

    The downside to both of these is that they require extensive knowledge about the underlying graph and diffusion processes, which we don't always have access to. I've sent an email to Kempe asking about this. I hope he responds!
  • Sagar Mehta
    You might be interested in the CASCADES project at http://www.cs.cmu.edu/~jure/blogs/

    and some of the papers at http://www.cs.cmu.edu/~jure/research.html
  • Sagar,

    Thanks for commenting. That does look interesting!

    It looks like their project is mostly about instrumenting a network to detect cascades. Is that right?
  • Sagar Mehta
    not really .. it is more about modelling real world graphs like say social networks using Kronecker multiplications - http://videolectures.net/icml07_leskovec_smrg/

    one of the problems they are trying to solve is how information propogates over social networks/ blogs ? - for eg - consider the blogosphere as a directed graph - with blogs citing other blogs - given this which top N blogs should you read to keep yourselves abreast of information in a particular area

    http://www.cs.cmu.edu/%7Ejure/pubs/blogs-sdm07.pdf


    another sub topic is how does influence propogate over social networks - say you can give a sample of your product to only N people - which of them should you target given the social graph and influence of one node on the other

    Hope this helps !

    Cheers,
    Sagar
  • Hey Jesse! I´m happy that I randomly ran into you when my teacher recommended your blog. Although he was refering to one of your old articles.. anyhow! your passion is affecting my studies in a great way. thank youuu
  • Nishant
    Hi Jesse,
    I just came across your site while I was searching for more details on the influence function. You have mentioned that "calculating the influence function exactly is NP-hard" - where did you come to that conclusion from? The Kempe paper you mentioned (ICALP '05) specifies that *maximizing* the influence function is NP-hard and they came up with a greedy strategy to approximate it.
    Could you please clarify?
    Thanks!
  • Nishant,

    You're correct!

    My sentence doesn't even make much sense, IMO. What is "calculating the influence function exactly?" Do I mean determining the value of σ(A) for all A? Do I mean given a subset, A, calculating σ(A)?

    I don't remember what I was thinking when I wrote it, honestly. :)
blog comments powered by Disqus