Decisions Without Data
If you've ever worked on a project where you had to build something, be it software or anything else, you've seen it happen: people, especially designers and engineers, argue over the pettiest stuff.
You know what kind of argument I mean. How many widgets should we let people enter at a time? Should we use horizontal or vertical navigation? Everyone knows they have "the" optimal answer and the situation quickly devolves into a game of verbal chicken, where the first one to realize it's a stupid argument loses.
Having seen this over and over at all levels of decision-making, I've found a sentiment that stops the situation from devolving. It goes like this: decisions without data are guesses.
Whenever one of these decisions pops up I ask three questions.
- What is the goal?
- What metrics tell us whether we're closer or farther from the goal?
- What data have we collected and what data do we need?
Example: Publisher Ad Choices
In the world of Facebook applications most developers, when it comes to making money, are members of the Ron Popeil school of business: set it and forget it. If you browse the forums for ad-related topics there are a few questions that recur over and over. What ad network is best? Where should I put ads on my application? What color scheme should my ads have?
In this case, the developers implicitly have a goal in mind: making money. The two key metrics are total pageviews and revenue per thousand pageviews (RPM). The data needed to calculate them are pageviews, which you can get from Google Analytics, and revenue, which every ad network reports directly.
So, let's take the first question: which ad network should I use? I might think AdBlade is the best and have ten stories to back up my claim, while you might think RockYou is the best based on your own experience. We could go back and forth all day, but there's only one correct answer: the best ad network is the one that, for a given level of traffic, offers the highest RPM.
You can measure this through A/B testing across multiple ad networks. Once you've collected data on how each network performs, there's no room for arguments backed by anecdotes. The best choice is right there in the numbers.
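A minimal sketch of that comparison might look like the following. The networks are the ones mentioned above, but the revenue and pageview figures are made up for illustration; in practice you'd pull them from each network's reporting dashboard and your analytics.

```python
# Compare ad networks by RPM (revenue per thousand pageviews).
# All numbers below are hypothetical, standing in for real A/B test results.

def rpm(revenue, pageviews):
    """Revenue earned per thousand pageviews."""
    return revenue / pageviews * 1000

# Imagined A/B test: each network served an equal share of traffic.
results = {
    "AdBlade": {"revenue": 212.50, "pageviews": 125_000},
    "RockYou": {"revenue": 198.75, "pageviews": 124_000},
}

for network, r in results.items():
    print(f"{network}: RPM = ${rpm(r['revenue'], r['pageviews']):.2f}")

# The "best" network is simply the one with the highest measured RPM.
best = max(results, key=lambda n: rpm(results[n]["revenue"],
                                      results[n]["pageviews"]))
print(f"Best network by RPM: {best}")
```

Once the test has run long enough to be meaningful, the decision reduces to a one-line comparison like the `max` call above.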
Example: Website Layout
These arguments crop up all the time when talking about website design. Let's say you're creating the latest and greatest social networking site. What should the homepage look like?
First, you need to settle on a goal for what the homepage is supposed to do. Do you want lots of users to sign up? Do you want lots of users of a certain type (e.g., more engaged users, only women, etc.) to sign up? Let's say you just want as many people to sign up as possible.
The metric you're probably interested in here is the percentage of people who visit the homepage and then complete the signup process. Measuring this requires tracking a user through multiple parts of the site and identifying who signs up and who doesn't. You can do this by assigning each potential user a unique identifier and persisting it through the entire signup process.
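A bare-bones sketch of that tracking, with the storage details stripped away: in a real site the identifier would live in a cookie or session and the records in a database, but an in-memory dict is enough to show the idea. The function names here are my own, not from any particular framework.

```python
# Track visitors through a signup funnel with a unique identifier.
import uuid

visitors = {}  # visitor_id -> did they complete signup?

def record_homepage_visit():
    """Assign a new visitor a unique ID when they hit the homepage."""
    visitor_id = str(uuid.uuid4())
    visitors[visitor_id] = False
    return visitor_id

def record_signup(visitor_id):
    """Mark this visitor as having completed the signup process."""
    visitors[visitor_id] = True

def conversion_rate():
    """Percentage of homepage visitors who went on to sign up."""
    if not visitors:
        return 0.0
    return 100.0 * sum(visitors.values()) / len(visitors)

# Simulate ten homepage visits, three of which convert.
ids = [record_homepage_visit() for _ in range(10)]
for visitor_id in ids[:3]:
    record_signup(visitor_id)
print(f"Conversion rate: {conversion_rate():.1f}%")
```

The important property is that the same identifier follows the user from the homepage to the final signup step, so visits and signups can be joined into a single funnel metric.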
Now you know, for a given homepage layout, what percentage of users sign up. What homepage design is the best? Should we use 14-point or 16-point headers? Should we use an off-white or grey background? Depending on the granularity of the design elements you want to test, you can do this either through A/B testing, as in the previous example, or through a more complex multivariate testing scheme.
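One detail worth getting right in such a test: each visitor should see the same variant on every page load, or the measurements get muddied. A common way to do that is to hash the visitor's unique identifier into a bucket, sketched below. The variant names are hypothetical.

```python
# Deterministically assign visitors to design variants by hashing their
# unique identifier, so a returning visitor always sees the same layout.
import hashlib

VARIANTS = ["14pt-headers", "16pt-headers"]  # hypothetical test variants

def assign_variant(visitor_id, variants=VARIANTS):
    """Hash the visitor ID to pick a variant; stable across page loads."""
    digest = hashlib.sha256(visitor_id.encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same visitor always lands in the same bucket.
assert assign_variant("visitor-42") == assign_variant("visitor-42")
```

Because the assignment is a pure function of the ID, no extra state needs to be stored, and the hash spreads visitors roughly evenly across the variants.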
People love arguing about which designs are "best," but this process forces you to ask, "Best for what?" If our goal is to get signups, then the best design is the one that produces the most signups, and that's something we can measure directly. Once we've done that, not only is there no more room for silly arguments, but the metrics might reveal that both opinions were wrong.
Conclusions
Maybe it's my background as a scientist and mathematician, but I treat website development and design as an empirical venture. We should come to the task with a definite idea of what we're trying to achieve and at every step make the decision that the data says is best.
Not only does this produce better and more justifiable decisions but it prevents time-wasting arguments. If someone comes at you with an opinion you can just shoot back, "What data do you have?" If they have nothing but opinions and anecdotes you know they're not making a decision, they're guessing.