Designing Content-focused Websites

by Jesse Farmer on Sunday, June 29, 2008

Every website has two fundamental components: data and one or more users/readers who consume that data. This data can be produced in many ways: by an author or editorial staff, by other users of the website, from a database, etc. I'm not interested in the question of what data a user is interested in consuming. That is, I'm not interested in giving editorial advice to someone looking to create a popular blog.

Rather, given that a user is at a website which has data they want to consume, I'm interested in the question of how best to deliver that data. This question intersects the realms of technology, usability, and design.

In thinking about this question I've come up with three categories into which most any website fits. By analyzing these categories I believe one can arrive at some solid, general advice for how to structure websites. Some might accuse me of being "too academic," but I think there's something to be learned about designing websites by understanding these categories and your website's relation to them.

Contents

  1. The Three Categories
  2. Content-focused Websites
  3. Surfacing Content
  4. Choosing an Estimate
  5. Conclusions

The Three Categories

Application-focused

Application-focused websites are those which enable the user to complete some specific task. The primary question to ask of one of these websites is "How well does it work?" They have little user-user interaction and often no author per se.

Most of Google's websites fall into this category, for example. Google's base business is centered around aggregating and organizing information. A more pedestrian example would be a website which helps you complete your annual tax returns or find tickets for nearby concerts.

Content-focused

Content-focused websites are those which provide regularly updated topical content. The primary question to ask of one of these websites is "What information does it provide?" There will always be at least one author and there might be an extensive degree of user-user interaction, but this interaction is always subordinate to the content.

Blogs and other news-oriented websites, including online magazines and newspapers, fall into this category. Wikipedia is also an example, albeit one where the line between "readership" and "authorship" is blurred. This is why the categories are defined in terms of data/user interaction rather than author/user interaction. However, Wikipedia would be no less a website if it had the same content it currently has but were only authored by, say, a certified editorial staff. In other words, it is the content that matters, not the means by which the content is generated.

Another example is Livejournal, which allows user-user interaction in comments, groups, and via its "friends" feature (which is really a subscription feature in disguise). User-user interaction is not the primary focus of LJ, however, and it is generally only used as a way to surface interesting content.

User-focused

User-focused websites are those which are based upon user-user interaction. The primary question to ask of one of these websites is "Who is using this website?" There might be topical content or searchable data, but this is incidental to the relationships between users.

Most social networks, like Facebook, Yahoo 360º, Friendster, and MySpace, fall into this category. Nobody would use Facebook for photo sharing or storing contact information were it not for the fact that all their friends are using it, too. MySpace was originally a content-focused website, centering around bands and their music, but has since evolved (some might say degenerated) into a user-focused website where most people just use it as a platform to promote their own personality to other users of MySpace.

I don't intend for these categories to be absolute, but rather just a useful tool for reasoning about websites and website design. If you can think of any websites which do not fall into any of the above categories I'd love to hear about them. (Some websites themselves are the content, e.g., an art student's website in which the piece of art is the website. As far as I'm concerned these are one-off affairs with no unifying logic outside of the usual artistic conventions.)

Content-focused Websites

So, you have a blog and you're writing interesting stuff that has an audience. This in itself is no small feat, but arguably the harder part is knowing how to present that information so that any given reader gets information they want to read, even if they didn't know they wanted to read it before coming to your site. This applies to any content-focused website. How do I give the reader the most relevant and interesting content with the least amount of effort on their part?

Many content-focused websites don't even have real registration, e.g., WordPress blogs where registering doesn't actually confer any additional benefits. How are the authors of the content supposed to serve up interesting content if they don't know anything about an individual reader's preferences? And that's the key to designing a good content-focused website: can you come up with a way to estimate your readers' preferences? If so, you just serve up content according to that estimate.

Surfacing Content

Let's assume that you're an author of a content-focused website (a blog, say) and write quality content which has an audience. For your website the data consists of a collection of posts and your job, given that people are actually interested in what you have to write, is to surface the content which is most interesting to a given reader. There are three ways in which you can do that.

  1. Global Preference Estimation

    Global preference estimation is the idea that if you know nothing about a specific reader your best estimate is the average case. If your article about Widgets has been read more than any other article then it's not a bad bet that the average reader would also find it worth reading, for example. Here are some ways to estimate global preferences, with explanations where necessary.

    • Recency
    • Pageviews
    • Number or recency of comments
    • Number of inbound links
    • In general if your site has a feature which requires readers to take a definite action on a post, e.g., commenting, viewing, emailing, etc., then you can measure preferences by the numbers of times a post has been acted upon.
    • Featured articles: if you have a good understanding of your audience, explicitly surface content you believe they'd find interesting.
    • Average post rating, if your website supports ratings.
    • A promotion model ("Promoted Articles") based on explicit votes (X votes marks a story as promoted) or votes over time (a la digg).

    The pro of global preference estimation is that it is relatively easy to implement and does not suffer from sparsity problems. That is, a specific user does not need to register all their preferences for it to return good results. Instead preferences are collected in aggregate so that one reader's habits are as good as any other's from the perspective of a global estimate. The con is that this estimate only deals with averages. At best this will let you please most of the people some of the time.
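As a rough sketch of a global estimate built from reader actions, the snippet below tallies every action taken on a post across all readers and also implements the simple "X votes marks a story as promoted" rule. The event format, the `ACTION_WEIGHTS` values, and the function names are assumptions made for illustration; the right weights depend on your site.

```python
from collections import Counter

# Hypothetical weights: how much each kind of action says about interest.
ACTION_WEIGHTS = {"view": 1, "email": 5, "comment": 10, "vote": 10}


def global_estimate(events):
    """Sum weighted actions per post.

    events: iterable of (post_id, action) pairs from all readers combined.
    """
    scores = Counter()
    for post_id, action in events:
        # Unknown action types contribute nothing to the score.
        scores[post_id] += ACTION_WEIGHTS.get(action, 0)
    return scores


def promoted(events, threshold):
    """Posts with at least `threshold` explicit votes ("Promoted Articles")."""
    votes = Counter(post_id for post_id, action in events if action == "vote")
    return sorted(post_id for post_id, count in votes.items() if count >= threshold)
```

Calling `global_estimate(events).most_common(5)` would give the five posts to show in a "most popular" sidebar. Note that no individual reader's history is needed, which is why this approach avoids the sparsity problem.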

  2. Local Preference Estimation

    Local preference estimation is based on implicit and explicit information you have about a specific reader, such as their reading, browsing, and commenting patterns. If you can collect enough data you can surface content that often the user doesn't even realize they were looking for.

    The easiest way to get a local preference estimation is to use the most obvious fact about a reader — you know when they are reading something. It's a fairly safe assumption that the reader is interested in whatever they are reading, so it stands to reason they would also be interested in related content. Coming up with a way to surface related content is therefore one of the first things a content-focused website should implement, in my opinion.

    For sites on which the readers are creating the content another way to measure interest is to allow readers to befriend each other. Since this friendship is essentially arbitrary they will take "friendship" to mean whatever you tell them it means. If you use friendship status as a means to surface interesting content then they will befriend people creating interesting content. That suggests presenting the user with the following:

    • Content created by my friends
    • Content commented on by my friends
    • Content voted on by my friends
    • Content read by my friends
    • etc.
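One way to turn the list above into a feed, sketched under the assumption that friendships are stored as a dict of sets and that every action (creating, commenting, voting, reading) is logged as a `(reader, content_id)` pair. Both data shapes and the name `friend_feed` are hypothetical.

```python
from collections import Counter


def friend_feed(reader, friends, activity, n=10):
    """Rank content by how many of `reader`'s friends acted on it.

    friends:  dict mapping a reader to the set of readers they befriended
    activity: iterable of (reader, content_id) pairs, one per action
    """
    my_friends = friends.get(reader, set())
    seen = set()
    counts = Counter()
    for actor, content_id in activity:
        # Count each friend at most once per piece of content.
        if actor in my_friends and (actor, content_id) not in seen:
            seen.add((actor, content_id))
            counts[content_id] += 1
    return [content_id for content_id, _ in counts.most_common(n)]
```

Content that several friends touched rises to the top, while content touched only by strangers never appears.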

    If you want to get very fancy (and very technical) you can create a content recommendation system. Reader A likes stories 1, 3, and 5. Reader B likes stories 3, 5, and 7. It's probable that Reader A would like story 7 and Reader B would like story 1. Techniques for content recommendation dive straight into the fields of information retrieval and data mining. Given other local preference estimates you can come up with what it means for a reader to "like" some piece of content. You register their preference and then use standard IR and data-mining techniques (for example, slope one recommenders or clustering recommenders based on similarity metrics, such as cosine similarity) to extract patterns about their tastes. This really only works well if you have a lot of diffuse content and a large, active readership.

    The upside of local preference estimation is that it can give fairly accurate results. Google, for example, bases much of their business around contextual information. If you have a Google account they know your searching habits and what Google ads you're seeing around the web. From this, in turn, they can recommend to you all sorts of things. The con is that to get accurate results you need a lot of data. Google and Yahoo! can pull it off because they have terabytes upon terabytes of data. The average blog, however, will have a harder time.
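Returning to the Reader A and Reader B example above, here is a minimal sketch of a similarity-based recommender. It treats each reader's likes as a set, compares readers with cosine similarity, and suggests stories liked by similar readers. This is a deliberately simplified neighbor-based approach, not a slope one or clustering recommender, and the function names and data layout are assumptions.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sets of liked stories (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(reader, likes, n=5):
    """Suggest stories liked by readers whose tastes overlap with `reader`'s."""
    mine = likes[reader]
    scores = {}
    for other, theirs in likes.items():
        if other == reader:
            continue
        sim = cosine_similarity(mine, theirs)
        if sim == 0.0:
            continue  # no shared likes, so no evidence either way
        for story in theirs - mine:
            # Each similar reader "votes" for a story with their similarity.
            scores[story] = scores.get(story, 0.0) + sim
    # Highest score first; ties broken by story id for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))[:n]


likes = {"A": {1, 3, 5}, "B": {3, 5, 7}}
```

With the data above, `recommend("A", likes)` suggests story 7 and `recommend("B", likes)` suggests story 1. With only two readers the result is trivial, which illustrates the point about needing a large, active readership before this pays off.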

  3. Explicit Preferences

    Explicit preferences are just that, preferences which the user has made known or wants to make known. To accommodate these preferences it is best for the website to simply get out of the reader's way. Here, search is king.

    Let's say the user remembers an old post you wrote on your blog about Widgets, but can't remember the exact title or some of the secondary content. The first thing they will probably want to do is search for "Widget." Search isn't easy (otherwise Google wouldn't be a multi-billion dollar company), so it's not uncommon to leave search up to a third-party application. For this blog I trust Google to index it and my readers to use Google to search it; I know Google will do a better job than any native WordPress search functionality would.

    Aside from search a common feature in the Web 2.0 world, at least, is the tag cloud. If you tag your content with semantically meaningful tags then the tag cloud provides a sort of topographical map of your content. Presumably you tagged that post about Widgets with "widget," so a user looking for some post on Widgets will be able to find it by looking through all Widget-related content.
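The tag cloud described above can be sketched in two steps: build an index from tag to posts, then size each tag by how often it is used. The post titles, the five-step size scale, and the function names below are assumptions for illustration.

```python
from collections import defaultdict


def build_tag_index(posts):
    """Map each tag to the titles of the posts that carry it.

    posts: dict mapping a post title to its list of tags.
    """
    index = defaultdict(list)
    for title, tags in posts.items():
        for tag in tags:
            # Normalize case so "Widget" and "widget" share one entry.
            index[tag.lower()].append(title)
    return index


def tag_cloud(index, min_size=1, max_size=5):
    """Give each tag a display size that grows with how often it is used."""
    counts = {tag: len(titles) for tag, titles in index.items()}
    if not counts:
        return {}
    most = max(counts.values())
    return {
        tag: round(min_size + (max_size - min_size) * count / most)
        for tag, count in counts.items()
    }
```

A reader looking for the Widgets post clicks the "widget" tag and gets `index["widget"]`, the list of every post tagged that way.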

Choosing an Estimate

For content-focused sites with worthwhile content the most important job is to surface the most interesting content. What constitutes "interesting" varies from site to site and audience to audience, but abstractly speaking the process is the same. That is, you need to come up with some way to measure how interesting a given piece of content is and display views of your content ranked according to that measure.

For example, recency is going to be an important component of what is interesting on a news-focused site, but is hardly sufficient. The news that Grandma Smith died just isn't as interesting as the news that a Presidential candidate was caught doing drugs, for example. Traditional news outlets use editorial discretion to surface the interesting news. Good editorial staffs lead to successful newspapers.

The internet, however, affords more direct access to your audience's tastes. Sites like digg exploit this by allowing users to vote directly on articles. The measure of how "interesting" a piece of content is then becomes a function of both its recency and its number of votes. The only essential difference between a site like digg and a traditional news blog is the way in which they measure how interesting a given piece of content is.
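To make "a function of both recency and votes" concrete, here is one hypothetical form such a measure could take: votes divided by a power of the story's age. This is not digg's actual algorithm, and the `gravity` exponent and the two-hour offset are arbitrary assumptions chosen only to show the shape of the trade-off.

```python
def interest_score(votes, age_hours, gravity=1.5):
    """More votes raise the score; age lowers it."""
    # The +2.0 offset keeps brand-new stories from dividing by zero.
    return votes / (age_hours + 2.0) ** gravity


def rank_stories(stories, gravity=1.5):
    """Sort stories from most to least interesting.

    stories: list of (title, votes, age_hours) tuples.
    """
    return sorted(
        stories,
        key=lambda s: interest_score(s[1], s[2], gravity),
        reverse=True,
    )
```

A larger `gravity` makes the front page turn over faster, while a smaller one lets heavily voted stories linger. A traditional news blog corresponds to ignoring votes entirely and ranking by age alone.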

What measures work depends heavily on both the content and audience, however. A new measure might make for a novel kind of content-focused website but it is no guarantee that that website will be successful, even if the content has an audience. The mechanics of the metric might not sit well with your audience. For example they might not understand a digg-like voting mechanism, making any metric based on "votes" totally ineffective.

So the problem for a would-be website author is two-fold: create quality content that has an audience and determine a preference estimate which surfaces the content most interesting to both the audience as a whole and a specific reader. There are many proven measures listed above which work well, although the truly breakaway successes are usually those that have some novel means of content creation, preference estimation, or both.

Conclusions

Most every website falls into one of three categories, each of which is defined in terms of data-user interaction. Content-focused websites are those which regularly generate topical content, such as online newspapers, blogs, digg, or Wikipedia. The most pertinent question for these websites is "What information does it provide?"

For a reader to answer this question the author of a content-focused website needs to provide a window into their content. Presuming the author actually wants the reader to stay around and consume more content, these windows need to do more than just show random content; they need to show interesting content. It is therefore important for the author to find a way to estimate the preferences of their readers.

This can be accomplished at either the global, aggregate level or the local, contextual level. A global estimate surfaces content which is interesting to the average reader while a local estimate surfaces content interesting to a specific reader, given what you know about them. In addition readers sometimes make their preferences known explicitly in which case there should also be a path for readers who are looking for specific content, e.g., a proper search function. Assuming you are actually writing worthwhile content then a good estimate goes a long way towards converting users to your site.

Above all it is important to think clearly about getting your readers what they want as easily as possible. I often find it useful to imagine I know nothing about where content resides in my site and go from there. Is what I see interesting enough for me to keep looking? If so, how long before it becomes uninteresting? If not, how long before it does become interesting? Could I find what I wanted if I really had to?

Finally, I'd love to get any and all feedback on this article. I've been tossing these ideas around in my head for a few months now and thought now was a good time to write them down for the first time. Cheers!