Drupal is an open source CMS. It is used by many big-name websites, like The Onion and the Mozilla Foundation's Spread Firefox campaign. However, it suffers from a few serious problems which make it extremely difficult to adapt to large, complex sites. If you're looking to deploy something like a social network using Drupal then this article is definitely worth a read. Even if you're not, these are facts any developer or admin should know about the software they might be using.
My relationship with Drupal is definitely love-hate, and depending on what sort of site you're looking to deploy using Drupal this entry might very well be worth a read. I'm going to assume that people reading this are familiar with designing web applications and the typical structure of a CMS. Drupal uses some non-standard terminology and is very jargon-heavy. If you're not already familiar with Drupal then I suggest reading IBM's article on Drupal's design.
Design
This section is for developers. It is possible to use Drupal in a way that never requires you, the consumer of Drupal, to write PHP. There will be problems with this, mostly related to performance and maintainability, but as I hope to show, these problems stem largely from the design of Drupal's core. Now, the big issues.
The API is ignorant of context
Put simply, loading a node (or a user) is an all-or-nothing deal unless you want to write your own SQL. In Drupal each module has the option of hooking into the node (and/or user) API. The most common case is where a module has some additional data it wants to associate with the node. Typically the module will implement a hook so that every time a node is loaded it queries the database and inserts that data into the node. This happens every time one calls node_load. The situation is analogous when one invokes user_load to load a user object.
Now let's take the buddylist module as an example. This is a third-party module and thus not part of the Drupal core, but the idiom is used throughout and module developers basically have no other option if they're looking for any kind of maintainability. This is the heart of the buddylist_get_buddies function, which returns a formatted list of friends.
    if ($buddies = buddylist_get_buddies($user->uid)) {
      foreach (array_keys($buddies) as $buddy) {
        // One full user_load() per buddy, each one firing every module's hook.
        $account = user_load(array('uid' => $buddy));
        $listbuddies[] = $account;
      }
      return theme('user_list', $listbuddies);
    }

The problem is that user_load loads all data associated with a user; it is completely ignorant of context. What happens if I have ten modules each of which issues a query when a user is loaded and I have fifty users on my buddy list? One way around this is to have buddylist_get_buddies issue a single query at the top of the page. After all, a "buddylist" is probably just going to consist of a list of usernames. Rather than issuing hundreds of queries I could simply do something like

    SELECT u.uid, u.name FROM buddylist b JOIN user u ON (u.uid = b.buddy_id) WHERE b.uid = %d

This query is fast for any reasonably-sized buddylist but it totally breaks maintainability. What if someone else wants to use more than a name in the buddylist? The "Drupal way" would be to override the theming function so that when theme('user_list', $listbuddies); is called we get our custom output. The extra data is available because user_load gives us all the data, irrespective of our need for it. But the buddylist module has no way of knowing what data you want up front; only the top-level stuff, the theming functions, really know for sure.
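The gap between the two approaches can be sketched outside Drupal. The following is a minimal Python illustration using an in-memory SQLite database, not Drupal code: the `users` and `profile` tables, the `CountingDB` wrapper, and this `user_load` stand-in are all assumptions made for the example, and each "module" is modelled as exactly one extra query per loaded user.

```python
import sqlite3


class CountingDB:
    """Thin wrapper around an in-memory SQLite database that counts queries."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(num_buddies=50):
    """Create user 1 plus num_buddies users, all of them buddies of user 1."""
    db = CountingDB()
    db.conn.executescript(
        """
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE profile (uid INTEGER PRIMARY KEY, bio TEXT);
        CREATE TABLE buddylist (uid INTEGER, buddy_id INTEGER);
        """
    )
    for uid in range(1, num_buddies + 2):
        db.conn.execute("INSERT INTO users VALUES (?, ?)", (uid, f"user{uid}"))
        db.conn.execute("INSERT INTO profile VALUES (?, ?)", (uid, "bio"))
        if uid > 1:
            db.conn.execute("INSERT INTO buddylist VALUES (1, ?)", (uid,))
    return db


def user_load(db, uid, num_modules):
    """Hypothetical stand-in for a context-blind loader."""
    (row,) = db.query("SELECT uid, name FROM users WHERE uid = ?", (uid,))
    user = {"uid": row[0], "name": row[1]}
    for i in range(num_modules):
        # Each module adds its own data whether or not the caller needs it.
        (extra,) = db.query("SELECT bio FROM profile WHERE uid = ?", (uid,))
        user[f"module_{i}"] = extra[0]
    return user


def buddy_names_context_blind(db, uid, num_modules=10):
    ids = db.query(
        "SELECT buddy_id FROM buddylist WHERE uid = ? ORDER BY buddy_id", (uid,)
    )
    return [user_load(db, buddy_id, num_modules)["name"] for (buddy_id,) in ids]


def buddy_names_single_query(db, uid):
    rows = db.query(
        "SELECT u.uid, u.name FROM buddylist b "
        "JOIN users u ON u.uid = b.buddy_id WHERE b.uid = ? ORDER BY u.uid",
        (uid,),
    )
    return [name for _uid, name in rows]


blind_db, lean_db = make_db(), make_db()
blind_names = buddy_names_context_blind(blind_db, 1)
lean_names = buddy_names_single_query(lean_db, 1)
print(blind_db.queries, "queries vs", lean_db.queries)  # 551 queries vs 1
```

Both functions return the same fifty names. Under these assumptions the context-blind version pays one query for the list plus eleven per buddy, while the joined version pays one in total.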
All this is to say that the points where you know exactly what data you need are precisely those points where you're least able to get it. Oops! To contrast, in an MVC-based framework like Rails, you'd get around this by having a controller fetch the active user and creating a view more-or-less thus:
    <ul id="buddylist">
    <% User.buddies.each do |buddy| %>
      <li><a href="/user/view/<%= buddy.uid %>"><%= buddy.name %></a></li>
    <% end %>
    </ul>

The view knows exactly what it needs to display and the data for a user's buddies isn't retrieved from the database until you access User.buddies.
Event-driven? What's that?
As I noted above Drupal is procedural at heart. Event or signal-driven designs are very common in the procedural world, but, again, it seems like Drupal can't decide what it wants. Rather than create an actual event/listener system Drupal uses a strange "hooks" system.
The core defines a certain set of so-called "hooks." For example, hook_load is the hook associated with a node getting loaded. If I have a module called foo.module and create within that module a foo_load function then every time node_load is called my foo_load function will also get called. This function will probably fetch foo-specific data from the database and stuff it back into the node. An example:
    function foo_load($node) {
      // Fetch the foo-specific columns for this node; nid is an integer, so use %d.
      $additions = db_fetch_object(db_query('SELECT * FROM {mytable} WHERE nid = %d', $node->nid));
      return $additions;
    }

Now, personally, I find this system a little strange. For one, it forces me to know each and every hook within Drupal when I'm naming a function. If there's a hook called hook_shipoopie and I, for whatever reason, create a foo_shipoopie function (a totally legal function name), then there could be all sorts of unintended consequences. Indeed, this decision seems so strange to me that I can't imagine it wasn't made on purpose. Maybe someone can give me a good reason?
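The naming concern can be illustrated with a small sketch of convention-based dispatch. This is Python, not Drupal's actual implementation: `invoke_all`, the `functions` registry, and the module names are hypothetical, and the only thing carried over from the text is the assumption that a hook implementation is found purely by the name `<module>_<hook>`.

```python
def invoke_all(functions, modules, hook, *args):
    """Call every function named '<module>_<hook>' that exists, in module order."""
    results = []
    for module in modules:
        func = functions.get(f"{module}_{hook}")
        if callable(func):
            results.append(func(*args))
    return results


def foo_load(node):
    # Intended as a hook implementation: attach foo-specific data.
    return {"foo_data": f"extra data for node {node['nid']}"}


def foo_shipoopie(node):
    # Written as an ordinary helper, with no idea that a "shipoopie" hook exists.
    return "helper called by accident"


functions = {"foo_load": foo_load, "foo_shipoopie": foo_shipoopie}
modules = ["foo", "bar"]
node = {"nid": 7}

print(invoke_all(functions, modules, "load", node))
print(invoke_all(functions, modules, "shipoopie", node))
```

Nothing in `foo_shipoopie` declares it as a hook implementation, yet the second call still reaches it, because the name alone is the registration.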
More annoying, though, is the fact that even though the "hooks" system is the mechanism by which modules interact with the core, there's no way to use it for inter-module communication. The buddylist module, for example, might want to add "befriend" and "defriend" events. As a web application developer I can then choose what happens when a user gets added or removed from someone's buddylist. Since I know that I have the private message module installed I might just want to send someone a private message. Yeah, fine, the buddylist module could define "befriend" and "defriend" hooks, call module_invoke_all, but we're still polluting the global namespace. And what if I need to attach custom data to the event? A real event system would allow for much greater flexibility.
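For comparison, here is roughly what I mean by a real event system, as a minimal Python sketch. The `EventBus` class, the `buddylist.befriend` event name, and the `send_private_message` listener are all hypothetical; none of them are Drupal or buddylist APIs.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe registry keyed by event name."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, event, listener):
        self._listeners[event].append(listener)

    def publish(self, event, **payload):
        # Listeners receive the event's custom data as keyword arguments.
        return [listener(**payload) for listener in self._listeners[event]]


bus = EventBus()
outbox = []


def send_private_message(user, buddy, note=None):
    message = f"{user} added {buddy} as a buddy"
    if note:
        message += f" ({note})"
    outbox.append(message)
    return message


# Registration is explicit, so function names carry no hidden meaning.
bus.subscribe("buddylist.befriend", send_private_message)
bus.publish("buddylist.befriend", user="alice", buddy="bob", note="via search")
```

Listeners opt in by registering, event names live in their own namespace rather than in the global function namespace, and the payload can carry whatever custom data the publisher wants to attach.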
Performance
These design issues encourage bad developer practices and have huge performance implications.
Just One More Query Syndrome
Drupal suffers from what one might call the "just one more query syndrome." Unfortunately, because of the context-blind nature of the Drupal API, a lot of these extra queries are being executed inside big loops. For the CS people out there, consider this. Most modules exist to alter the way nodes or users work, by adding either functionality or content. This means that each such module will often execute at least one query per node on a page and one query per user. Let u be the number of users on a page, n the number of nodes, and m the number of modules you have installed. Then we get the following:
#queries per page = O((u+n)m)
I can't overstate the importance this has for the scalability of Drupal. Imagine trying to create a social networking site, which Drupal claims it can do. You will have hundreds of thousands of nodes, tens of thousands of users, and will need at least a dozen-or-so modules to get the functionality you want. Here's a scenario: load a listing of the ten most popular blogs on your site with a buddylist, a list of the five popular groups, and a list of new users elsewhere on the site. Let's say 10 modules issue queries for a node (not unusual) and 3 for a user. This makes 15*10 + 3*10 = 180 queries (groups are nodes, oddly enough). I have definitely seen cases where a complex page powered by Drupal would execute on the order of 1000 database queries uncached. Caching is only a symptomatic cure.
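The arithmetic in that scenario is easy to reproduce. This is only a back-of-the-envelope sketch in Python: `queries_per_page` is a hypothetical helper, it assumes exactly one query per hooked module per loaded object, and the count of 10 users is the one implied by the 3*10 term above.

```python
def queries_per_page(nodes, users, queries_per_node, queries_per_user):
    """Estimate uncached queries: one per hooked module per loaded object."""
    return nodes * queries_per_node + users * queries_per_user


# Scenario from the text: 10 blogs + 5 groups are nodes; 10 users in total.
nodes = 10 + 5
users = 10
total = queries_per_page(nodes, users, queries_per_node=10, queries_per_user=3)
print(total)  # 180

# Doubling the modules that hook into loading doubles the total.
doubled = queries_per_page(nodes, users, queries_per_node=20, queries_per_user=6)
print(doubled)  # 360
```

The cost grows with the product of objects on the page and modules installed, which is why adding "just one more" module is never a constant cost.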
This is just the number of queries to get the display elements on the screen. Drupal stores everything in the database: session data, application variables, URL aliases, everything! If database resources were cookies Drupal would be the Cookie Monster after spending the afternoon getting high with his pals. You could try to cache the expensive parts of the page, but the ability to increase performance via caching does not excuse bad design, it only hides it. And God forbid you get a cache miss. Hello 500 error!
Third-party modules are awful performers
Some third-party modules will beat your site into a bloody pulp if you're not careful. The two biggest offenders I've seen are og and views. og is a module which provides group functionality and access controls based around those groups. Views is a generic system by which administrators can define rules to display a list of nodes (e.g., show me every node tagged with "foobar" written in the last week). This is all done through an administrative GUI.
og is basically essential if you want group functionality, a mainstay of social networking sites. Drupal has probably the worst hook ever, hook_db_rewrite_sql, which allows any module to rewrite any SQL headed towards the database. og uses this for permissions. Basically for every list of nodes you fetch og adds a WHERE clause to the effect of "AND you have permission to view this node." These WHERE clauses are absolutely horrendous, of course, and can cause otherwise innocuous queries to take over 20ms to run. If you're fetching multiple lists of nodes on a single page it only gets worse.
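To give a feel for what such a rewrite does, here is a toy sketch in Python against an in-memory SQLite database. It is not og's code or Drupal's real schema: `rewrite_with_access_check`, the `group_members` table, and the simplified `node_access` columns are assumptions for illustration, and the rewriter only handles simple queries with no ORDER BY, GROUP BY, or LIMIT.

```python
import sqlite3


def rewrite_with_access_check(sql, alias="n"):
    """Append a per-node access condition to a simple node query.

    Assumes the query has no ORDER BY, GROUP BY, or LIMIT clause.
    """
    condition = (
        "EXISTS (SELECT 1 FROM node_access na "
        f"WHERE na.nid = {alias}.nid AND na.gid IN "
        "(SELECT gid FROM group_members WHERE uid = ?))"
    )
    joiner = " AND " if " where " in sql.lower() else " WHERE "
    return sql + joiner + condition


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE node (nid INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE node_access (nid INTEGER, gid INTEGER);
    CREATE TABLE group_members (gid INTEGER, uid INTEGER);
    INSERT INTO node VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO node_access VALUES (1, 10), (2, 20), (3, 10);
    INSERT INTO group_members VALUES (10, 42);
    """
)

original = "SELECT n.nid FROM node n WHERE n.status = 1"
rewritten = rewrite_with_access_check(original)
# User 42 belongs only to group 10, so only nodes 1 and 3 remain visible.
visible = [nid for (nid,) in conn.execute(rewritten + " ORDER BY n.nid", (42,))]
print(visible)  # [1, 3]
```

Even in this toy form, a one-table lookup has become a query with two nested subqueries, and the same condition gets bolted onto every node listing on the page.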
From the description of views above you can probably guess that the module programmatically generates SQL from the parameters specified on the administrative page. As anyone familiar with frameworks and CMSs knows, generated SQL is almost universally awful. The views module is no exception. This SQL is only there to fetch a list of node ids, mind you. Each individual node id is then passed to node_load, which then results in another avalanche of database hits, even if the SQL generated by views is already using some of this data to filter the list of node ids. For example, if I had a view which produced a list of all posts authored by "jim" it would generate SQL something like

    SELECT n.nid FROM node n JOIN user u ON (n.uid = u.uid) WHERE u.name = 'jim'

Later, when we invoke node_load, this very same data will be fetched by the user module using hook_load. Views is too clever by half, in my opinion.

These two are biggies because they're such popular modules, but as with any system most third-party modules for Drupal simply suck. I can't really blame Drupal for that, except perhaps insofar as I think its core APIs make it more difficult to write good modules.
The End?
I really don't mean to dump on Drupal so much. Well, ok, I do, but it's not out of spite. Part of it is that, as a developer, I find Drupal to be very unfriendly. I know exactly what I want to do and more often than not Drupal is a roadblock. Add to this the performance issues and I think anyone trying to use Drupal to design a medium-to-large sized interactive website should definitely take a second look. Developing such a site with Drupal would take no less time than using a more general rapid development framework like Ruby on Rails, Django, or Catalyst if you have experience developing such web applications already.
If, however, you're relatively inexperienced in designing and implementing complex interactive websites and don't have any huge demographic goals then Drupal is very much worth your time. In fact, I have recommended it to people for just this reason. Most of the modules "just work," even if they have performance implications. Sites like The Onion use very few modules, in which case Drupal's scalability increases markedly.
Update: Ok, so I've touched a nerve with some people. I don't understand why people get so invested in their frameworks, personally, but I guess it's just a fact of life. I will say that I was inaccurate on a few counts. First, node_load does not blindly invoke all the _load hooks, only the load hooks for that content type. There is, however, a hook_nodeapi which allows other modules to insert data. Many modules make use of this, e.g., taxonomy fetches every node's taxonomy when it's loaded. This fact does not change the performance analysis. Also, views does not indiscriminately invoke node_load. Rather, it only invokes it on four of seven types of views. It does not invoke it for the table or grid view. I'd argue that the teaser/full node lists are the most common type of views, but I didn't want to argue at this level of detail. Rather, my point is that the architecture of Drupal is fundamentally skewed away from scalability, particularly with respect to the number of modules installed. This means that each module isn't adding a constant level of complexity to your site, but has the real potential to cause performance all around to suffer.