4 Problems with Drupal
by Jesse Farmer on February 27, 2007

Drupal is an open source CMS. It is used by many big-name websites, like The Onion and the Mozilla Foundation's Spread Firefox campaign. However, it suffers from a few serious problems which make it extremely difficult to adapt to large, complex sites. If you're looking to deploy something like a social network using Drupal then this article is definitely worth a read. Even if you're not, these are facts any developer or admin should know about the software they might be using.

My relationship with Drupal is definitely love-hate, and depending on what sort of site you're looking to deploy using Drupal this entry might very well be worth a read. I'm going to assume that people reading this are familiar with designing web applications and the typical structure of a CMS. Drupal uses some non-standard terminology and is very jargon-intense. If you're not already familiar with Drupal then I suggest reading IBM's article on Drupal's design.

Design

This section is for developers. It is possible to use Drupal in a way that never requires you, the consumer of Drupal, to write PHP. There will be problems with this, mostly related to performance and maintainability, but as I hope to show, these problems stem largely from the design of Drupal's core. Now, the big issues.

  1. The API is ignorant of context

    Put simply, loading a node (or a user) is an all-or-nothing deal unless you want to write your own SQL. In Drupal each module has the option of hooking into the node (and/or user) API. The most common case is where a module has some additional data it wants to associate with the node. Typically the module will implement a hook so that every time a node is loaded it queries the database and inserts that data into the node. This happens every time one calls node_load. The situation is analogous when one invokes user_load to load a user object.

    Now let's take the buddylist module as an example. This is a third-party module and thus not part of the Drupal core, but the idiom is used throughout and module developers basically have no other option if they're looking for any kind of maintainability. This is the heart of the buddylist_get_buddies function, which returns a formatted list of friends.

    if ($buddies = buddylist_get_buddies($user->uid)) {
      foreach(array_keys($buddies) as $buddy) {
        $account = user_load(array('uid' => $buddy));
        $listbuddies[] = $account;
      }
      return theme('user_list', $listbuddies);
    }
    The problem is that user_load loads all data associated with a user; it is completely ignorant of context. What happens if I have ten modules each of which issues a query when a user is loaded and I have fifty users on my buddy list? One way around this is to have buddylist_get_buddies issue a single query at the top of the page. After all, a "buddylist" is probably just going to consist of a list of usernames. Rather than issuing 500 queries (ten per user, fifty users) I could simply do something like

    SELECT u.uid, u.name FROM buddylist b JOIN user u ON (u.uid = b.buddy_id) WHERE b.uid = %d
     
    This query is fast for any reasonably-sized buddylist but it totally breaks maintainability. What if someone else wants to use more than a name in the buddylist? The "Drupal way" would be to overwrite the theming function so that when theme('user_list', $listbuddies); is called we get our custom output. The extra data is available because user_load gives us all the data, irrespective of our need for it. But the buddylist module has no way of knowing what data you want up front — only the top-level stuff, the theming functions, really know for sure.
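    To make the cost concrete, here is a minimal sketch in Python rather than Drupal's PHP. Everything in it (QueryCounter, module_hook, the module_N_data table names) is a hypothetical stand-in that only counts queries; it assumes one base query per user plus one query per module, which is a simplification of what a real site does.

```python
class QueryCounter:
    """Stand-in for a database connection that only counts queries."""

    def __init__(self):
        self.count = 0

    def query(self, sql):
        self.count += 1
        return {}  # a real driver would return rows


db = QueryCounter()
MODULE_COUNT = 10  # the "ten modules" from the text


def module_hook(index, uid):
    # Each hypothetical module fetches its own extra data for the user.
    return db.query(f"SELECT * FROM module_{index}_data WHERE uid = {uid}")


def user_load(uid):
    """Context-blind load: one base query plus one query per module."""
    user = {"uid": uid}
    user.update(db.query(f"SELECT * FROM users WHERE uid = {uid}"))
    for index in range(MODULE_COUNT):
        user.update(module_hook(index, uid))
    return user


def buddylist_context_blind(buddy_uids):
    # Mirrors the loop above: a full user_load for every buddy.
    return [user_load(uid) for uid in buddy_uids]


def buddylist_single_query(owner_uid):
    # One JOIN that fetches only the columns the list displays.
    db.query(
        "SELECT u.uid, u.name FROM buddylist b "
        "JOIN user u ON (u.uid = b.buddy_id) WHERE b.uid = %d" % owner_uid
    )
```

    In this model, calling buddylist_context_blind(range(50)) issues 550 queries (fifty base loads plus ten module queries each), while buddylist_single_query issues one.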

    All this is to say that the points where you know exactly what data you need are precisely those points where you're least able to get it. Oops! To contrast, in an MVC-based framework like Rails, you'd get around this by having a controller fetch the active user and creating a view more-or-less thus:

    <ul id="buddylist">
        <% User.buddies.each do |buddy| %>
        <li><a href="/user/view/<%= buddy.uid %>"><%= buddy.name %></a></li>
        <% end %>
    </ul>
     
    The view knows exactly what it needs to display and the data for a user's buddies isn't retrieved from the database until you access User.buddies.
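    The lazy-loading idea can be sketched without Rails. The Python below is an illustration only: the User class, fetch_buddies, and render_buddylist are hypothetical names, and the "query" is a plain function that records when it was called.

```python
class User:
    """Hypothetical model whose buddies are fetched only on first access."""

    def __init__(self, uid, fetch_buddies):
        self.uid = uid
        self._fetch_buddies = fetch_buddies  # callable standing in for a query
        self._buddies = None

    @property
    def buddies(self):
        # The fetch happens here, the first time a view asks for the data.
        if self._buddies is None:
            self._buddies = self._fetch_buddies(self.uid)
        return self._buddies


calls = []  # records every simulated database hit


def fetch_buddies(uid):
    calls.append(uid)
    return [{"uid": 2, "name": "alice"}, {"uid": 3, "name": "bob"}]


def render_buddylist(user):
    """The 'view': it asks only for the fields it displays."""
    items = "".join(
        f'<li><a href="/user/view/{b["uid"]}">{b["name"]}</a></li>'
        for b in user.buddies
    )
    return f'<ul id="buddylist">{items}</ul>'
```

    Constructing a User costs nothing; the fetch runs once when render_buddylist first touches user.buddies, and later renders reuse the cached list.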

  2. Event-driven? What's that?

    As I noted above Drupal is procedural at heart. Event or signal-driven designs are very common in the procedural world, but, again, it seems like Drupal can't decide what it wants. Rather than create an actual event/listener system Drupal uses a strange "hooks" system.

    The core defines a certain set of so-called "hooks." For example, hook_load is the hook associated with a node getting loaded. If I have a module called foo.module and create within that module a foo_load function then every time node_load is called my foo_load function will also get called. This function will probably fetch foo-specific data from the database and stuff it back into the node. An example:

    function foo_load($node) {
      // nid is an integer, so use the %d placeholder rather than %s.
      $additions = db_fetch_object(db_query('SELECT * FROM {mytable} WHERE nid = %d', $node->nid));
      return $additions;
    }

    Now, personally, I find this system a little strange. For one, it forces me to know each and every hook within Drupal when I'm naming a function. If there's a hook called hook_shipoopie and I, for whatever reason, create a foo_shipoopie function — a totally legal function name — then there could be all sorts of unintended consequences. Indeed, this decision seems so strange to me that I can't imagine it wasn't made on purpose. Maybe someone can give me a good reason?
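    The naming problem is easy to reproduce in a toy model. The Python sketch below is not Drupal's implementation; invoke_all and the FUNCTIONS table are hypothetical, and they assume only what the text describes: a hook is dispatched by looking for a function called <module>_<hook>.

```python
def invoke_all(hook, modules, namespace, *args):
    """Call every function named <module>_<hook> found in namespace."""
    results = []
    for module in modules:
        func = namespace.get(f"{module}_{hook}")
        if callable(func):
            results.append(func(*args))
    return results


def foo_load(node):
    # Intended hook implementation: adds foo-specific data to a node.
    return {"foo_data": node["nid"]}


def foo_shipoopie(value):
    # Written as an ordinary helper, unaware that a "shipoopie" hook exists.
    return f"helper({value})"


# Stand-in for the global function table of a site with the foo module.
FUNCTIONS = {"foo_load": foo_load, "foo_shipoopie": foo_shipoopie}
```

    Invoking the "shipoopie" hook for the modules foo and bar calls foo_shipoopie even though its author never meant it as a hook; nothing but the name ties the two together.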

    More annoying, though, is the fact that even though the "hooks" system is the mechanism by which modules interact with the core, there's no way to use it for inter-module communication. The buddylist module, for example, might want to add "befriend" and "defriend" events. As a web application developer I can then choose what happens when a user gets added to or removed from someone's buddylist. Since I know that I have the private message module installed I might just want to send someone a private message. Yeah, fine, the buddylist module could define "befriend" and "defriend" hooks and call module_invoke_all, but we're still polluting the global namespace. And what if I need to attach custom data to the event? A real event system would allow for much greater flexibility.
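    For comparison, here is a minimal sketch of the kind of event/listener system I have in mind, in Python. The EventDispatcher class, the event name "buddylist.befriend", and send_private_message are all hypothetical; the point is only that listeners register explicitly and events carry arbitrary data.

```python
from collections import defaultdict


class EventDispatcher:
    """Listeners register explicitly; no function-name conventions involved."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, event_name, listener):
        self._listeners[event_name].append(listener)

    def emit(self, event_name, **payload):
        # Every listener receives the event's custom data as keyword arguments.
        return [listener(**payload) for listener in self._listeners[event_name]]


events = EventDispatcher()
sent_messages = []  # stand-in for the private message module's outbox


def send_private_message(user, buddy, **extra):
    sent_messages.append((buddy, f"{user} added you as a buddy"))


# The site developer, not the buddylist module, decides what happens.
events.subscribe("buddylist.befriend", send_private_message)
events.emit("buddylist.befriend", user="jim", buddy="alice", source="profile page")
```

    The event name lives in the dispatcher's own table rather than the global function namespace, and the extra source field shows custom data riding along without the listener having to know about it.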

Performance

These design issues encourage bad developer practices and have huge performance implications.

  1. Just One More Query Syndrome

    Drupal suffers from what one might call the "just one more query syndrome." Unfortunately because of the context-blind nature of the Drupal API, a lot of these extra queries are being executed inside big loops. For the CS people out there, consider this. Most modules exist to alter the way nodes or users work, either by adding additional functionality or content. This means that often times you will be executing at least one query per node on a page and one query per user. Let u be the number of users on a page, n the number of nodes, and m the number of modules you have installed. Then we get the following:

    #queries per page = O((u+n)m)

    I can't overstate the importance this has for the scalability of Drupal. Imagine trying to create a social networking site, which Drupal claims it can do. You will have hundreds of thousands of nodes, tens of thousands of users, and will need at least a dozen-or-so modules to get the functionality you want. Here's a scenario: load a listing of the ten most popular blogs on your site with a buddylist, a list of the five popular groups, and a list of new users elsewhere on the site. Let's say 10 modules issue queries for a node (not unusual) and 3 for a user. This makes 15*10 + 3*10 = 180 queries (groups are nodes, oddly enough). I have definitely seen cases where a complex page powered by Drupal would execute on the order of 1000 database queries uncached. Caching is only a symptomatic cure.
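    The arithmetic in that scenario can be written down as a small Python function. The function name and parameters are mine, and the split of the ten users between the buddylist and the new-user list is an assumption; only the totals come from the scenario above.

```python
def queries_per_page(nodes, users, node_modules, user_modules):
    """Queries needed when every module issues one query per node or user."""
    return nodes * node_modules + users * user_modules


# Ten blogs plus five groups (groups are nodes) and ten users in total.
scenario = queries_per_page(nodes=10 + 5, users=10, node_modules=10, user_modules=3)

# Doubling the number of modules doubles the query count for the same page.
doubled = queries_per_page(nodes=10 + 5, users=10, node_modules=20, user_modules=6)
```

    Here scenario evaluates to 180 and doubled to 360: the page content is unchanged, but every extra module multiplies across every node and user on it.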

    This is just the number of queries to get the display elements on the screen. Drupal stores everything in the database: session data, application variables, URL aliases, everything! If database resources were cookies Drupal would be the Cookie Monster after spending the afternoon getting high with his pals. You could try to cache the expensive parts of the page, but the ability to increase performance via caching does not excuse bad design, it only hides it. And God forbid you get a cache miss. Hello 500 error!

  2. Third-party modules are awful performers

    Some third-party modules will beat your site into a bloody pulp if you're not careful. The two biggest offenders I've seen are og and views. og is a module which provides group functionality and access controls based around those groups. Views is a generic system by which administrators can define rules to display a list of nodes (e.g., show me every node tagged with "foobar" written in the last week). This is all done through an administrative GUI.

    og is basically essential if you want group functionality, a mainstay of social networking sites. Drupal has probably the worst hook ever, hook_db_rewrite_sql, which allows any module to rewrite any SQL headed towards the database. og uses this for permissions. Basically for every list of nodes you fetch og adds a WHERE clause to the effect of "AND you have permission to view this node." These WHERE clauses are absolutely horrendous, of course, and can cause otherwise innocuous queries to take over 20ms to run. If you're fetching multiple lists of nodes on a single page it only gets worse.

    From the description of views above you can probably guess that the module programmatically generates SQL from the parameters specified on the administrative page. As anyone familiar with frameworks and CMSs knows, generated SQL is almost universally awful. The views module provides no exception. This SQL is only there to fetch a list of node ids, mind you. Each individual node id is then passed to node_load, which then results in another avalanche of database hits, even if the SQL generated by views is already using some of this data to filter the list of node ids. For example, if I had a view which produced a list of all posts authored by "jim" it would generate SQL something like

    SELECT n.nid FROM node n JOIN user u ON (n.uid = u.uid) WHERE u.name = 'jim'
    Later, when we invoke node_load, this very same data will be fetched by the user module using hook_load. Views is too clever by half, in my opinion.
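    The duplicated work is easy to see with a small, self-contained Python and SQLite sketch. This is not how views or node_load are implemented; the schema, the run helper, and the two functions are assumptions made for illustration (the table is called users here rather than user).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'jim'), (2, 'ann');
    INSERT INTO node VALUES (1, 1, 'First'), (2, 1, 'Second'), (3, 2, 'Other');
""")

statements = []  # every SELECT issued through run() is recorded here


def run(sql, params=()):
    statements.append(sql)
    return conn.execute(sql, params).fetchall()


def node_load(nid):
    """Context-blind load: re-fetches the author the list query already joined."""
    nid, uid, title = run("SELECT nid, uid, title FROM node WHERE nid = ?", (nid,))[0]
    name = run("SELECT name FROM users WHERE uid = ?", (uid,))[0][0]
    return {"nid": nid, "title": title, "author": name}


def posts_ids_then_load(author):
    # Step 1: a query that returns only node ids, as in the example above.
    nids = run(
        "SELECT n.nid FROM node n JOIN users u ON (n.uid = u.uid) "
        "WHERE u.name = ? ORDER BY n.nid", (author,))
    # Step 2: a full load per id.
    return [node_load(nid) for (nid,) in nids]


def posts_single_query(author):
    rows = run(
        "SELECT n.nid, n.title, u.name FROM node n JOIN users u ON (n.uid = u.uid) "
        "WHERE u.name = ? ORDER BY n.nid", (author,))
    return [{"nid": nid, "title": title, "author": name} for nid, title, name in rows]
```

    For jim's two posts, posts_ids_then_load runs five statements (one for the ids, then two per node) and posts_single_query runs one, yet both return the same rows.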

    These two are biggies because they're such popular modules, but as with any system most third-party modules for Drupal simply suck. I can't really blame Drupal for that, except perhaps insofar as I think its core APIs make it more difficult to write good modules.

The End?

I really don't mean to dump on Drupal so much. Well, ok, I do, but it's not out of spite. Part of it is that, as a developer, I find Drupal to be very unfriendly. I know exactly what I want to do and more often than not Drupal is a roadblock. Add to this the performance issues and I think anyone trying to use Drupal to design a medium-to-large sized interactive website should definitely take a second look. Developing such a site with Drupal would take no less time than using a more general rapid development framework like Ruby on Rails, Django, or Catalyst if you have experience developing such web applications already.

If, however, you're relatively inexperienced in designing and implementing complex interactive websites and don't have any huge demographic goals then Drupal is very much worth your time. In fact, I have recommended it to people for just this reason. Most of the modules "just work," even if they have performance implications. Sites like The Onion use very few modules, in which case Drupal's scalability increases markedly.

Update: Ok, so I've touched a nerve with some people. I don't understand why people get so invested in their frameworks, personally, but I guess it's just a fact of life. I will say that I was inaccurate on a few counts. First, node_load does not blindly invoke all the _load hooks, only the load hooks for that content type. There is, however, a hook_nodeapi which allows other modules to insert data. Many modules make use of this, e.g., taxonomy fetches every node's taxonomy when it's loaded. This fact does not change the performance analysis. Also, views does not indiscriminately invoke node_load. Rather, it only invokes it on four of seven types of views. It does not invoke it for the table or grid view. I'd argue that the teaser/full node lists are the most common type of views, but I didn't want to argue at this level of detail. Rather, my point is that the architecture of Drupal is fundamentally skewed away from scalability, particularly with respect to the number of modules installed. This means that each module isn't adding a constant level of complexity to your site, but has the real potential to cause performance all around to suffer.

Comments


    From the description of views above you can probably guess that the module programmatically generates SQL from the parameters specified on the administrative page. As anyone familiar with frameworks and CMSs knows, generated SQL is almost universally awful. The views module provides no exception. This SQL is only there to fetch a list of node ids, mind you. Each individual node id is then passed to node_load, which then results in another avalanche of database hits, even if the SQL generated by views is already using some of this data to filter the list of node ids. For example, if I had a view which produced a list of all posts authored by “jim” it would generate SQL something like


    Dude, do your research.

    Views only does a node_load if it has to. There are a lot of situations where it has to for various reasons. You'd know that if you actually studied it as much as you claim to have.
    
    Actually, you're wrong about foo_load() as well, though you're right in general about the hook system. (It's just that foo_load is a special type of hook that is not invoked universally, it is only invoked when the 'foo' content type is loaded).
    
    I posted a more complete response here: http://www.angrydonuts.com/a_response_to_4_prob...
    
    I'm the maintainer of the buddylist module. You raise some excellent points, and rather than quibble with you on some of the details, I'd rather invite you to join in the discussion and activity going on to create Drupal 6. The current rewrite of the menu system should be proof enough that any fundamental design decision in Drupal can be challenged and improved upon. You obviously have a good eye for application architecture.

    -Robert
    
    Hi there,

    My name is Sam Minnee, lead developer of SilverStripe open source CMS. We've built our CMS around an ORM / MVC framework much like Rails or Django, which makes it much easier to customise your site without the troubles you've run into with Drupal.

    I'd appreciate hearing what you think of it:

    * Demo: http://demo.silverstripe.com
    * Site: http://www.silverstripe.com
    
    It's the spreadfirefox site, not the getfirefox.com site, that is in Drupal. I barely even started reading this article and already have to question the accuracy...hmmm
    
    greggles,

    You're right, my mistake. GetFireFox isn't even a website. :P
    
    merlinofchaos,

    You're right, the views module doesn't necessarily execute node_load. It does, however, for both full and teaser node lists. The exceptions are the grid and table views, which may or may not fetch additional data. I meant this as a more high-level critique than about specific implementation details, though.

    You're right about the hook_load stuff, too. foo_load is invoked only if a type foo is loaded. There is the nodeapi hook for other modules to insert arbitrary data when another content type is loaded. The user API, however, is much more abusive in this regard.
    
    Jesse, I agree about the user hooks. They are fairly antiquated; nobody has really been doing any maintenance on the user.module stuff, so the existing code gets carried forward with every version. As near as I can tell, it hasn't changed significantly since at least Drupal 4.4 (maybe earlier -- hard to say since I wasn't on the project way back then).

    The node system also used to allow you to modify the node table directly and add fields; this got removed for revisions, and has been on the slate to get put back, except that nobody has picked up the cause.

    It's something I want to do, but I can only carry so many causes and that one isn't quite making the list. Maybe if I get insomniac one night I'll work on it.
    
    The node system also used to allow you to modify the node table directly and add fields; this got removed for revisions, and has been on the slate to get put back, except that nobody has picked up the cause.


    That seems like a particularly bad idea to me. Altering a table with hundreds of thousands of entries can take a long time and prevents all reading/writing during that period.

    For data in a 1-1 relationship with the nodes we just created an additional table called node_attrributes. This table contains arbitrary key/value pairs which get loaded on every node_load. Other modules can then insert data into this table and it becomes available as if it were in the node table via $node->attributes.
    
    Interesting point. I thought Drupal is popular for small websites and spent some time to create a flash demo on how to do scalable full-text search on databases, taking Drupal as an example.

    http://wiki.dbsight.com/index.php?title=Create_...

    You can create a full-text database search service and return results as HTML/XML/JSON. It uses Lucene directly in Java, but can be easily used with Ruby, PHP, or any existing database web applications.

    You can easily index, re-index, incremental-index. It's also highly scalable and easily customizable.

    The best thing is, it's super easy. You can create a production-level search in 3 minutes, and you don't need to know Java.
    
    "I know exactly what I want to do and more often than not Drupal is a roadblock." After developing with Drupal for several months I couldn't agree with this statement more. Drupal makes me want to vomit.
    
    Excellent read. I've been developing with Drupal for over two years, which has been (and continues to be) a love-hate relationship. That said, I'm making a fair amount of income from Drupal-driven projects so it's become a case of not biting the hand that feeds me.

    Some people get totally geeked out and protective about their favorite solution (CMS, framework, etc.), but for me Drupal is just another tool in my belt.
    
    Hi
    the fact that I have understood almost nothing of your analysis probably means that you are right, in the sense that something is maybe wrong in the basic design of Drupal (or in me), besides scalability, if I cannot understand what's going on every time I load a page, or follow the flow of data and functions from the database to the web page. Given the current documentation, debugging in Drupal is a nightmare, and the third-party modules are poorly designed and developed. Most of them are completely unusable (why do you put them in the list?), including basic CMS modules like file management or weblink. What on earth is a CMS without intuitive file and image management? Like most open source web applications, it fails to be usable for anything more than basic, "demo-like" scenarios. Real-life application development requires deep debugging and personalisation, but how can you do it without complete and clear documentation? Not to mention the problems with version upgrades... I am sorry, I know there's a lot of personal commitment and work around Drupal, and I don't think the fault is of the Drupal community. I feel there's something wrong with the open source process when you have to deal with complex applications. Probably it simply doesn't work this way. Maybe there is a complexity threshold beyond which the bazaar model isn't effective without a proper architecture design in the first place. I am not an expert, but a humble database programmer trying to develop a clinical portal with Drupal, and losing my sanity in doing this.
    
    Luigi,

    I don't think that's true at all. Linux, the poster-child of Open Source, is an order of magnitude more complex than Drupal. As are things like OpenOffice, Mozilla/Firefox, etc. The scope isn't the cause of the problems.

    If Drupal is causing you headaches use something else. Drupal is there to make your life easier, not harder. Unless you're somehow committed to using it there's no reason to if it's not giving you some benefit.
    
    I like this article. I see it as constructive criticism and brings up a good point that people looking to use Drupal for large scale sites need to be careful with what modules they are using due to heavy sql loads.
    
    Hi there,

    First of all thanks Jesse for a nice read.

    I would like to comment about those who are criticizing "Jesse"....

    "Suppose I am your client, and I want you to download source from Mozilla (FireFox), its open source and also a win 32 application and I demand to develop an Operating System out of it"

    You will surely feel like kicking my ass, right ?

    So Firefox is good & also Drupal ... but for sure Drupal is nowhere near a good framework.

    One more important factor: "CMS" stands for Content Management System and bla bla ... and if you try to put a lot of business logic in it using GOD-created APIs then only your system will stand up.

    Last but not least, "every entity has a divine role, and Drupal's role is limited to content management and partial extension with the use of some clumsy APIs."

    I hope I am not hurting Drupal lovers, but you have to accept some facts.
    
    Thanks for the post Jesse.
    It would be nice if you could come up with an alternative.
    Saying that Drupal is not good in your opinion raises the question what is a good alternative?

    Thanks, Joep
    
    Good writeup. One key rule violated: if you use an Open Source product and you dislike something - fix it!!!

    Come on, man, you clearly have the required strength of opinion and the expertise. Otherwise it's just bitching :-(