Drupal is an open source CMS. It is used by many big-name websites, like The Onion and the Mozilla Foundation's Spread Firefox campaign. However, it suffers from a few serious problems which make it extremely difficult to adapt to large, complex sites. If you're looking to deploy something like a social network using Drupal then this article is definitely worth a read. Even if you're not, these are facts any developer or admin should know about the software they might be using.

My relationship with Drupal is definitely love-hate, and depending on what sort of site you're looking to deploy with it, this entry might very well be worth a read. I'm going to assume that people reading this are familiar with designing web applications and the typical structure of a CMS. Drupal uses some non-standard terminology and is very jargon-heavy. If you're not already familiar with Drupal then I suggest reading IBM's article on Drupal's design.

Design

This section is for developers. It is possible to use Drupal in a way that never requires you, the consumer of Drupal, to write PHP. There will be problems with this, mostly related to performance and maintainability, but as I hope to show, these problems stem largely from the design of Drupal's core. Now, the big issues.

  1. The API is ignorant of context

    Put simply, loading a node (or a user) is an all-or-nothing deal unless you want to write your own SQL. In Drupal each module has the option of hooking into the node (and/or user) API. The most common case is where a module has some additional data it wants to associate with the node. Typically the module will implement a hook so that every time a node is loaded it queries the database and inserts that data into the node. This happens every time one calls node_load. The situation is analogous when one invokes user_load to load a user object.

    Now let's take the buddylist module as an example. This is a third-party module and thus not part of the Drupal core, but the idiom is used throughout and module developers basically have no other option if they're looking for any kind of maintainability. This is the heart of the buddylist code that returns a formatted list of friends; it calls buddylist_get_buddies and then loads each buddy in turn.

    if ($buddies = buddylist_get_buddies($user->uid)) {
      foreach(array_keys($buddies) as $buddy) {
        $account = user_load(array('uid' => $buddy));
        $listbuddies[] = $account;
      }
      return theme('user_list', $listbuddies);
    }

    The problem is that user_load loads all data associated with a user; it is completely ignorant of context. What happens if I have ten modules each of which issues a query when a user is loaded and I have fifty users on my buddy list? That is on the order of 500 queries for a single list. One way around this is to have buddylist_get_buddies issue a single query at the top of the page. After all, a "buddylist" is probably just going to consist of a list of usernames. Rather than issuing hundreds of queries I could simply do something like

    SELECT u.uid, u.name FROM buddylist b JOIN user u ON (u.uid = b.buddy_id) WHERE b.uid = %d
     

    This query is fast for any reasonably-sized buddylist but it totally breaks maintainability. What if someone else wants to use more than a name in the buddylist? The "Drupal way" would be to override the theming function so that when theme('user_list', $listbuddies); is called we get our custom output. The extra data is available because user_load gives us all the data, irrespective of our need for it. But the buddylist module has no way of knowing what data you want up front. Only the top-level stuff, the theming functions, really know for sure.
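
    To make the trade-off concrete, here is a rough, self-contained sketch in Python with SQLite. It is not Drupal code: the table layout, the `load_user` function, and the figure of ten hooked modules are all assumptions chosen to mirror the scenario above. It counts database round trips for the load-everything path versus the single JOIN.

```python
import sqlite3

# Hypothetical schema, loosely modeled on the query above; not Drupal's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE buddylist (uid INTEGER, buddy_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 52)])
# User 1 has fifty buddies: users 2 through 51.
conn.executemany("INSERT INTO buddylist VALUES (1, ?)",
                 [(i,) for i in range(2, 52)])

query_count = 0
MODULES = 10  # assumed number of modules that hook into user loading


def run(sql, params=()):
    """Execute a query and count it, standing in for one database round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def load_user(uid):
    """Context-blind load: one base query plus one query per hooked module."""
    (name,) = run("SELECT name FROM users WHERE uid = ?", (uid,))[0]
    for _ in range(MODULES):
        run("SELECT 1")  # placeholder for each module's extra query
    return {"uid": uid, "name": name}


def buddies_per_user(uid):
    """Fetch the buddy ids, then fully load every buddy one at a time."""
    ids = run("SELECT buddy_id FROM buddylist WHERE uid = ?", (uid,))
    return [load_user(buddy_id) for (buddy_id,) in ids]


def buddies_single_query(uid):
    """Fetch only uid and name for all buddies in one JOIN."""
    rows = run(
        "SELECT u.uid, u.name FROM buddylist b "
        "JOIN users u ON u.uid = b.buddy_id WHERE b.uid = ?", (uid,))
    return [{"uid": r[0], "name": r[1]} for r in rows]


query_count = 0
slow = buddies_per_user(1)
slow_queries = query_count

query_count = 0
fast = buddies_single_query(1)
fast_queries = query_count

print(slow_queries, fast_queries)
```

    Under these assumptions the per-user path issues several hundred queries and the JOIN issues one, but the JOIN only ever returns the two columns it was written for, which is exactly the maintainability problem described above.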

    All this is to say that the points where you know exactly what data you need are precisely those points where you're least able to get it. Oops! To contrast, in an MVC-based framework like Rails, you'd get around this by having a controller fetch the active user and creating a view more-or-less thus:

    <ul id="buddylist">
        <% User.buddies.each do |buddy| %>
        <li><a href="/user/view/<%= buddy.uid %>"><%= buddy.name %></a></li>
        <% end %>
    </ul>

    The view knows exactly what it needs to display and the data for a user's buddies isn't retrieved from the database until you access User.buddies.
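
    The lazy-loading idea is not specific to Rails. As a minimal sketch in Python, assuming a hypothetical `User` class and a stand-in `fake_fetch` function in place of a real database query, the buddy list is only fetched when the attribute is first read:

```python
from functools import cached_property


class User:
    """Hypothetical model; fetch_buddies stands in for a database query."""

    def __init__(self, uid, fetch_buddies):
        self.uid = uid
        self._fetch_buddies = fetch_buddies

    @cached_property
    def buddies(self):
        # Runs the first time .buddies is read; the result is cached afterwards.
        return self._fetch_buddies(self.uid)


calls = []


def fake_fetch(uid):
    """Pretend query: records that it was called and returns canned rows."""
    calls.append(uid)
    return [{"uid": 2, "name": "alice"}, {"uid": 3, "name": "bob"}]


user = User(1, fake_fetch)
calls_before_access = len(calls)   # constructing the user fetched nothing

names = [b["name"] for b in user.buddies]        # first access triggers the fetch
names_again = [b["name"] for b in user.buddies]  # second access reuses the cache
calls_after_access = len(calls)
```

    The point of the design is that the code which knows what it needs (the view) is also the code that triggers the fetch, so nothing is loaded on speculation.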

  2. Event-driven? What's that?

    As I noted above, Drupal is procedural at heart. Event- or signal-driven designs are very common in the procedural world, but, again, it seems like Drupal can't decide what it wants. Rather than create an actual event/listener system, Drupal uses a strange "hooks" system.

    The core defines a certain set of so-called "hooks." For example, hook_load is the hook associated with a node getting loaded. If I have a module called foo.module and create within that module a foo_load function then every time node_load is called my foo_load function will also get called. This function will probably fetch foo-specific data from the database and stuff it back into the node. An example:

    function foo_load($node) {
      $additions = db_fetch_object(db_query('SELECT * FROM {mytable} WHERE nid = %d', $node->nid));
      return $additions;
    }

    Now, personally, I find this system a little strange. For one, it forces me to know each and every hook within Drupal when I'm naming a function. If there's a hook called hook_shipoopie and I, for whatever reason, create a foo_shipoopie function — a totally legal function name — then there could be all sorts of unintended consequences. Indeed, this decision seems so strange to me that I can't imagine it wasn't made on purpose. Maybe someone can give me a good reason?
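
    The naming hazard is easy to reproduce outside Drupal. The following Python sketch is only an analogy: `invoke_all` and the two toy modules are hypothetical and simply mimic dispatch by the `<module>_<hook>` naming convention. The module `foo` defines `foo_shipoopie` as a private helper, yet it gets called as if it were a hook implementation.

```python
import types


def invoke_all(modules, hook, *args):
    """Call <module>_<hook> in every module that happens to define it."""
    results = []
    for name, module in modules.items():
        func = getattr(module, f"{name}_{hook}", None)
        if callable(func):
            results.append(func(*args))
    return results


# Toy "modules" built as simple namespaces.
foo = types.SimpleNamespace()
# Intended as a private helper, but its name matches the hook pattern.
foo.foo_shipoopie = lambda: "foo helper ran"

bar = types.SimpleNamespace()
# A deliberate implementation of the same hook.
bar.bar_shipoopie = lambda: "bar hook ran"

modules = {"foo": foo, "bar": bar}
results = invoke_all(modules, "shipoopie")
print(results)
```

    Nothing in the dispatcher can tell the accidental match from the intentional one, because the function name is the only registration mechanism.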

    More annoying, though, is the fact that even though the "hooks" system is the mechanism by which modules interact with the core, there's no good way to use it for inter-module communication. The buddylist module, for example, might want to add "befriend" and "defriend" events. As a web application developer I can then choose what happens when a user gets added to or removed from someone's buddylist. Since I know that I have the private message module installed I might just want to send someone a private message. Yeah, fine, the buddylist module could define "befriend" and "defriend" hooks and call module_invoke_all, but we're still polluting the global namespace. And what if I need to attach custom data to the event? A real event system would allow for much greater flexibility.
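
    For comparison, here is a minimal sketch of an explicit event/listener system, written in Python. Everything in it is hypothetical (the `EventDispatcher` class, the event names `buddylist.befriend` and `buddylist.defriend`, and the `send_private_message` listener); it only illustrates explicit registration and a free-form payload, not any existing Drupal API.

```python
from collections import defaultdict


class EventDispatcher:
    """Listeners register explicitly by event name; no naming convention needed."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def listen(self, event_name, callback):
        self._listeners[event_name].append(callback)

    def dispatch(self, event_name, **payload):
        # Every listener receives the same keyword payload.
        return [callback(**payload) for callback in self._listeners[event_name]]


dispatcher = EventDispatcher()
sent_messages = []


def send_private_message(user, buddy, note=""):
    """Hypothetical listener standing in for a private-message module."""
    sent_messages.append((buddy, f"{user} added you as a buddy. {note}".strip()))


dispatcher.listen("buddylist.befriend", send_private_message)

# The buddylist code fires events and attaches whatever data it likes.
dispatcher.dispatch("buddylist.befriend", user="jim", buddy="ann", note="Met at DrupalCon")
dispatcher.dispatch("buddylist.defriend", user="jim", buddy="ann")  # no listeners, no effect
```

    Event names live in the dispatcher's own table rather than in the global function namespace, and the payload can carry custom data without changing any function signature conventions.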

Performance

These design issues encourage bad developer practices and have huge performance implications.

  1. Just One More Query Syndrome

    Drupal suffers from what one might call the "just one more query syndrome." Unfortunately, because of the context-blind nature of the Drupal API, a lot of these extra queries are executed inside big loops. For the CS people out there, consider this. Most modules exist to alter the way nodes or users work, by adding either functionality or content. This means that you will often be executing at least one query per module for every node and every user on a page. Let u be the number of users on a page, n the number of nodes, and m the number of modules you have installed. Then we get the following:

    #queries per page = O((u+n)m)

    I can't overstate the importance this has for the scalability of Drupal. Imagine trying to create a social networking site, which Drupal claims it can do. You will have hundreds of thousands of nodes, tens of thousands of users, and will need at least a dozen-or-so modules to get the functionality you want. Here's a scenario: load a listing of the ten most popular blogs on your site with a buddylist, a list of the five popular groups, and a list of new users elsewhere on the site. Let's say 10 modules issue queries for a node (not unusual) and 3 for a user. With 15 nodes (ten blogs plus five groups; groups are nodes, oddly enough) and, say, 10 users across the buddylist and the new-user list, this makes 15*10 + 3*10 = 180 queries. I have definitely seen cases where a complex page powered by Drupal would execute on the order of 1000 database queries uncached. Caching is only a symptomatic cure.
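
    The arithmetic in that scenario can be written out as a small Python function. This is only a back-of-the-envelope model: `estimated_queries` is a hypothetical helper, and the figure of ten users on the page is an assumption implied by the 3*10 term rather than something measured.

```python
def estimated_queries(nodes, users, node_modules, user_modules):
    """One query per hooked module for every node and every user on the page."""
    return nodes * node_modules + users * user_modules


blogs, groups = 10, 5      # groups are nodes too
users_on_page = 10         # assumed total across the buddylist and new-user list

total = estimated_queries(blogs + groups, users_on_page,
                          node_modules=10, user_modules=3)

# Doubling the number of node-hooking modules nearly doubles the page's query load.
total_with_more_modules = estimated_queries(blogs + groups, users_on_page,
                                            node_modules=20, user_modules=3)
print(total, total_with_more_modules)
```

    The model ignores base queries, sessions, and aliases, so it is a lower bound on the estimate, not a prediction.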

    This is just the number of queries to get the display elements on the screen. Drupal stores everything in the database: session data, application variables, URL aliases, everything! If database resources were cookies Drupal would be the Cookie Monster after spending the afternoon getting high with his pals. You could try to cache the expensive parts of the page, but the ability to increase performance via caching does not excuse bad design, it only hides it. And God forbid you get a cache miss. Hello 500 error!

  2. Third-party modules are awful performers

    Some third-party modules will beat your site into a bloody pulp if you're not careful. The two biggest offenders I've seen are og and views. og is a module which provides group functionality and access controls based around those groups. Views is a generic system by which administrators can define rules to display a list of nodes (e.g., show me every node tagged with "foobar" written in the last week). This is all done through an administrative GUI.

    og is basically essential if you want group functionality, a mainstay of social networking sites. Drupal has probably the worst hook ever, hook_db_rewrite_sql, which allows any module to rewrite any SQL headed towards the database. og uses this for permissions. Basically, for every list of nodes you fetch, og adds a WHERE clause to the effect of "AND you have permission to view this node." These WHERE clauses are absolutely horrendous, of course, and can cause otherwise innocuous queries to take over 20ms to run. If you're fetching multiple lists of nodes on a single page it only gets worse.
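
    As a loose illustration of the rewriting pattern, and explicitly not a reproduction of hook_db_rewrite_sql or of og's actual SQL, the Python sketch below lets each registered "rewriter" append its own condition to a node-listing query. The column names `n.public` and `n.group_id` and the function names are made up for the example.

```python
def rewrite_where(base_where, rewriters, account):
    """Let each registered rewriter contribute one extra WHERE condition."""
    conditions = [base_where] if base_where else []
    for rewriter in rewriters:
        extra = rewriter(account)
        if extra:
            conditions.append(f"({extra})")
    return " AND ".join(conditions)


def group_access_rewriter(account):
    """Hypothetical access rule: public nodes, or nodes in one of the user's groups."""
    if not account["groups"]:
        return "n.public = 1"
    group_ids = ", ".join(str(int(gid)) for gid in account["groups"])
    return f"n.public = 1 OR n.group_id IN ({group_ids})"


where = rewrite_where("n.status = 1", [group_access_rewriter], {"groups": [3, 7]})
sql = f"SELECT n.nid FROM node n WHERE {where}"
print(sql)
```

    Every node listing on the page pays for every rewriter's condition, whether or not the listing has anything to do with groups, which is why the cost compounds when a page fetches several lists.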

    From the description of views above you can probably guess that the module programmatically generates SQL from the parameters specified on the administrative page. As anyone familiar with frameworks and CMSs knows, generated SQL is almost universally awful. The views module is no exception. This SQL is only there to fetch a list of node ids, mind you. Each individual node id is then passed to node_load, which results in another avalanche of database hits, even if the SQL generated by views is already using some of this data to filter the list of node ids. For example, if I had a view which produced a list of all posts authored by "jim" it would generate SQL something like

    SELECT n.nid FROM node n JOIN user u ON (n.uid = u.uid) WHERE u.name = 'jim'

    Later, when we invoke node_load, this very same data will be fetched by the user module using hook_load. Views is too clever by half, in my opinion.

    These two are biggies because they're such popular modules, but as with any system most third-party modules for Drupal simply suck. I can't really blame Drupal for that, except perhaps insofar as I think its core APIs make it more difficult to write good modules.

The End?

I really don't mean to dump on Drupal so much. Well, ok, I do, but it's not out of spite. Part of it is that, as a developer, I find Drupal to be very unfriendly. I know exactly what I want to do and more often than not Drupal is a roadblock. Add to this the performance issues and I think anyone trying to use Drupal to design a medium-to-large sized interactive website should definitely take a second look. Developing such a site with Drupal would take no less time than using a more general rapid development framework like Ruby on Rails, Django, or Catalyst if you have experience developing such web applications already.

If, however, you're relatively inexperienced in designing and implementing complex interactive websites and don't have any huge demographic goals then Drupal is very much worth your time. In fact, I have recommended it to people for just this reason. Most of the modules "just work," even if they have performance implications. Sites like The Onion use very few modules, in which case Drupal's scalability increases markedly.

Update: Ok, so I've touched a nerve with some people. I don't understand why people get so invested in their frameworks, personally, but I guess it's just a fact of life.

I will say that I was inaccurate on a few counts. First, node_load does not blindly invoke all the _load hooks, only the load hooks for that content type. There is, however, a hook_nodeapi which allows other modules to insert data. Many modules make use of this, e.g., taxonomy fetches every node's taxonomy when it's loaded. This fact does not change the performance analysis.

Also, views does not indiscriminately invoke node_load. Rather, it only invokes it on four of seven types of views. It does not invoke it for the table or grid view. I'd argue that the teaser/full node lists are the most common type of views, but I didn't want to argue at this level of detail. Rather, my point is that the architecture of Drupal is fundamentally skewed away from scalability, particularly with respect to the number of modules installed. This means that each module isn't adding a constant level of complexity to your site, but has the real potential to cause performance all around to suffer.

37 Comments

  1. Amy Stephen » Four problems with Drupal March 1st, 2007 / 9:01 pm

    […] Read the article. […]

  2. merlinofchaos March 2nd, 2007 / 12:10 am

    From the description of views above you can probably guess that the module programmatically generates SQL from the parameters specified on the administrative page. As anyone familiar with frameworks and CMSs knows, generated SQL is almost universally awful. The views module provides no exception. This SQL is only there to fetch a list of node ids, mind you. Each individual node id is then passed to node_load, which then results is another avalanche of database hits, even if the SQL generated by views is already using some of this data to filter the list of node ids. For example, if I had a view which produced list of all posts authored by “jim” it would generate SQL something like

    Dude, do your research.

    Views only does a node_load if it has to. There are a lot of situations where it has to for various reasons. You’d know that if you actually studied it as much as you claim to have.

  3. merlinofchaos March 2nd, 2007 / 12:12 am

    Actually, you’re wrong about foo_load() as well, though you’re right in general about the hook system. (It’s just that foo_load is a special type of hook that is not invoked universally, it is only invoked when the ‘foo’ content type is loaded).

  4. merlinofchaos March 2nd, 2007 / 1:11 am

    I posted a more complete response here: http://www.angrydonuts.com/a_response_to_4_problems_with_dr

  5. newsmotto! » Problems with Drupal March 2nd, 2007 / 1:44 am

    […] cites 4 problems with Drupal. Byran is keeping a watch on the comments to follow from Drupal […]

  6. Robert Douglass March 2nd, 2007 / 3:09 am

    I’m the maintainer of the buddylist module. You raise some excellent points, and rather than quibble with you on some of the details, I’d rather invite you to join in the discussion and activity going on to create Drupal 6. The current rewrite of the menu system should be proof enough that any fundamental design decision in Drupal can be challenged and improved upon. You obviously have a good eye for application architecture.

    -Robert

  7. Sam Minnee March 2nd, 2007 / 3:31 am

    Hi there,

    My name is Sam Minnee, lead developer of SilverStripe open source CMS. We’ve built our CMS around an ORM / MVC framework much like Rails or Django, which makes it much easier to customise your site without the troubles you’ve run into with Drupal.

    I’d appreciate hearing what you think of it:

    * Demo: http://demo.silverstripe.com
    * Site: http://www.silverstripe.com

  8. greggles March 2nd, 2007 / 9:33 am

    It’s the spreadfirefox site, not the getfirefox.com site, that is in Drupal. I barely even started reading this article and already have to question the accuracy…hmmm

  9. Jesse March 2nd, 2007 / 12:09 pm

    greggles,

    You’re right, my mistake. GetFireFox isn’t even a website. :P

  10. Jesse March 2nd, 2007 / 12:15 pm

    merlinofchaos,

    You’re right, the views module doesn’t necessarily execute node_load. It does, however, for both full and teaser node lists. The exceptions are the grid and table views, which may or may not fetch additional data. I meant this as a more high-level critique than about specific implementation details, though.

    You’re right about the hook_load stuff, too. foo_load is invoked only if a type foo is loaded. There is the nodeapi hook for other modules to insert arbitrary data when another content type is loaded. The user API, however, is much more abusive in this regard.

  11. Akkam’s Razor March 2nd, 2007 / 11:58 pm

    […] 20bits » Blog Archive » 4 Problems with Drupal A short technical criticism of Drupal. (tags: cms problems programming) […]

  12. james dot schumann » links for 2007-03-06 March 6th, 2007 / 9:29 am

    […] 20bits » Blog Archive » 4 Problems with Drupal (tags: drupal) […]

  13. merlinofchaos March 8th, 2007 / 1:32 pm

    Jesse, I agree about the user hooks. They are fairly antiquated; nobody has really been doing any maintenance on the user.module stuff, so the existing code gets carried forward with every version. As near as I can tell, it hasn’t changed significantly since at least Drupal 4.4 (maybe earlier; hard to say since I wasn’t on the project way back then).

    The node system also used to allow you to modify the node table directly and add fields; this got removed for revisions, and has been on the slate to get put back, except that nobody has picked up the cause.

    It’s something I want to do, but I can only carry so many causes and that one isn’t quite making the list. Maybe if I get insomniac one night I’ll work on it.

  14. Jesse March 8th, 2007 / 1:39 pm

    The node system also used to allow you to modify the node table directly and add fields; this got removed for revisions, and has been on the slate to get put back, except that nobody has picked up the cause.

    That seems like a particularly bad idea to me. Altering a table with hundreds of thousands of entries can take a long time and prevents all reading/writing during that period.

    For data in a 1-1 relationship with the nodes we just created an additional table called node_attrributes. This table contains arbitrary key/value pairs which get loaded on every node_load. Other modules can then insert data into this table and it becomes available as if it were in the node table via $node->attributes.

  15. Chris Lu March 30th, 2007 / 5:16 pm

    Interesting point. I thought Drupal is popular for small websites and spent some time to create a flash demo on how to do scalable full-text search on databases, taking Drupal as an example.

    http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

    You can create a full-text database search service, return results as HTML/XML/JSON. It uses Lucene directly in Java, but can be easily used with Ruby, PHP, or any existing database web applications.

    You can easily index, re-index, incremental-index. It’s also highly scalable and easily customizable.

    The best thing is, it’s super easy. You can create a production-level search in 3 minutes, and you don’t need to know Java.

  16. Andy Triboletti July 2nd, 2007 / 12:10 pm

    “I know exactly what I want to do and more often than not Drupal is a roadblock.” After developing with drupal for several months I couldn’t agree with this statement more. Drupal makes me want to vomit.

  17. John H July 10th, 2007 / 1:50 pm

    Excellent read. I’ve been developing with Drupal for over two years, which has been (and continues to be) a love-hate relationship. That said, I’m making a fair amount of income from Drupal-driven projects so it’s become a case of not biting the hand that feeds me.

    Some people get totally geeked out and protective about their favorite solution (CMS, framework, etc.), but for me Drupal is just another tool in my belt.

  19. Scalable web architectures » Blog Archive » Talks and slides from various web architects August 3rd, 2007 / 4:45 pm

    […] 4 Problems with Drupal […]

  20. Luigi August 13th, 2007 / 9:42 am

    Hi
    the fact that I have understood almost nothing of your analysis probably means that you are right, in the sense that something is maybe wrong in the basic design of Drupal (or in me), besides scalability, if I cannot understand what’s going on every time I load a page, or follow the flow of data and functions from database to the web page. Given the current documentation, debugging in Drupal is a nightmare, and the third-party modules are poorly designed and developed. Most of them are completely unusable (why do you put them in the list?), including basic CMS modules like file management or weblink. What on earth is a CMS without intuitive file and image management? Like most open source web applications, it fails to be usable for anything more than basic, “demo-like” scenarios. Real life application development requires deep debugging and personalisation, but how can you do it without complete and clear documentation? Not to mention the problems with version upgrades… I am sorry, I know there’s a lot of personal commitment and work around Drupal, and I don’t think the fault is the Drupal community’s. I feel there’s something wrong with the open source process when you have to deal with complex applications. Probably it simply doesn’t work this way. Maybe there is a complexity threshold beyond which the bazaar model isn’t effective without a proper architecture design in the first place. I am not an expert, but a humble database programmer trying to develop a clinical portal with Drupal, and losing sanity in doing this.

  21. Jesse August 14th, 2007 / 10:20 pm

    Luigi,

    I don’t think that’s true at all. Linux, the poster-child of Open Source, is an order of magnitude more complex than Drupal. As are things like OpenOffice, Mozilla/Firefox, etc. The scope isn’t the cause of the problems.

    If Drupal is causing you headaches use something else. Drupal is there to make your life easier, not harder. Unless you’re somehow committed to using it there’s no reason to if it’s not giving you some benefit.

  22. R.J. Steinert August 23rd, 2007 / 12:56 am

    I like this article. I see it as constructive criticism and brings up a good point that people looking to use Drupal for large scale sites need to be careful with what modules they are using due to heavy sql loads.

  23. Abhijit August 23rd, 2007 / 9:50 am

    Hi there,

    First of all thanks Jesse for a nice read.

    I would like to comment about those who are criticizing “Jesse”….

    “Suppose I am your client, and I want you to download source from Mozilla (FireFox), its open source and also a win 32 application and I demand to develop an Operating System out of it”

    You will surely feel like kicking my ass, right ?

    So Firefox is good & also Drupal … but for sure Drupal is nowhere near a good framework.

    One more important factor “CMS” stands for Content Management System and Bla bla … and if you try and put lot of business logic in it using GOD created API’s then only your system will stand up.

    last but not least “every entity has a divine roles, and Drupals role is limited to content management and partial extension with use of some clumsy API’s ”

    I hope i am not hurting drupal lovers but you have to accept some facts

  24. keizo/weblog » Blog Archive » My Summer of Code August 24th, 2007 / 4:22 am

    […] noticed a whole lot of ugliness in Drupal in the process, some of them mentioned here. Drupal is database query hungry. I have 15 stories listed on my front page, bang, there’s 15 […]

  25. joeph October 26th, 2007 / 4:16 pm

    Thanks for the post Jesse.
    It would be nice if you could come up with an alternative.
    Saying that Drupal is not good in your opinion raises the question what is a good alternative?

    Thanks, Joep

  26. Scalable web architectures at Лучший WEB разработчик Украины November 5th, 2007 / 9:43 am

    […] 4 Problems with Drupal […]

  27. Andrey November 18th, 2007 / 5:51 pm

    Good writeup. One key rule violated: if you use an Open Source product and you dislike something - fix it!!!

    Come on, man, you clearly have the required strength of opinion and the expertise. Otherwise it’s just bitching :-(

  28. Jesse November 18th, 2007 / 11:12 pm

    Andrey,

    I only used Drupal because the company I (formerly) worked for used it. I’d fix it if I were paid to fix it, but I honestly don’t care all that much.

  29. Scaleable web architectures December 28th, 2007 / 2:23 pm

    […] 4 Problems with Drupal […]

  30. Drupal: The Next King of CMS? January 3rd, 2008 / 4:44 pm

    […] It appears there are some very interesting complaints about scalability with Drupal which apparently still hold - ie, great for a small site, but if you want it to run something […]

  31. John H January 4th, 2008 / 11:32 am

    Drupal is a viable option for some projects if you’re willing/able to develop custom modules tailored specifically for the application and use third party contributed modules sparingly, if at all.

  32. Del.icio.us op 29 februari 2008 — Michel Vuijlsteke's Weblog February 29th, 2008 / 5:17 pm

    […] - 4 Problems with Drupal | 20bits (tags: drupal […]

  33. belts March 28th, 2008 / 10:14 am

    I was considering Drupal as a CMS for one of my next sites. It’s going to be a relatively small e-commerce site. SEO will be an important factor, as will a payment gateway. I can’t see the site ever getting above 10 pages. I’m not much of a developer as yet so the easiest solution would be best for me. Would you recommend Drupal or is there another free CMS you’d recommend? Thanks in advance.

    Jim
    (couldn’t resist with the ‘belts’ bit)

  34. Jesse March 28th, 2008 / 10:16 am

    belts,

    If your site is going to be simple (read: only a few modules) and you don’t have the programming skills, Drupal is fine.

  35. belts March 31st, 2008 / 4:03 am

    ok cool, I’ll take a look at that.

    Thanks for the tip Jesse.

  36. Secret Owl April 2nd, 2008 / 11:39 am

    Well, crap. Too late for me, I’ve just spent the last month creating my site with drupal. I decided to go minimalist before I read this, I’m just hoping it makes some sort of difference.

    And, yeah, this is my absolute first foray into php and mysql and all of that jazz. I don’t think I did that badly, for an 18 year old novice!

    Anyway, thanks for the article, mate; it cleared a few things right up for me.

  37. VoiceHero May 4th, 2008 / 9:17 am

    interesting read. fortunately all solutions evolve and so does drupal. to list the pros and cons of every open source cms is probably not the point once you decided to focus your efforts on some of them, since you are serving clients using them. so in that case i would simply contribute to fix the issues raised above or simply agree with the statement that it is some resultless criticism - similar statements can be made any other solution - whatever the angle is you want to look at it from.
