Drupal.org Project Retention, Download to Usage

A couple months ago, download statistics were added to projects on Drupal.org. Previously, we could only see how many sites reported using each project, so download numbers provide an interesting new data point. I was particularly interested in how the two numbers compare, so I did some research, which I'll share here. Specifically, I gathered both the usage and the download numbers for every project on Drupal.org, and looked at some general statistics.

My initial reaction when looking at the download numbers on my own projects was something like "oh great, those are big numbers." But after thinking about it a bit more, I realized that's actually an indication something isn't great. If the download number is significantly bigger than the usage number, that means a lot of people are downloading the project without actually using it. Some of that can be explained by sites that don't report usage back to Drupal.org, but a substantial portion of it seems likely to be users who decided not to use a project after downloading it. I can imagine a few reasons for this, including a problematic install process, an inaccurate project description, or a project that just doesn't work well.

I would hope most of this would be reported in issue queues, but after looking at the numbers, I doubt that's happening. The following chart shows what I'm calling the "retention rate," the percent of Drupal.org downloads that turn into reported usage, for every project on Drupal.org:

I was expecting this chart to look more like a bell curve, with a common retention rate somewhere in the middle, maybe at around 50%, and fewer projects both above and below this rate. Instead, the most common retention rate is zero, and fewer and fewer projects have higher rates. Very few projects are above 25% retention.

"Retention rate" is a somewhat misleading label for this. Because we're comparing actual downloads to reported usage, there's a gap of unreported usage that explains some of this. But because some projects have near 100% retention, it seems safe to discount that. Some even have over 100% retention, as it's possible to download a project once and use it on multiple sites.
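To make the definition concrete, here is a minimal sketch of the calculation in Python. The project names and numbers are made up for illustration, and the function names and the 5-point bucket width are my own assumptions, not anything taken from Drupal.org.

```python
from collections import Counter


def retention_rate(usage, downloads):
    """Reported usage as a percentage of downloads (None if no downloads)."""
    if downloads <= 0:
        return None
    return 100.0 * usage / downloads


def histogram(rates, bucket_width=5):
    """Count how many rates fall into each bucket_width-wide bucket."""
    counts = Counter()
    for rate in rates:
        bucket = int(rate // bucket_width) * bucket_width
        counts[bucket] += 1
    return dict(sorted(counts.items()))


# Hypothetical (usage, downloads) pairs, not real Drupal.org data.
projects = {
    "alpha": (2, 1000),     # almost nobody who downloaded it reports using it
    "beta": (150, 1000),
    "gamma": (1200, 1000),  # one download reused on several sites
}

rates = {name: retention_rate(u, d) for name, (u, d) in projects.items()}
buckets = histogram(rates.values())
```

Note that nothing caps the rate at 100: a project like the hypothetical "gamma" lands in the 120 bucket because it is reported on more sites than it was downloaded.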

I thought the distribution of retention rates might change based on popularity of projects, so I looked at this same chart on different subsets of more popular projects. If we limit the data to just the top 1000 projects on Drupal.org (by usage numbers), we start to see something approaching a bell curve, with a peak at around 20% retention. That's still very low, and it stays about the same for the top 500 and top 100 projects. But the difference between the top 1000 and all projects does suggest retention during initial install is an important factor in Drupal.org project success.
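The subset comparison can be sketched roughly as follows: rank projects by usage, keep the top N, and see which retention bucket is most common. The sample records, field names, and helper names are hypothetical, and this is only an approximation of the approach described, not the exact script I ran.

```python
def top_by_usage(projects, n):
    """Return the n projects with the highest reported usage."""
    ranked = sorted(projects, key=lambda p: p["usage"], reverse=True)
    return ranked[:n]


def peak_bucket(projects, bucket_width=5):
    """Return the most common retention bucket (in percent), or None."""
    counts = {}
    for p in projects:
        if p["downloads"] <= 0:
            continue  # no downloads recorded, so the rate is undefined
        rate = 100.0 * p["usage"] / p["downloads"]
        bucket = int(rate // bucket_width) * bucket_width
        counts[bucket] = counts.get(bucket, 0) + 1
    return max(counts, key=counts.get) if counts else None


# Hypothetical sample, not real Drupal.org data.
sample = [
    {"name": "a", "usage": 5000, "downloads": 25000},
    {"name": "b", "usage": 4000, "downloads": 18000},
    {"name": "c", "usage": 3000, "downloads": 30000},
    {"name": "d", "usage": 1, "downloads": 500},
    {"name": "e", "usage": 0, "downloads": 300},
    {"name": "f", "usage": 2, "downloads": 900},
]

peak_all = peak_bucket(sample)
peak_top = peak_bucket(top_by_usage(sample, 3))
```

In this made-up sample the most common bucket across all projects is 0, while the top three by usage peak at 20, which mirrors the shape of the shift described above.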

So what can we learn from this? My initial reaction is that these numbers suggest a lot of room for improvement in the initial experience using new projects in Drupal. If these charts are truly an indication that the vast majority of projects downloaded from Drupal.org are never used, we could make a huge improvement in Drupal adoption by looking closely at the site builder experience immediately after downloading projects. Are people having trouble figuring out how to enable modules and themes? Are dependencies a stumbling block? Are project descriptions just not clear enough?

Ideally everyone who downloads a project on Drupal.org would successfully install it, find it solves the problem they were hoping it would solve, and we'd see usage numbers at close to 100% of download numbers. We're currently far from that, which was a big surprise for me. I'd be happy to hear alternate explanations for this data and have shared the raw data to that end. It would also be great to hear more from what is apparently a large number of people who are downloading Drupal.org projects and not using them.

18 Comments

Modules that get many updates will also see higher download rates. It probably relates directly to the age of the project, although I didn't test that theory. Downloads per version would probably make a better indicator.

It would be interesting to see if the higher number of downloads is due to module updates. For example, if I have downloaded module 1.3 and then updated to 1.4, that counts as two downloads but only one reported site in use. IMHO these numbers are skewed entirely and should not be scrutinized in the way you are suggesting.

Do you have the data for each version (D6, D7, D8, ...)?

The last 2 points make sense. Another one is that (open source wise), many of us just download modules to try them out.

Hey! Great you got statistics across the board... My thought would be about versions - as every project has versions and sometimes it can reach dozens of versions... and all these versions are being downloaded to update modules on the sites - so the downloads counter would go higher while keeping the usage at the same level.

Another factor possibly contributing to 'low retention' is downloads to use as examples. I just took a look at my downloads folder: over the last seven days I have downloaded five modules without the slightest intent of enabling them, just to examine the code. (Of course, there is an online repository viewer, but searching, and copying if I find something useful, is way more convenient on localhost.)

Examples for Developers (examples) has a 'retention rate' of 3.3%, and none of the modules with 'examples' in their names has over 10%. Makes me wonder which modules are likely to be used as examples even if not so intended.

What about downloading modules for dev and staging environments, i.e. with Drush Make and similar tools? Many people will spin up a new Drupal install just like that, which must account for multiple non-unique downloads.

It's great to see so much discussion on this.

Downloads on version updates seems like a good explanation of some of the gap. Those numbers aren't split out at all in any of the public data I've seen, but just looking at the number of releases and/or the date the project was created should give a pretty good rough estimate of how much impact that has. I'll look at adding that data to the stats.
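As a rough sketch of that estimate, one could divide downloads by the number of releases before comparing to usage. This assumes, unrealistically, that every site downloads every release exactly once, so it gives an upper bound on how much updates could explain rather than a real correction. The function name and the numbers are hypothetical.

```python
def release_adjusted_retention(usage, downloads, releases):
    """Retention in percent after spreading downloads evenly across releases.

    Assumption: each using site downloaded every release once, which
    overstates the effect of updates for most projects.
    """
    if downloads <= 0 or releases <= 0:
        return None
    downloads_per_release = downloads / releases
    return 100.0 * usage / downloads_per_release


# Hypothetical project: 100 reporting sites, 2000 downloads, 10 releases.
raw = 100.0 * 100 / 2000
adjusted = release_adjusted_retention(100, 2000, 10)
```

Here the raw rate is 5% and the adjusted rate is 50%, so for a project with many releases, update downloads alone could plausibly account for a large share of the gap.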

I would think most site builders are downloading projects once for dev and then moving them directly to staging and production environments rather than re-downloading them. So that would actually inflate the usage numbers rather than the download numbers. But I might be wrong about how most people do that.

I bet that having good documentation on the project page helps increase the retention rate by making sure that users get what is advertised. I'm not sure how to measure that directly, but something like:

  • number of words in the description (too few is bad, too many is also kind of bad)
  • the existence of an image on the project page
  • some measure of readability score for the project page content
  • the use of a "read documentation" link
  • the use of one or more list and heading (e.g. H3) tags on the project page (since HTML makes it easier to read stuff and indicates people put work into the document)

As others mentioned, I think these statistics are forgetting two major things:

  • Every time a new version is released, people are downloading that version again, same for dev snapshots.
  • Drush Make is quite a popular tool these days and results in lots of downloads of the same project and same version again and again. Every time something is changed, every time a CI server builds a project, the project is downloaded again (there are built-in caches now, but still...).

I think this could be interesting data if we could have the number of downloads plus the number of sites using it week by week. But is this data available?

However, it is hard to get real information from those numbers.

As already said, I always download a module once, but put it on dev, staging, and prod environments. That would increase the retention rate.

Conversely, I sometimes download several modules that provide the same functionality, and just keep the one best suited to my project. That would decrease the retention rate.

It's a great addition, but as with any stats, it makes us ask more questions!

For me, having the stats on the release page would be a great addition - as this would show more clearly when updates have skewed the figures.

That said, we obviously have the usage data for each release too (shown on the usage page), so it would be great if this was also integrated into the release page.

Specifically, I gathered both the usage and the download numbers for every project on Drupal.org, and looked at some general statistics.

How? Scrape the site or is there a way to download the stats that I don't know about?

There is an issue to make project data available via JSON, but I believe the only current public source is the HTML on project pages. So yes, I scraped the site.

As far as I know, download numbers are not broken out by week nor version in any publicly available source.

I don't think this is an issue unique to Drupal by any means, and perhaps Drupal can learn from other similar (i.e. core + modules) platforms. A couple of simple examples:

Search Drupal modules using the term "Twitter": the first result isn't a Twitter module at all; it refers to Twitter as a method of updating. WordPress suffers from a very similar issue: the best result for a plugin search often isn't at all relevant but still potentially generates a download.

Another problem that WordPress suffers from is the depth of information published for each plugin. Drupal shares this issue with many modules: it's really hard to judge exactly what a module can do, or what it looks like, without downloading it.

Finally, there's no clear way to provide feedback about a module. Take the GMap module as an example, skipping over the fact that searching for "Google Maps" lists loads of other irrelevant stuff first. How good is it? How easy was it to implement? There's no rating or review mechanism (that I can see).

The data is interesting, though. To me, it suggests that there should also be a correlation between the usage of a module and its likelihood of being abandoned. After all, if you wrote a module that 1,000 people downloaded but only 2 people used, would you keep maintaining it?

There are all kinds of ways to explain why the numbers could be messed up. At one point I thought I read that the download numbers on d.o. were only for downloads from the site, and that Drush requests and Git clones didn't count. Is this still the case? Before I discovered Drush, I used to download modules locally, keep my own local collection of modules, and reuse them. Luckily I discovered Drush and Drush Make. I also use Aegir, so I have many multisite installs that share modules. Those scenarios explain one download used many times. On the flip side, Aegir makes it very easy to spin up sandbox sites, which I do a lot, as proof of concept or experimentation. I will routinely fire up a site just to try some new module I read about and then delete the site 10 minutes later. The numbers are interesting, but I'm not sure how much can really be pulled from them.

Scott W, I think you're right that the download numbers are lower than they could be, but doesn't that just make it even more perplexing that the download numbers are so much higher than the usage numbers?

Some of this can be attributed to test installs:

When I'm looking at using a new module I download it to my local WAMP server and play around with it on a "sandbox" Drupal install that I wipe every now and then.

Then I download it for the dev version of my production site and test it on there.

Finally I download it on the production site.


About the Author

Scott Reynen

Technology Director

Scott has worked on the web most of his life. He hosted his first site on a server with less computing power than his current phone. He's seen a lot change on the web over the years, but the things that drew him to the web — online community, web standards, and quick iteration — are still here. Those things have most recently drawn Scott to the Drupal community, where he helps to manage groups.drupal.org, local Denver meetups, and several popular modules on drupal.org.
