Drupal.org Project Retention, Download to Usage
Published by Scott Reynen
A couple months ago, download statistics were added to projects on Drupal.org. Previously, we could only see how many sites reported using each project, so download numbers provide an interesting new data point. I was particularly interested in how the two numbers compare, so I did some research, which I'll share here. Specifically, I gathered both the usage and the download numbers for every project on Drupal.org, and looked at some general statistics.
My initial reaction when looking at the download numbers on my own projects was something like "oh great, those are big numbers." But after thinking about it a bit more, I realized that's actually an indication something isn't great. If the download number is significantly bigger than the usage number, that means a lot of people are downloading the project without actually using it. Some of that can be explained by sites that don't report usage back to Drupal.org, but a substantial portion of it seems likely to be users who decided not to use a project after downloading it. I can imagine a few reasons for this, including a problematic install process, an inaccurate project description, and a project that just doesn't work well.
I would hope most of this would be reported in issue queues, but after looking at the numbers, I doubt that's happening. The following chart shows what I'm calling the "retention rate," the percent of Drupal.org downloads that turn into reported usage, for every project on Drupal.org:
I was expecting this chart to look more like a bell curve, with a common retention rate somewhere in the middle, maybe at around 50%, and fewer projects both above and below this rate. Instead, the most common retention rate is zero, and fewer and fewer projects have higher rates. Very few projects are above 25% retention.
"Retention rate" is a somewhat misleading label for this. Because we're comparing actual downloads to reported usage, sites that don't report usage back to Drupal.org account for some of the gap. But because some projects have near 100% retention, it seems safe to discount that as the main explanation. Some even have over 100% retention, as it's possible to download a project once and use it on multiple sites.
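To make the definition concrete, here is a minimal sketch of the retention rate calculation in Python. The project names and numbers are made up for illustration and are not taken from the Drupal.org data; the function name `retention_rate` and the choice to return `None` for projects with no downloads are assumptions as well.

```python
def retention_rate(downloads, usage):
    """Return reported usage as a percentage of downloads.

    Returns None when there are no downloads, since the rate is undefined.
    """
    if downloads <= 0:
        return None
    return 100.0 * usage / downloads


# Hypothetical projects, not real Drupal.org numbers.
projects = {
    "example_a": {"downloads": 2000, "usage": 150},
    "example_b": {"downloads": 400, "usage": 0},
    # Downloaded once and used on several sites, so the rate exceeds 100%.
    "example_c": {"downloads": 50, "usage": 75},
}

rates = {
    name: retention_rate(p["downloads"], p["usage"])
    for name, p in projects.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% retention")
```

Note that the rate is deliberately not capped at 100%, since a single download can end up on more than one reporting site.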
I thought the distribution of retention rates might change based on popularity of projects, so I looked at this same chart on different subsets of more popular projects. If we limit the data to just the top 1000 projects on Drupal.org (by usage numbers), we start to see something approaching a bell curve, with a peak at around 20% retention. That's still very low, and it stays about the same for the top 500 and top 100 projects. But the difference between the top 1000 and all projects does suggest retention during initial install is an important factor in Drupal.org project success.
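The subsetting described above could be sketched roughly as follows: rank projects by reported usage, keep the top N, and count how many fall into each retention bucket. This is only an illustration under stated assumptions. The tuple layout, the function name `retention_histogram`, the 10-point bucket width, and the sample data are all hypothetical, not the method or data behind the charts.

```python
from collections import Counter


def retention_histogram(projects, top_n=None, bin_width=10):
    """Count projects per retention-rate bucket.

    projects: list of (name, downloads, usage) tuples.
    top_n: if given, keep only the top_n projects ranked by usage.
    bin_width: bucket size in percentage points.
    Returns a dict mapping each bucket's lower bound to a project count.
    """
    ranked = sorted(projects, key=lambda p: p[2], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]

    bins = Counter()
    for _name, downloads, usage in ranked:
        if downloads <= 0:
            continue  # rate is undefined without downloads
        rate = 100.0 * usage / downloads
        bin_start = int(rate // bin_width) * bin_width
        bins[bin_start] += 1
    return dict(sorted(bins.items()))


# Hypothetical sample, not real Drupal.org numbers.
sample = [
    ("a", 1000, 200),
    ("b", 500, 0),
    ("c", 800, 40),
    ("d", 100, 120),
    ("e", 300, 3),
]

all_projects = retention_histogram(sample)
top_two = retention_histogram(sample, top_n=2)
print(all_projects)
print(top_two)
```

Comparing the output for the full list against a top-N subset is the same comparison as between the all-projects chart and the top 1000 chart, just at toy scale.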
So what can we learn from this? My initial reaction is that these numbers suggest a lot of room for improvement in the initial experience using new projects in Drupal. If these charts are truly an indication that the vast majority of projects downloaded from Drupal.org are never used, we could make a huge improvement in Drupal adoption by looking closely at the site builder experience immediately after downloading projects. Are people having trouble figuring out how to enable modules and themes? Are dependencies a stumbling block? Are project descriptions just not clear enough?
Ideally everyone who downloads a project on Drupal.org would successfully install it, find it solves the problem they were hoping it would solve, and we'd see usage numbers at close to 100% of download numbers. We're currently far from that, which was a big surprise for me. I'd be happy to hear alternate explanations for this data and have shared the raw data to that end. It would also be great to hear more from what is apparently a large number of people who are downloading Drupal.org projects and not using them.