404 Not Found: The Monster Under Your Bed
If you are working on a website redesign, 404s are the very real monsters under your bed. Ignore them, and they will wreak havoc on your website’s traffic. Worst of all, by the time you realize what’s happening, it may already be too late.
What are 404s?
Very simply, 404s are broken links. More specifically, 404 is the HTTP response code for “Not Found,” signifying that a web page is not available at the provided URL. Reorganizing old content, changing old URLs and selectively discarding content that is no longer relevant are all common activities during website redesign projects that can result in 404s.
Why 404s Are so Bad
Your legacy content – the stuff that’s been around for 15 years, from the most up-to-date research articles, to blog posts written by employees long gone, to PDF files in random folders off your webroot – has been quietly growing your website traffic, catching inbound links and increasing the effectiveness of organic search. And the longer it has been around, the more valuable it has likely become, even if the content itself is no longer of much relevance to your organization. A quick scan of your Google Analytics will likely confirm this. Your organic search traffic probably has a very long tail: thousands or tens of thousands of pages with a few hits each, funneling users to your website.
If those URLs change, or that content is abandoned entirely, the potentially massive net you have been casting – and growing – for years will be damaged. Despite the very best user experience, the most on-target messaging and the most compelling design, years of search engine optimization (SEO) progress can be lost – all because of 404s. Your organic search rank will drop as search engines remove the now-broken URLs from their indexes. As a result, traffic will plummet. All of this can very quickly bring the success of your entire redesign project into question.
In website redesigns, 404s may very well be your worst enemy.
Combatting 404s Starts with Content Strategy
Dealing with 404s is an important, often overlooked component of effective content strategy. Communications teams frequently devote significant time to performing content audits, flagging content to be reorganized, rewritten or abandoned altogether. Far less time – if any – is given to thinking through exactly what to do with content that is left behind. It is simply abandoned. Soon after launch, someone in marketing notices a drop in traffic and suddenly 404s are on everyone’s radar.
By Default, Keep Everything
When redesigning a website, we recommend keeping just about everything. That might be the opposite of what you’ve heard before. It doesn’t lend itself to the “cleaning out the garage” or “moving to a new house” metaphors. In reality, though, your legacy content is one of your greatest assets. That junk in the garage is gold. Deal with it, but don’t abandon it.
For outdated content, channel users to more relevant offerings with good user experience design and carefully crafted messaging. Old content – even if outdated – represents an opportunity to connect with users you otherwise might miss entirely, communicating key changes in your organization or pointing to relevant, up-to-date resources. Again, dealing with legacy content is an important element of content strategy. It deserves design attention and good user experience. Craft a simple message that says “This resource is out of date. For our more recent work in this area, see X, Y or Z.”
For content that is rewritten or moved to a new URL, use 301 redirects to redirect users automatically from old pages to their new equivalents. 301 redirects, or 301s, signal to search engines that a resource has not been eliminated; rather, it has been “Moved Permanently” and should be reindexed at its new location. 301s are hands-down the most important technical device for dealing with 404s.
(Note that 301s do not guarantee that your content will maintain its rank within search results. Rather, 301s indicate to search engines that the resource for a particular URL has been moved. Search engines will queue the new URL for reindexing, and search rank will once again be determined by a broad spectrum of factors like keyword density, page title, inbound links, etc.)
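To make the difference between the status codes concrete, here is a minimal sketch of how a server might decide between 200, 301 and 404 for a requested path. The paths, the REDIRECTS map and the LIVE_PAGES set are made-up examples for illustration, not part of any real site or of Drupal itself.

```python
from http import HTTPStatus

# Hypothetical redirect map: old path -> new path (illustrative values only).
REDIRECTS = {
    "/research/2009/report.html": "/publications/report",
}

# Hypothetical set of paths that exist on the redesigned site.
LIVE_PAGES = {"/publications/report"}


def respond(path):
    """Return (status, headers) for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK, {}
    if path in REDIRECTS:
        # 301 tells browsers and crawlers that the move is permanent and
        # that the Location header holds the new address.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": REDIRECTS[path]}
    # No page and no redirect: this is the 404 we are trying to avoid.
    return HTTPStatus.NOT_FOUND, {}
```

In this sketch, a request for the old report path returns a 301 pointing at the new one, while any path that is neither live nor mapped falls through to a 404.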
Add 301 Redirects for All Migrated Content
When migrating legacy content into your redesigned website, add a 301 redirect for every single resource, article or page being migrated. As of right now in Drupal 7, a patch for the redirect module makes this process easy: simply map the old URL to the special destination field “migrate_redirects” and the redirect module will take care of the rest.
In Drupal 8, the redirect module provides built-in support for migrating redirects from older versions of the Content Management System. A little bit of custom code in your scripted migration can take care of adding redirects for migrated nodes. (Need more info? Let us know in the comments or get in touch.)
Find and Prioritize All Legacy URLs
While adding 301 redirects for every migrated page is critical, it is not enough. Google has likely indexed large numbers of URLs for content that will not be included in your scripted migration process. Landing pages, listing pages, PDFs and anything you have specifically decided not to migrate will be omitted if your focus is solely on individual articles. To better understand the full scope of URLs that need to be dealt with, download a report of all pages from Google Analytics or whatever analytics platform you are using. This not only provides a thorough catalog of web pages, PDFs and other resources being viewed, but also shows a count of monthly page views and is incredibly helpful for establishing priority for specific pages to be redirected. Remember, your traffic has a long tail; the potentially thousands of pages that receive one or two views per month are still important.
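As a rough sketch of that prioritization step, the snippet below ranks URLs from an exported report by page views and estimates how much traffic sits in the long tail. The column names "Page" and "Pageviews" and the threshold of five views are assumptions; adjust them to whatever your analytics export actually contains.

```python
import csv
import io
from collections import Counter


def rank_legacy_urls(csv_text, url_column="Page", views_column="Pageviews"):
    """Return [(url, views)] sorted by views, highest first."""
    views = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row[url_column].strip()
        # Exports often format counts with thousands separators ("1,200").
        views[url] += int(row[views_column].replace(",", ""))
    return views.most_common()


def long_tail_share(ranked, threshold=5):
    """Fraction of all views that come from pages with at most `threshold` views."""
    total = sum(count for _, count in ranked)
    tail = sum(count for _, count in ranked if count <= threshold)
    return tail / total if total else 0.0
```

A long_tail_share that is larger than expected is a good reminder not to stop at the top few hundred URLs.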
Test All Legacy URLs In Your Redesigned Website
Once you have a list of all legacy URLs, you need to test your new, redesigned website to see which URLs are resulting in “404 Not Found” errors. We have a few custom scripts that do exactly that, written in environments ranging from Drupal modules to standalone NodeJS apps. Regardless of the specific implementation, the script needs to do the following:
- Import a list of legacy URLs downloaded from your analytics service.
- Loop through the list of URLs and test each on the new website to see what HTTP status code is returned.
- If a 301, 302 or other redirect is returned, follow it to ensure it eventually results in a URL with an acceptable status (200 OK).
- Generate a report of returned status codes. We typically include page views from the originally downloaded analytics report in this CSV so we can see the status code directly beside the number of monthly views for each URL. Seeing the HTTP status code, URL and number of pageviews all side-by-side in spreadsheet format is incredibly helpful for gauging priority.
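The steps above can be sketched in a few dozen lines. This is not one of our production scripts, just a minimal Python illustration using only the standard library. The function names, the report columns and the cap of five redirect hops are all assumptions, and network failures such as timeouts are deliberately left unhandled to keep the sketch short.

```python
import csv
import io
import urllib.error
import urllib.request
from urllib.parse import urljoin

MAX_HOPS = 5  # Assumed cap on redirect chain length, to avoid loops.
REDIRECT_CODES = (301, 302, 303, 307, 308)


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch(url):
    """Return (status, location) for one request, without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=10) as response:
            return response.status, None
    except urllib.error.HTTPError as error:
        # Redirects and errors such as 404 both arrive here.
        return error.code, error.headers.get("Location")


def resolve(path, base_url, fetch=fetch):
    """Follow redirects for a legacy path.

    Returns (first_status, final_status, final_url).
    """
    url = urljoin(base_url, path)
    status, location = fetch(url)
    first_status = status
    hops = 0
    while status in REDIRECT_CODES and location and hops < MAX_HOPS:
        url = urljoin(url, location)
        status, location = fetch(url)
        hops += 1
    return first_status, status, url


def build_report(rows, base_url, fetch=fetch):
    """rows: iterable of (path, pageviews). Returns CSV text, busiest pages first."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["url", "pageviews", "first_status", "final_status", "final_url"])
    for path, views in sorted(rows, key=lambda row: row[1], reverse=True):
        first, final, final_url = resolve(path, base_url, fetch)
        writer.writerow([path, views, first, final, final_url])
    return out.getvalue()
```

Calling build_report with the (path, pageviews) pairs from your analytics export and the base URL of your staging site returns CSV text you can open in a spreadsheet. Any row whose final_status is not 200 needs attention. The fetch parameter exists so the logic can be exercised with a stand-in function instead of real HTTP requests.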
The first time you run your script, you will likely see a very high volume of 404s. That’s fantastic: you’re seeing them now, during the redesign, before they are anywhere close to impacting SEO or traffic.
Fix the 404s
Your report of returned status codes provides a prioritized list of 404s that need to be dealt with. You will likely see a mix of landing pages, listing pages, articles, PDF files and other resources. Each URL needs to be dealt with.
Often, large numbers of similar URLs can be redirected programmatically – that is, by matching patterns rather than specific addresses. For example, a collection of folders containing PDFs may have been moved to new locations. Or URLs for pages that show content by category may need to be mapped to new category ids. Depending on the complexity of the specific redirect pattern and the environment in which your website is hosted, programmatic redirects can be added to Drupal in a variety of ways, as follows:
- Using mod_rewrite in your .htaccess file
- The match redirect module for Drupal 7
- A custom module using hook_init()
- Custom code in settings.php
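Whichever of these mechanisms you choose, the underlying idea is the same: match a pattern, then build the new address from the captured pieces. The sketch below shows that idea in Python rather than in mod_rewrite or Drupal code. The URL patterns, target paths and category id mapping are invented for illustration.

```python
import re

# Hypothetical mapping of old category ids to new ones.
CATEGORY_IDS = {"12": "7", "15": "9"}


def _pdf_target(match):
    # A folder of PDFs that moved to a new location, file names unchanged.
    return f"/files/reports/{match['name']}"


def _category_target(match):
    # Category listing pages whose ids changed in the redesign.
    new_id = CATEGORY_IDS.get(match["id"])
    return f"/topics/{new_id}" if new_id else None


# Each rule pairs a pattern for the old path with a function that builds the new one.
RULES = [
    (re.compile(r"^/docs/reports/(?P<name>[^/]+\.pdf)$"), _pdf_target),
    (re.compile(r"^/category\.php\?id=(?P<id>\d+)$"), _category_target),
]


def redirect_target(old_path):
    """Return the new path for the first rule that applies, or None."""
    for pattern, build in RULES:
        match = pattern.match(old_path)
        if match:
            target = build(match)
            if target:
                return target
    return None
```

Two rules here cover every PDF in the folder and every mapped category, which is the appeal of pattern-based redirects over thousands of one-off entries.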
Watch Out for Index.html
If your legacy URLs are directory indexes (i.e. ending with “index.html” or “index.htm”) you will need to add an additional redirect for the version that does not include the file name.
Example: if your legacy URL is “http://example.com/path/to/file/index.html” and the new equivalent is “http://example.com/new/path/to/file”, you will need two redirects:
- One from “http://example.com/path/to/file/index.html” to the new URL
- Another from “http://example.com/path/to/file” (without index.html) to the new URL
We typically add additional redirects for directory indexes once all other redirect work is finished, using a simple custom script that scans the redirects table for index pages and generates the appropriate equivalent.
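A script along those lines might look like the following sketch. It assumes the redirects have been exported as a simple mapping of old path to new path, rather than reading Drupal’s redirects table directly, and the function name is our own invention.

```python
import re

# Matches a trailing "/index.html" or "/index.htm".
INDEX_FILE = re.compile(r"/index\.html?$")


def index_equivalents(redirects):
    """Given {old_path: new_path}, return the extra redirects needed for
    the same old paths with the index file name removed."""
    extras = {}
    for old, new in redirects.items():
        stripped, count = INDEX_FILE.subn("", old)
        if count == 0 or not stripped:
            # Not an index page, or the site root (left alone in this sketch).
            continue
        # Never override an existing redirect or point a path at itself.
        if stripped not in redirects and stripped != new:
            extras[stripped] = new
    return extras
```

For the example above, passing in the redirect from “/path/to/file/index.html” yields one additional redirect from “/path/to/file” to the same new URL.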
Test Again, Rinse and Repeat
Once all 404s have been dealt with in the ways outlined above, test your redesigned website again. You will likely find a few URLs that still need to be addressed. Rinse and repeat until the entire list of prioritized pages returns the acceptable status code of 200.
Not Quite Done
And that’s it. Almost. The final piece to combatting 404s is to monitor them closely after launch. The redirect module provides a simple admin page for doing exactly that. We strongly recommend monitoring 404s for several days after launch and adding 301s wherever appropriate.
Sit Back and Relax
Website redesign projects usually impact organizations at all levels, and we know you probably won’t be able to truly sit back and relax after launch. There will be final communications details, stakeholder reviews, content updates, ongoing bug fixes and likely a growing list of next-phase wishlist items. That said, dealing with 404s will help protect your investment in organic search and mitigate deleterious effects on website traffic. There will still be a dip in the numbers as Google and other search engines update their indexes and re-crawl new content. This post doesn’t address SEO strategy in depth, nor setting specific traffic goals and benchmarks as a part of planning and discovery for your website redesign. It does express the very clear need to accommodate modified URLs and abandoned pages. Without an effective redirect strategy, 404s will almost certainly wreak havoc on your organic search traffic. Good content strategy and 301 redirects are critical allies for fighting 404s and protecting your years-long investment in SEO.