16 December 2016

Site Redesign: How to deal with pages that have moved or been eliminated in ASP.NET MVC

The SEO Problem with site redesigns

It is very common for sites to experience SEO problems after a redesign: specifically, loss of traffic and demotion in the search engine results page (SERP) for important terms.

Why this happens


Modern search engines determine whether or not to show a link to a page on your site based on hundreds of factors.

When you move content from one URL to another, the search engine's bot/spider will eventually discover the new URL and begin indexing it (a good thing). Unless you properly tell it that the old content has moved to the new location, however, it will also keep trying to crawl and index the old URL. Until the new content is well established, it may still show the old (now broken) URL in the SERP.

Showing the broken link to users is problematic because in addition to the crawl, search engines "learn" whether or not a given page on your site is "good" based on how users act after clicking on the link. If the user quickly returns to the search engine, the machine learning algorithms will begin lowering the value of that URL.

EVENTUALLY, the search engines will catch up (assuming that your new content is on par with your old content).

Until they do, however, you will take a traffic and conversion rate hit.

How to Mitigate the Impact of URL moves

First, monitor your crawl errors in the webmaster tools provided by the major search engines. When you see new "Not Found" errors, fix them ASAP.

Second, monitor and log traffic to your website's error pages. When you see errors, fix them ASAP.

Naïve approach

If you only had one page that returned a 404 (Not Found) error, the fix would be as simple as building a controller that returned a 301 (Permanent Redirect) with the new URL. This is user-friendly: if a user visits the old URL, (s)he is immediately redirected to the new page. It is also search-engine-friendly: the spider/bot for the search engine understands the meaning of the HTTP status code 301 and will begin updating its index to use the new URL in lieu of the old URL.
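
As a minimal sketch of this one-off approach, assuming a hypothetical old page at /old-about-us that has moved to /about (the controller, action, and both URLs are illustrative, not taken from a real site):

```csharp
using System.Web.Mvc;

// Hypothetical controller for a single moved page.
// A route mapping the old URL (/old-about-us) to this action
// must also be registered in the application's route table.
public class LegacyPagesController : Controller
{
    public ActionResult OldAboutUs()
    {
        // RedirectPermanent sends HTTP 301 with a Location header,
        // so both browsers and crawlers are pointed at the new URL.
        return RedirectPermanent("/about");
    }
}
```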

Unfortunately, building a new controller every time a new error hits the logs is time-consuming and a waste of developer resources.

A better way

Rule-based 404 handling

A better way to handle the problem is to build a generic mechanism that handles three cases as follows:

  • old URL → new URL with a 301 status
  • old URL with no planned/intended replacement → a friendly error page that returns a 410 (Gone) status code (not a 404, which does not tell crawlers whether the page is gone for good)
  • old URL of which you are unaware → a friendly error page that returns a 404 status code (this is "almost" the default in ASP.NET MVC with Custom Errors enabled – the framework actually returns a 302 and then a 404).

Our design goals are:

  • To not need to write code for newly-discovered broken links,
  • To maintain rules in a simple text file,
  • To have rule-order precedence, and
  • To have the server update the rules in use when the text file is saved

Replace the default error handling mechanism

Disable CustomErrors

In the web.config file in the root of your site, disable custom errors by setting <customErrors mode="Off" /> inside the <system.web> element.

Wire up a replacement error page

In the code file for the application, Global.asax.cs, insert code similar to the following:
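
A minimal sketch of such a handler, assuming the application class is named MvcApplication (the default project template name). HandleHttpException is the method the article describes next and is not defined in this fragment:

```csharp
using System;
using System.Web;

public partial class MvcApplication : HttpApplication
{
    // ASP.NET calls this for any unhandled exception in the request.
    protected void Application_Error(object sender, EventArgs e)
    {
        Exception error = Server.GetLastError();

        // Keep HttpExceptions (404, 403, ...) as they are; wrap anything
        // else so that it is reported as an HTTP 500.
        HttpException httpError = error as HttpException
            ?? new HttpException(500, "Internal Server Error", error);

        // Defined elsewhere in this class (see the article's description).
        HandleHttpException(httpError);
    }
}
```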

Since we're completely replacing the custom error handling in ASP.NET, all unhandled non-HttpException errors are converted to HTTP Status Code 500 errors.

The code for our HandleHttpException method is shown below. It clears the error on the server, asks IIS (which is the web server that hosts most ASP.NET websites) to skip any custom error handling it has in place, and finally, executes a custom error page controller.

Since we're working in the HttpApplication directly, we have to build the RouteData ourselves and then execute the controller to get back into the MVC framework.
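
The steps above could be sketched as follows. The controller name "Error", the action name "Index", and the route value names statusCode and message are assumptions made for illustration:

```csharp
using System.Web;
using System.Web.Mvc;
using System.Web.Routing;

public partial class MvcApplication : HttpApplication
{
    private void HandleHttpException(HttpException httpError)
    {
        // Clear the error so ASP.NET does not process it again.
        Server.ClearError();
        Response.Clear();

        // Ask IIS not to replace our response with its own error page.
        Response.TrySkipIisCustomErrors = true;

        // Build the route data by hand; no routing has happened for us here.
        var routeData = new RouteData();
        routeData.Values["controller"] = "Error";   // hypothetical controller name
        routeData.Values["action"] = "Index";       // hypothetical action name
        routeData.Values["statusCode"] = httpError.GetHttpCode();
        routeData.Values["message"] = httpError.Message;

        var requestContext = new RequestContext(new HttpContextWrapper(Context), routeData);

        // Resolve and execute the controller through the MVC controller factory.
        IControllerFactory factory = ControllerBuilder.Current.GetControllerFactory();
        IController controller = factory.CreateController(requestContext, "Error");
        try
        {
            controller.Execute(requestContext);
        }
        finally
        {
            factory.ReleaseController(controller);
        }
    }
}
```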

Since in some cases our controller will return an error page, the code that follows will use a model with three public properties: ErrorMessage, StatusCode, and Url. These represent the HTTP error message, status code, and the page URL that generated the error.

HttpErrorModel class diagram
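
In code, a model with those three properties might look like the sketch below. The property names come from the description above; the property types are assumptions:

```csharp
// View model passed to the friendly error page.
public class HttpErrorModel
{
    // Human-readable HTTP error message, e.g. "Not Found".
    public string ErrorMessage { get; set; }

    // Numeric HTTP status code, e.g. 404 or 410 (type assumed to be int).
    public int StatusCode { get; set; }

    // URL of the page that generated the error.
    public string Url { get; set; }
}
```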

The code below is very simple. If the HTTP status code is 404 (Not Found), it checks the OldUrl pattern of each rule, in order, for a match against the requested URL. If a rule matches and its NewUrl is non-empty, it returns a permanent RedirectResult to the new URL. If a rule matches and its NewUrl is empty, it returns a 410 (i.e. we have created a rule for content that will not be replaced). If no rule matches, it returns a 404.
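
The decision logic just described can be sketched independently of MVC, which keeps it easy to test. The names UrlMapping, Resolution, and NotFoundResolver are hypothetical; only the OldUrl and NewUrl properties are taken from the article. Case-insensitive matching is also an assumption:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One rule: a regex for the old URL and an optional replacement URL.
public class UrlMapping
{
    public string OldUrl { get; set; }
    public string NewUrl { get; set; }
}

// Outcome of evaluating the rules for a URL that produced a 404.
public class Resolution
{
    public int StatusCode { get; set; }
    public string RedirectUrl { get; set; }
}

public static class NotFoundResolver
{
    public static Resolution Resolve(string url, IEnumerable<UrlMapping> rules)
    {
        // Rules are evaluated in file order; the first match wins.
        foreach (UrlMapping rule in rules)
        {
            if (!Regex.IsMatch(url, rule.OldUrl, RegexOptions.IgnoreCase))
                continue;

            if (!string.IsNullOrEmpty(rule.NewUrl))
                return new Resolution { StatusCode = 301, RedirectUrl = rule.NewUrl };

            // Matching rule with no replacement: the content is gone for good.
            return new Resolution { StatusCode = 410 };
        }

        // No rule knows about this URL.
        return new Resolution { StatusCode = 404 };
    }
}
```

In a controller action, a 301 result would translate to RedirectPermanent(result.RedirectUrl), while 410 and 404 would set Response.StatusCode and return the friendly error view.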

To make this into a flexible system, we're going to store our rules in a text file as JSON and use Regular Expression patterns.

Below is an example of some sample rules.
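
An illustrative rules file might look like the JSON embedded in the sketch below. All URLs and patterns are invented for the example, and Json.NET (Newtonsoft.Json) is an assumed dependency used only to show that the text parses. The JSON sits in a C# verbatim string, so each doubled quote ("") is a single quote (") in the actual file:

```csharp
using Newtonsoft.Json.Linq;

public static class SampleRules
{
    // Hypothetical rules, listed in precedence order (first match wins):
    //   1. a single moved page       -> 301 to the new URL
    //   2. a retired promo section   -> 410 (empty NewUrl)
    //   3. an old blog archive tree  -> 301 to the blog home page
    public const string Json = @"[
  { ""OldUrl"": ""^/products/widget-classic/?$"", ""NewUrl"": ""/shop/widgets/classic"" },
  { ""OldUrl"": ""^/promo/2015/"",                ""NewUrl"": """" },
  { ""OldUrl"": ""^/blog/archive/"",              ""NewUrl"": ""/blog"" }
]";

    // Parses the sample text into a JSON array of rule objects.
    public static JArray Parse()
    {
        return JArray.Parse(Json);
    }
}
```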

Our controller class will store its rules in the default MemoryCache. This collection of rules will have a cache policy that causes a refresh when the JSON text file is changed. The initial caching of the rules will come from reading a JSON file and decoding it into a POCO.

Below is the CacheMappings code. The code to read the text from file and to convert the JSON to a POCO object is omitted for brevity.
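
A minimal sketch of this caching approach, assuming Json.NET for deserialization and a hypothetical UrlMapping POCO with the OldUrl and NewUrl properties. The cache key, the MappingCache class, and the GetMappings helper are illustrative names; only CacheMappings is named in the article:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Caching;
using Newtonsoft.Json;

// Hypothetical rule POCO.
public class UrlMapping
{
    public string OldUrl { get; set; }
    public string NewUrl { get; set; }
}

public static class MappingCache
{
    private const string CacheKey = "UrlMappings"; // hypothetical key

    // Returns the cached rules, loading them on first use or after eviction.
    public static IList<UrlMapping> GetMappings(string rulesFilePath)
    {
        var cached = MemoryCache.Default.Get(CacheKey) as IList<UrlMapping>;
        return cached ?? CacheMappings(rulesFilePath);
    }

    private static IList<UrlMapping> CacheMappings(string rulesFilePath)
    {
        // Watch the file first: when it is saved, the monitor evicts the
        // cache entry, so the next GetMappings call reloads the rules.
        var policy = new CacheItemPolicy();
        policy.ChangeMonitors.Add(
            new HostFileChangeMonitor(new List<string> { rulesFilePath }));

        // Read the JSON text and decode it into a list of rule POCOs.
        List<UrlMapping> mappings =
            JsonConvert.DeserializeObject<List<UrlMapping>>(File.ReadAllText(rulesFilePath))
            ?? new List<UrlMapping>();

        MemoryCache.Default.Set(CacheKey, mappings, policy);
        return mappings;
    }
}
```

HostFileChangeMonitor expects a full file system path. Inside a web application, one way to obtain it is HostingEnvironment.MapPath with an app-relative path such as "~/App_Data/url-rules.json" (a hypothetical location).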

If you implement a 404-handling pattern similar to the one shown in this article in your site redesign, then when "Not Found" errors are logged by either the search engines or your internal logging, the fix is as simple as adding a rule to the JSON file.
