Migrating from .html to extensionless URLs without losing SEO juice. Eeew.

If you’ve ever had a basic HTML website (or a not-so-basic one that still has static HTML pages in it) and eventually got around to migrating to an extensionless URL framework (e.g. MVC), then you may have had to think about setting up the correct redirects so as not to dilute your SEO rating.

This is my recent adventure in doing just that within an MVC5 site that took over a selection of HTML pages from an older version of the site.

SE-huh?

As your site is visited and crawled it will build up a score and a reputation among search engines; certain pages may appear on the first page for certain search terms. This tends to be referred to as “search juice”. Ukk.

Make that change

If you change your structure you can do one of the following:

Just use new URLs

Forget about the old ones; users might be annoyed if they’ve bookmarked something (but seriously, who bookmarks anything anymore?..), but also any reputation and score you have with the likes of Google will have to start again from scratch.

This is obviously the easiest option.

Dead as a dodo; gives an HTTP 404:

http://mysite.com/aboutme.html

Alive and kickiiiiing:

http://mysite.com/aboutme

Map the old URLs to the new pages

Good for humans, but in the short term you’ll be diluting your SEO score for that resource since it appears to be two separate resources (with the score split between them).

Both return an HTTP 200 and display the same content as each other:

http://mysite.com/aboutme.html

http://mysite.com/aboutme

Should you want to do this in MVC, you first need to capture the requests for .html files. These aren’t normally handled by .NET, since IIS treats them as static files with no processing to be done; you might as well be processing .js, .css, or .jpg files.

Capturing .html files – The WRONG way

For .NET to capture requests for .html files, I have seen most people use this abomination:

<modules runAllManagedModulesForAllRequests="true" /> 

This causes every single request, static files included, to go through the managed .NET module pipeline, even though there’s most likely nothing for it to do. A waste of processing power, and of everyone’s time.

Capturing .html files – The RIGHT Way

Using this as reference I discovered you can define a handler to match individual patterns:

<add name="HtmlFileHandler" path="*.html" verb="GET" 
    type="System.Web.Handlers.TransferRequestHandler" 
    preCondition="integratedMode,runtimeVersionv4.0" />

Pop that in the <handlers> node (under <system.webServer> in your web.config) and you’ll be capturing just the .html requests.

Mapping them (for either option above)

In your RouteConfig.cs you could add something along the lines of:

routes.MapRoute(
    name: "Html",
    url: "{action}.html",
    defaults: new { controller = "Home", action = "Index" }
);

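This will match a URL such as /aboutus.html and send it to the Home controller’s AboutUs action. Register it before the default {controller}/{action}/{id} route, otherwise the default route gets first go at the URL, and make sure you have a matching action for each page, e.g.: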

public ActionResult AboutUs()
{
    return View();
}
public ActionResult Contact()
{
    return View();
}
public ActionResult Help()
{
    return View();
}

Or just use a catch all route …

routes.MapRoute(
    name: "Html",
    url: "{page}.html",
    defaults: new { controller = "Home", action = "SendOverThere" }
);

and the matching catch all ActionResult to just display the View:

public ActionResult SendOverThere(string page)
{
    return View(page); // displays the view with the name <page>.cshtml
}
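
One thing to watch with the catch-all: View(page) fails with a "view not found" error, rather than a tidy 404, if someone asks for a page that has no matching view. A minimal sketch of guarding against that, assuming a hand-maintained list of legacy page names (the LegacyPages class and the names in it are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the page names below are placeholders for your own legacy pages.
public static class LegacyPages
{
    // Case-insensitive, so /AboutMe.html and /aboutme.html are treated the same.
    private static readonly HashSet<string> Known =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "aboutme",
            "contact",
            "help"
        };

    // True only for pages we expect to have a matching .cshtml view.
    public static bool IsKnown(string page)
    {
        return !string.IsNullOrEmpty(page) && Known.Contains(page);
    }
}
```

In the action you could then return HttpNotFound() when LegacyPages.IsKnown(page) is false, and View(page) otherwise.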

Set up permanent redirects from the old URLs to the new ones

CORRECT ANSWER!

To preserve the aforementioned juices, each page’s action on your controller needs to return a RedirectResult instead of a view, using RedirectPermanent so that browsers and crawlers get an HTTP 301, e.g.:

public RedirectResult AboutUs()
{
    return RedirectPermanent("/AboutUs");
}

Or use that catch all route …

routes.MapRoute(
    name: "Html",
    url: "{page}.html",
    defaults: new { controller = "Home", action = "SendOverThere" }
);

… and action:

public RedirectResult SendOverThere(string page)
{
    // Root-relative target, e.g. "aboutme" -> "/aboutme"
    return RedirectPermanent("/" + page);
}
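
The catch-all redirect drops any query string that was on the old URL. If your old pages were linked with tracking parameters and the like, something along these lines could carry them over. This is a sketch under assumptions: LegacyUrl and ToExtensionless are made-up names, and the query is expected in the form Request.Url.Query gives you (empty, or starting with "?").

```csharp
using System;

// Hypothetical helper for building the redirect target.
public static class LegacyUrl
{
    // page:  the {page} route value, e.g. "aboutme"
    // query: the original query string, e.g. "" or "?ref=newsletter"
    public static string ToExtensionless(string page, string query)
    {
        if (string.IsNullOrWhiteSpace(page))
        {
            throw new ArgumentException("page is required", "page");
        }

        // Root-relative, so the redirect does not depend on the current path.
        string target = "/" + Uri.EscapeDataString(page);

        if (!string.IsNullOrEmpty(query))
        {
            // Tolerate a query passed with or without the leading '?'.
            target += (query[0] == '?') ? query : ("?" + query);
        }

        return target;
    }
}
```

Inside the action that would be used as RedirectPermanent(LegacyUrl.ToExtensionless(page, Request.Url.Query)).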

Using Attribute Routing in MVC5 you can do something similar and simpler, directly on your controller, with no need for a MapRoute entry per route in RouteConfig.cs:

[Route("{page}.html")]
public RedirectResult HtmlPages(string page)
{
    // Root-relative target, e.g. "aboutme" -> "/aboutme"
    return RedirectPermanent("/" + page);
}

Just add the line below into your RouteConfig.cs, before any MapRoute calls, to switch attribute routing on:

routes.MapMvcAttributeRoutes();
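
For context, the registration order might look something like this. It is a sketch based on the stock MVC5 project template rather than an exact copy of a real file; the Default route and the .axd ignore are assumptions taken from that template.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        // From the default template: leave .axd resources to their own handlers.
        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        // Picks up [Route(...)] attributes such as [Route("{page}.html")].
        // Called before MapRoute so attribute routes are tried first.
        routes.MapMvcAttributeRoutes();

        // Conventional fallback route (assumed; from the default template).
        routes.MapRoute(
            name: "Default",
            url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional }
        );
    }
}
```

Routes are tried in the order they are registered, which is why the attribute routes go in ahead of the greedy Default route.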

Having problems?

Routing always drives me crazy. I find routing problems extremely hard to debug, which is why I’d like to point you towards Phil Haack’s RouteDebugger NuGet package and the accompanying article

RouteDebugger screenshot

or even Glimpse.

Glimpse screenshot

Both of these have the incredible ability to analyse any URL you enter against any routes you’ve set up, telling you which ones hit (if any). This helped me no end when I finally discovered my .html page URL was being captured, but it was looking for an action of “aboutus.html” on the controller “home”.
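
If you'd rather poke at routes from code than from a browser, a similar check can be sketched by hand. This is a different technique, not how RouteDebugger or Glimpse work internally. FakeRequest, FakeContext and RouteProbe are made-up names, and the sketch assumes that routing only reads the two request properties overridden here, and that the project references System.Web and System.Web.Mvc.

```csharp
using System.Web;
using System.Web.Mvc;
using System.Web.Routing;

// Minimal fake request: just enough for the routing engine to match a path.
public class FakeRequest : HttpRequestBase
{
    private readonly string _path;
    public FakeRequest(string appRelativePath) { _path = appRelativePath; }
    public override string AppRelativeCurrentExecutionFilePath { get { return _path; } }
    public override string PathInfo { get { return string.Empty; } }
}

public class FakeContext : HttpContextBase
{
    private readonly HttpRequestBase _request;
    public FakeContext(string appRelativePath) { _request = new FakeRequest(appRelativePath); }
    public override HttpRequestBase Request { get { return _request; } }
}

public static class RouteProbe
{
    // appRelativePath is in the "~/aboutus.html" form.
    // Returns the first matching route's data, or null if nothing matches.
    public static RouteData Match(string appRelativePath)
    {
        var routes = new RouteCollection();

        // The catch-all .html route, registered first.
        routes.MapRoute(
            name: "Html",
            url: "{page}.html",
            defaults: new { controller = "Home", action = "SendOverThere" }
        );

        // Assumed default route from the MVC project template.
        routes.MapRoute(
            name: "Default",
            url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional }
        );

        return routes.GetRouteData(new FakeContext(appRelativePath));
    }
}
```

Calling RouteProbe.Match("~/aboutus.html") and inspecting Values["controller"], Values["action"] and Values["page"] on the result shows which route picked the URL up. Swapping the order of the two MapRoute calls is a quick way to see how an earlier, greedier route can steal the request.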

Worth checking out

Superscribe is an open source project by @roysvork that uses graph-based routing to implement unit-testable, fluent routing, which looks really clever. I’ll give that a shot for my next project.

Good luck, whichever route you take!

(see what I did there?..)

WebForms ScriptManager Vs MVC – FIGHT!

If you’ve tried to squeeze MVC into a WebForms project which uses ScriptManager elements for AJAX functionality, be sure to add some hardcore IgnoreRoute entries in your route registration section.

If you don’t, you’ll find that the calls ScriptManager makes to your asmx web service receive 404 errors when requesting asmx/js or asmx/jsdebug, with an HttpException which looks like:

The controller for path blah.asmx/js was not found or does not implement IController

or if you’re in debug mode

The controller for path blah.asmx/jsdebug was not found or does not implement IController

This basically means that a URL of the form {folder}/{file}.asmx/{something} is being picked up by MVC routing, which then can’t find a controller for it. Since MVC shouldn’t be handling it at all, you need to make sure you add in an exception.

Ignore a specific file type

This one didn’t actually work for me as expected, but is worth listing here:

routes.IgnoreRoute("{resource}.asmx/{*pathInfo}");
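
One possible reason the pattern above misses: a {resource} placeholder only matches within a single URL segment, so it can't span the folder in something like WebServices/blah.asmx/js. A commonly used alternative is a catch-all parameter with a regex constraint. A sketch, where the parameter name allasmx is arbitrary and AsmxIgnore is a made-up wrapper class:

```csharp
using System.Web.Mvc;
using System.Web.Routing;

public static class AsmxIgnore
{
    // Call this before any MapRoute / MapMvcAttributeRoutes calls.
    public static void Register(RouteCollection routes)
    {
        // {*allasmx} captures the whole path, folders included. The constraint is
        // a regex that must match all of it, so this ignores any path that ends in
        // ".asmx" or continues with "/..." after it, e.g. WebServices/blah.asmx/js.
        routes.IgnoreRoute("{*allasmx}", new { allasmx = @".*\.asmx(/.*)?" });
    }
}
```

This keeps the ignore tied to the file type rather than to a folder name, so it should keep working if the web services move.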

Ignore an entire folder

This brute force attack worked for me:

routes.IgnoreRoute("{folder}/{*pathInfo}", new { folder = "WebServices" });

Strangeness

I didn’t need to add the IgnoreRoute on one IIS7 instance but did on another IIS7 server. I’m not sure why; probably a difference in HTTP handler configuration within IIS itself?