Migrating from .html to extensionless URLs without losing SEO juice. Eeew.

If you’ve ever had a basic html website (or not so basic site, but with static html pages in it nonetheless) and eventually got around to migrating to an extensionless URL framework (e.g. MVC), then you may have had to think about setting the correct redirects so as to not dilute your SEO rating.

This is my recent adventure in doing just that within an MVC5 site, taking over a selection of html pages from an older version of the site.

SE-huh?

As your site is visited and crawled it will build up a score and a reputation among search engines; certain pages may appear on the first page for certain search terms. This tends to be referred to as “search juice”. Ukk.

Make that change

If you change your structure you can do one of the following:

Just use new URLs

Forget about the old ones; users might be annoyed if they’ve bookmarked something (but seriously, who bookmarks anything anymore?..), but also any reputation and score you have with the likes of Google will have to start again from scratch.

This is obviously the easiest option.

Dead as a dodo; gives an HTTP 404:

http://mysite.com/aboutme.html

Alive and kickiiiiing:

http://mysite.com/aboutme

Map the old URLs to the new pages

Good for humans, but in the short term you’ll be diluting your SEO score for that resource since it appears to be two separate resources (with the score split between them).

Both return an HTTP 200 and display the same content as each other:

http://mysite.com/aboutme.html

http://mysite.com/aboutme

Should you want to do this in MVC, firstly you need to capture the requests for .html files. These aren’t normally handled by .Net since there’s no processing to be done here; you might as well be processing .js files, or .css, or .jpg.

Capturing .html files – The WRONG way

For .Net to capture requests for html I have seen most people use this abomination:

<modules runAllManagedModulesForAllRequests="true" /> 

This will cause every single request to be captured and go through the .Net pipeline, even though there’s most likely nothing for it to do. Waste of processing power, and of everyone’s time.

Capturing .html files – The RIGHT Way

Using this as reference I discovered you can define a handler to match individual patterns:

<add name="HtmlFileHandler" path="*.html" verb="GET" 
    type="System.Web.Handlers.TransferRequestHandler" 
    preCondition="integratedMode,runtimeVersionv4.0" />

Pop that in your <handlers> node (inside <system.webServer> in web.config) and you’ll be capturing just the .html requests.

Mapping them (for either option above)

In your RouteConfig.cs you could add something along the lines of:

routes.MapRoute(
    name: "Html",
    url: "{action}.html",
    defaults: new { controller = "Home", action = "Index" }
);

This will match a route such as /aboutus.html and send it to the Home controller’s AboutUs action – make sure you have the matching actions for each page, e.g.:

public ActionResult AboutUs()
{
    return View();
}
public ActionResult Contact()
{
    return View();
}
public ActionResult Help()
{
    return View();
}

Or just use a catch all route …

routes.MapRoute(
    name: "Html",
    url: "{page}.html",
    defaults: new { controller = "Home", action = "SendOverThere" }
);

and the matching catch all ActionResult to just display the View:

public ActionResult SendOverThere(string page)
{
    return View(page); // displays the view with the name <page>.cshtml
}

Set up permanent redirects from the old URLs to the new ones

CORRECT ANSWER!

To preserve the aforementioned juices, each old page needs an action on your controller that returns a RedirectResult from RedirectPermanent (an HTTP 301) instead of a View, e.g.:

public RedirectResult AboutUs()
{
    return RedirectPermanent("/AboutUs");
}

Or use that catch all route …

routes.MapRoute(
    name: "Html",
    url: "{page}.html",
    defaults: new { controller = "Home", action = "SendOverThere" }
);

… and action:

public RedirectResult SendOverThere(string page)
{
    return RedirectPermanent(page);
}
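The mapping that the catch all action performs can also be pulled out into a plain function, which makes it easy to unit test without spinning up MVC. This is only a sketch: LegacyUrlMapper and ToExtensionlessPath are made-up names, and it assumes you always want a root-relative target rather than relying on how a relative redirect gets resolved.

```csharp
using System;

// Hypothetical helper (not part of MVC): turns a legacy ".html" path
// into the extensionless path it should be permanently redirected to.
public static class LegacyUrlMapper
{
    // Returns a root-relative extensionless path, or null when the
    // request is not for a legacy ".html" page.
    public static string ToExtensionlessPath(string requestPath)
    {
        const string extension = ".html";

        if (string.IsNullOrEmpty(requestPath) ||
            !requestPath.EndsWith(extension, StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // Chop the extension off the end of the path.
        var withoutExtension =
            requestPath.Substring(0, requestPath.Length - extension.Length);

        // Make sure the result is root-relative ("/aboutme", not "aboutme").
        return withoutExtension.StartsWith("/", StringComparison.Ordinal)
            ? withoutExtension
            : "/" + withoutExtension;
    }
}
```

With this in place, LegacyUrlMapper.ToExtensionlessPath("/aboutme.html") gives "/aboutme", and a path that doesn’t end in .html gives null, which the action could turn into a 404.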

Using the Attribute Routing in MVC5 you can do something similar and simpler, directly on your controller – no need for a separate MapRoute entry for the html pages:

[Route("{page}.html")]
public RedirectResult HtmlPages(string page)
{
    return RedirectPermanent(page);
}

Just make sure attribute routing is switched on by adding the line below into your RouteConfig.cs, before any conventional routes:

routes.MapMvcAttributeRoutes();
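For context, here is roughly where that call sits in a route registration class. This is a sketch based on the standard MVC5 project template rather than my actual file; the Default route and the axd IgnoreRoute are the template’s, so adjust to whatever you already have.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        // Pick up [Route(...)] attributes, e.g. [Route("{page}.html")].
        // Registered first so attribute routes are tried before the
        // conventional route below.
        routes.MapMvcAttributeRoutes();

        // The usual conventional fallback from the project template.
        routes.MapRoute(
            name: "Default",
            url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional }
        );
    }
}
```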

Having problems?

This routing always drives me crazy. I find it extremely hard to debug routing problems, which is why I’d like to point you towards Phil Haack’s RouteDebugger nuget package and the accompanying article, or even Glimpse.

(RouteDebugger and Glimpse screenshots)

Both of these have the incredible ability to analyse any URL you enter against any routes you’ve set up, telling you which ones hit (if any). This helped me no end when I finally discovered my .html page URL was being captured, but it was looking for an action of “aboutus.html” on the controller “home”.

Worth checking out

Superscribe is an open source project by @roysvork that uses graph based routing to implement unit testable, fluent routing, which looks really clever. I’ll give that a shot for my next project.

Good luck, whichever route you take!

(see what I did there?..)

Generate a Flat File Web Site using RazorEngine and RazorMachine

Razor and ASP.Net

As a normal person you’d probably be happy with how Razor template files are used within MVC; there’s a nice convention for where they live – they’ll be in a Views folder within your project most likely – and you refer to them either by name or sometimes just by convention – what’s that? You have an ActionResult method called “Index”? I’ll go fetch the “Index” view from the folders I normally expect the cshtml files to live in for ya then.

The way this works is fantastic; development can steam ahead without the pain and confusion of all of the possible ways you could do it wrong when choosing webforms and .aspx files.

Of course, the MS implementation of an MVC framework in itself is a wonderful thing; all but enforcing the separation of concerns that is just so easy to ignore in webforms.

Razor outside of ASP.Net

But what about when you want to dynamically generate html without a process being hosted as a website? One big use case for this is email generation; sure, you could host an MVC web API and have the content generation process constantly call it, but that seems a little inefficient.

RazorEngine and RazorMachine

There are a few solutions to this; you can actually hand roll your own (I might get onto that in a future post) or you can try out some reasonably well known open source solutions like RazorEngine:

A templating engine built on Microsoft’s Razor parsing engine, RazorEngine allows you to use Razor syntax to build dynamic templates:

string template = "Hello @Model.Name, welcome to RazorEngine!";
string result = Razor.Parse(template, new { Name = "World" });

and RazorMachine:

RazorMachine is a robust and easy to use .Net Razor v2/v3 template engine. The master branch uses Razor v3. This implementation supports layouts (masterpages) and a _viewStart construct, just like MVC does support these features. The RazorEngine works independently from MVC. It only needs the System.Web.Razor reference. It almost works exactly like Asp.Net MVC.

var rm = new RazorMachine();
var result = 
   rm.Execute("Hello @Model.FirstName @Model.LastName", new {FirstName="John", LastName="Smith"});

There’s a short stackoverflow answer comparing them (and RazorTemplates, another similar OSS solution) too.

Getting stuck in

Create a new project and use some nuget awesomeness

Install-Package razormachine

Then, if you don’t already, add references to

System.Web.Helpers // for Json.Decode
Microsoft.CSharp // for dynamic types

If you want to debug this functionality via a console app running from Visual Studio, you may need to uncheck “enable visual studio hosting process” in Project -> Properties -> Debug.

If you want to run this outside of Visual Studio, you can just run the compiled exe (bin/debug) as admin.

If you’re using a test runner then you might be fine as is. I can’t actually remember the issue I was having as I now can’t recreate it, but I think it might have been around using dynamic models and Json decoding.

RazorEngine

This is the core bit of functionality for a basic use case for RazorEngine:

var model = Json.Decode("{\"Description\":\"Hello World\"}");
var template = "<div class=\"helloworld\">@Model.Description</div>";
const string layout = "<html><body>@RenderBody()</body></html>";

template = string.Format("{0}{1}", "@{Layout=\"_layout\";}", template);

using (var service = new TemplateService())
{
    service.GetTemplate(layout, null, "_layout");
    service.GetTemplate(template, model, "template");

    var result = service.Parse(template, model, null, "page");

    Console.Write(result);
    Console.ReadKey();
}

Your output should be:

<html><body><div class="helloworld">Hello World</div></body></html>

Pretty easy, right?

RazorMachine

Here’s the equivalent in RazorMachine

var model = Json.Decode("{\"Description\":\"Hello World\"}");
var template = "<div class=\"helloworld\">@Model.Description</div>";
const string layout = "<html><body>@RenderBody()</body></html>";

var rm = new RazorMachine();
rm.RegisterTemplate("~/shared/_layout.cshtml", layout);

var renderedContent = 
    rm.ExecuteContent(string.Format("{0}{1}", "@{Layout=\"_layout\";}", template), model);
var result = renderedContent.Result;

Console.Write(result);
Console.ReadKey();

Again, same output:

<html><body><div class="helloworld">Hello World</div></body></html>

Notice that in both of them you have to lie about there being a layout file existing somewhere; in RazorEngine you give it a name:

template = string.Format("{0}{1}", "@{Layout=\"_layout\";}", template);

then refer to that name when adding the template:

service.GetTemplate(layout, null, "_layout");

In RazorMachine you register the template as a dummy virtual file:

rm.RegisterTemplate("~/shared/_layout.cshtml", layout);

then refer back to it as you would normally do within ASP.Net MVC when executing the content:

var renderedContent = 
      rm.ExecuteContent(string.Format("{0}{1}", "@{Layout=\"_layout\";}", template), model);

Differences

I’ve found it easier to process sub templates (such as @Include) within RazorEngine, as I just recursively scan a file for that keyword and add the corresponding template to the service, e.g. look at the ProcessContent and ProcessSubContent methods below:

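using System.Text.RegularExpressions; // Regex, Match
using System.Web.Helpers;             // Json.Decode
using RazorEngine.Templating;         // TemplateService

public class RenderHtmlPage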
{
    private readonly IContentRepository _contentRepository;
    private readonly IDataRepository _dataRepository;

    public RenderHtmlPage(IContentRepository contentRepository, 
                          IDataRepository dataRepository)
    {
        _contentRepository = contentRepository;
        _dataRepository = dataRepository;
    }

    public string BuildContentResult(string page, string id)
    {
        using (var service = new TemplateService())
        {
            // get the top level razor template, e.g. "product"
            // equivalent of "product.cshtml"
            var content = GetContent(page);
            var data = GetData(id);

            ProcessContent(content, service, data);
            var result = service.Parse(content, data, null, page);

            return result;
        }
    }

    private void ProcessContent(string content, 
                                TemplateService service, 
                                dynamic model)
    {
        // does the string passed in reference a Layout, e.g. @{Layout="_layout";}?
        // (tolerates optional whitespace, and underscores in the name)
        const string layoutPattern = @"@\{\s*Layout\s*=\s*""([a-zA-Z_]*)""\s*;\s*\}";

        // does the string passed in reference an Include anywhere?
        const string includePattern = @"@Include\(""([a-zA-Z_]*)""\)";

        // recursively process the Layout
        foreach (Match match in Regex.Matches(content, layoutPattern, 
                                                RegexOptions.IgnoreCase))
        {
            ProcessSubContent(service, match, model);
        }

        // recursively process the @Includes
        foreach (Match match in Regex.Matches(content, includePattern, 
                                                RegexOptions.IgnoreCase))
        {
            ProcessSubContent(service, match, model);
        }
    }

    private void ProcessSubContent(TemplateService service, 
                                    Match match, 
                                    dynamic model)
    {
        var subName = match.Groups[1].Value; // got an include/layout match?
        var subContent = GetContent(subName); // go get that template then
        ProcessContent(subContent, service, model); // recursively process it

        service.GetTemplate(subContent, model, subName); // add it to the service
    }

    private string GetContent(string templateToLoad)
    {
        // hit the filesystem, db, API, etc to retrieve the razor template
        return _contentRepository.GetContent(templateToLoad);
    }

    private dynamic GetData(string dataToLoad)
    {
        // hit the filesystem, db, API, etc to return some Json data as the model
        return Json.Decode(_dataRepository.GetData(dataToLoad));
    }
}
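One thing to watch with the recursive scan above: if two templates @Include each other, it will never finish. Here’s a stripped down sketch of the same idea with a guard against that, and with no RazorEngine dependency so it can be tested on its own. IncludeScanner and CollectIncludes are made-up names for illustration, and the loadTemplate delegate stands in for GetContent.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: finds every template reachable through @Include("name").
public static class IncludeScanner
{
    private static readonly Regex IncludePattern =
        new Regex(@"@Include\(""([a-zA-Z_]+)""\)", RegexOptions.IgnoreCase);

    // Returns each included template name once, children before parents,
    // which is the order they would need to be added to the service.
    public static IList<string> CollectIncludes(
        string rootName, Func<string, string> loadTemplate)
    {
        var ordered = new List<string>();
        var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { rootName };

        Visit(loadTemplate(rootName), loadTemplate, visited, ordered);
        return ordered;
    }

    private static void Visit(
        string content,
        Func<string, string> loadTemplate,
        HashSet<string> visited,
        List<string> ordered)
    {
        foreach (Match match in IncludePattern.Matches(content))
        {
            var name = match.Groups[1].Value;

            // Already seen this template: skip it, which also breaks cycles.
            if (!visited.Add(name)) continue;

            // Process the sub template's own includes first...
            Visit(loadTemplate(name), loadTemplate, visited, ordered);

            // ...then record the sub template itself.
            ordered.Add(name);
        }
    }
}
```

The same visited set could be threaded through ProcessContent and ProcessSubContent if you wanted that protection in the RazorEngine version.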

Why is this useful?

I’m not going to go into the details of either RazorMachine or RazorEngine; there’s plenty of documentation up on their respective websites already. I’ve used @Includes in the examples above due to its simplicity; the libraries have differing support for things like @Html.Partial and also can be extended.

Unfortunately, the html helpers (like @Html.Partial) need to have an HttpContext and run inside of ASP.Net MVC; which is what I’m trying to avoid for now.

If you pull down my initial teeny solution from github and look at the tests you’ll notice the content of the template, layout, and model are either strings or coming from the filesystem; not related to the structure of the project or files in the project or anything like that.

This means we can deploy a rendering process that returns rendered html based on strings being passed to it. Let’s play with this concept a bit more.

Flat File Web Page Generation

Say you wanted to “host” a website directly within a CDN/cache, thus avoiding the hosting via the normal route of servers and related infrastructure. Sure, writing flat html in a text editor is a solution, but what if you wanted to still be able to structure your pages into common modules, write C# code to manage the logic to dynamically combine them, and use Razor syntax and views for defining the front end?

This next section plays on this concept a bit more; we’ll write a small app that accesses a couple of directories – one for Razor files, one for data files – and generates a flat website into a third directory.

I will then expand on this concept over a series of posts, to make something more realistic and potentially useful.

Command Line & FileSystem FTW

I’ve created another repo up on github for this section, but cutting to the chase – here are the guts of the demo console app:

const string workingRoot = "../../files";
IContentRepository content = 
    new FileSystemContentRepository(workingRoot + "/content");

IDataRepository data = 
    new FileSystemDataRepository(workingRoot + "/data");

IUploader uploader = 
    new FileSystemUploader(workingRoot + "/output");

var productIds = new[] {"1", "2", "3", "4", "5"};
var renderer = new RenderHtmlPage(content, data);

foreach (var productId in productIds)
{
    var result = renderer.BuildContentResult("product", productId);
    uploader.SaveContentToLocation(result, productId);
}

The various FileSystemXX implementations either just read or write files from/to the file system. Natch.
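If you don’t fancy digging through the repo, the interfaces and file system classes could look something like the sketch below. To be clear, this is a guess reconstructed from how they’re called above, not a copy of the real code: the method signatures are inferred from the calls, and the .cshtml, .json and .html extensions are assumptions based on the file names mentioned in this post.

```csharp
using System.IO;

// Interface shapes inferred from the calls made in RenderHtmlPage and the console app.
public interface IContentRepository { string GetContent(string templateName); }
public interface IDataRepository { string GetData(string id); }
public interface IUploader { void SaveContentToLocation(string content, string location); }

public class FileSystemContentRepository : IContentRepository
{
    private readonly string _root;
    public FileSystemContentRepository(string root) { _root = root; }

    // Assumes "product" lives at <root>/product.cshtml
    public string GetContent(string templateName)
    {
        return File.ReadAllText(Path.Combine(_root, templateName + ".cshtml"));
    }
}

public class FileSystemDataRepository : IDataRepository
{
    private readonly string _root;
    public FileSystemDataRepository(string root) { _root = root; }

    // Assumes "1" lives at <root>/1.json and is returned as raw json text
    public string GetData(string id)
    {
        return File.ReadAllText(Path.Combine(_root, id + ".json"));
    }
}

public class FileSystemUploader : IUploader
{
    private readonly string _root;
    public FileSystemUploader(string root) { _root = root; }

    // Assumes the rendered page for "1" is written to <root>/1.html
    public void SaveContentToLocation(string content, string location)
    {
        Directory.CreateDirectory(_root); // no-op if it already exists
        File.WriteAllText(Path.Combine(_root, location + ".html"), content);
    }
}
```

Keeping these behind interfaces is what lets the same RenderHtmlPage class be pointed at a db, an API, or a CDN upload later on without changing the rendering logic.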

So what we’ve got here is an implementation of the RazorEngine methods I pasted in above wrapped in a RenderHtmlPage class, being called for a number of “productIds”; these happen to exist as json files on disc, e.g. “1.json”.

Each file is being combined with whatever Razor templates are listed in the product cshtml file and its referenced @Includes. The resulting html is then saved back to the file system.

So with these views in files/content:
(screenshot: Razor views)

And these json files in files/data:
(screenshot: json data files)

We get these html files generated in files/output:
(screenshot: generated html output)

Hopefully you can see where this is leading; we can keep Views in one place, get the model data from somewhere else, and have the extremely generic rendering logic in another place.

The Theory

With this initial version we could take an existing ASP.Net MVC website (assuming it didn’t use any html helpers in the views..) and process it offline with a known dataset to create a readonly version of the website, ready to serve from a filesystem.

Next Up

I’ll take this concept and run with it across various implementations, gradually ending up on something that might even be useful!

WebForms ScriptManager Vs MVC – FIGHT!

If you’ve tried to squeeze MVC into a WebForms project which uses ScriptManager elements for AJAX functionality, be sure to add some hardcore IgnoreRoute entries in your route registration section.

If you don’t, the calls that ScriptManager makes to your asmx webservice will receive 404 errors when requesting asmx/js or asmx/jsdebug, each carrying an HttpException which looks like:

The controller for path blah.asmx/js was not found or does not implement IController

or if you’re in debug mode

The controller for path blah.asmx/jsdebug was not found or does not implement IController

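This basically means that the pattern {folder}/{file}.asmx/{something} is being picked up by an MVC route (most likely the default {controller}/{action}/{id} one), and MVC then can’t find a controller to handle it. Since MVC shouldn’t be handling these requests at all, you need to make sure you add in an exception.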

Ignore a specific file type

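This one didn’t actually work for me as expected (possibly because {resource} only matches a single URL segment, so it won’t catch an asmx file that lives in a subfolder), but is worth listing here: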

routes.IgnoreRoute("{resource}.asmx/{*pathInfo}");

Ignore an entire folder

This brute force attack worked for me:

routes.IgnoreRoute("{folder}/{*pathInfo}", new { folder = "WebServices" });
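A middle ground between the two might be a catch all parameter with a regex constraint, which should ignore asmx requests at any folder depth. I haven’t run this against the project described here, so treat it as a sketch: the parameter name allasmx and the AsmxRoutes class are made up, and the call needs to happen before any MapRoute calls so it gets first refusal.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

public static class AsmxRoutes
{
    // Call this from your route registration, before any MapRoute calls.
    public static void IgnoreAsmx(RouteCollection routes)
    {
        // {*allasmx} captures the whole remaining path, slashes included;
        // the constraint only lets the ignore apply when that path contains
        // a ".asmx" file, optionally followed by more path such as /js or /jsdebug.
        routes.IgnoreRoute("{*allasmx}", new { allasmx = @".*\.asmx(/.*)?" });
    }
}
```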

Strangeness

I didn’t need to add in the IgnoreRoute on one IIS7 instance but did on another IIS7 server. Not sure why, probably due to HTTPHandler configuration within IIS itself?