Getting your redirects right

I suspect a fairly common scenario for Sitecore developers is launching a new site which replaces an existing one with a shiny new design and content structure. On these projects, whoever is in charge of SEO will usually want redirects in place from important old URLs on the site to new ones. These redirects ensure that users who have bookmarks to the old pages don’t see 404s, and they help to keep the search engine rankings which had been acquired by the old site.

Another common scenario these days is for new websites to serve all of their pages under HTTPS, rather than just the “sensitive” pages as we might have done in the past.

When you combine these two needs, you can end up with more complicated redirection rules than you might have needed in the past. If you’re planning to make use of the Url Redirect module from the Sitecore Marketplace, my experiences doing this might be of help to you:

Generalising the rules you want

In an ideal world, we want any redirection the site is going to perform to happen in a single operation. SEO people will tell you that having a chain of redirects is bad for your search rankings – you don’t want your pages to look like spam to the search engines. So in this scenario we need to have two distinct groups of rules:

  • Specific redirects: Rules to send HTTP(S) requests for old URLs to HTTPS requests for new URLs.
  • A generic redirect: A rule to send any other HTTP request to HTTPS.

We need zero or one of these rules to trigger for any request arriving at the server.

The specific redirect rules

The rules for matching old URLs need to deal with whatever file extensions are appropriate for the site being replaced. In the case I was working on, the URLs might or might not end with a trailing slash. While I’m keenly aware of the usual jokes about regular expressions, using them in an “Inbound Rule” via the redirect module is the easiest way to get an accurate match here.

In my case, detecting a URL which requires a redirect needs an expression based on the pattern:

^the-url-to-redirect/?$

Which translates as:

  • Match the start of the string: ^
  • Match the URL required: the-url-to-redirect
  • Optionally match a trailing slash: /?
  • Match the end of the string: $

When constructing these rules you need to remember that you don’t see the protocol, host name or querystring at this point. If you’re not confident with regular expressions, a testing tool such as the .Net Regex Tester can help you get the syntax right. This is generally easier than trying to debug the expressions by running redirect rules in Sitecore.
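If you’d rather check a pattern in code than in an online tester, something like the following sketch can do the job. It uses Python’s re module purely for illustration (the module itself runs on .NET regular expressions, which behave the same way for a pattern this simple), and it assumes the rule sees the path without a leading slash, as the pattern above implies. The function name is my own invention:

```python
import re

# The pattern shape described above, for a hypothetical old URL.
OLD_URL_PATTERN = re.compile(r"^the-url-to-redirect/?$")


def needs_redirect(path: str) -> bool:
    """Return True when the path (no protocol, host or querystring) matches."""
    return OLD_URL_PATTERN.match(path) is not None


if __name__ == "__main__":
    candidates = [
        "the-url-to-redirect",        # exact match
        "the-url-to-redirect/",       # optional trailing slash
        "the-url-to-redirect/child",  # deeper path, should not match
        "old/the-url-to-redirect",    # different parent, should not match
    ]
    for candidate in candidates:
        print(candidate, needs_redirect(candidate))
```

The two anchors are what stop the rule from catching child pages or similarly named URLs elsewhere in the old site.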

Your rule item probably ends up looking a bit like this:

Specific Inbound Rule

If this matches, you then need a Redirect item to describe what should happen.

Here, we need to specify a few things:

  • The redirection needs to go to HTTPS no matter what the original protocol of the request was.
  • We need to maintain the same host name as the original request.
  • We probably want a Permanent (301) redirection.
  • If this rule is matched, we don’t want any other rules to be evaluated.

The first two are dealt with by the replacement expression that we define in the “Rewrite URL” field. This will look something like:

https://{HTTP_HOST}/my-new-site-url

The {HTTP_HOST} token will be replaced with whatever the host was in the original request. Using this approach rather than hard-coding the host allows the rules to be tested on non-production servers prior to deployment.
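To make the effect of the token concrete, here is a minimal Python sketch of the substitution. This is not the module’s code, just an illustration of the outcome; the function name and the example host names are assumptions of mine:

```python
# Rewrite URL as it would be entered on the redirect item.
REWRITE_URL = "https://{HTTP_HOST}/my-new-site-url"


def expand_host_token(template: str, request_host: str) -> str:
    """Replace the {HTTP_HOST} token with the host of the incoming request."""
    return template.replace("{HTTP_HOST}", request_host)


if __name__ == "__main__":
    # The same rule gives a sensible target on QA and on production.
    for host in ["qa.example.com", "www.example.com"]:
        print(expand_host_token(REWRITE_URL, host))
```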

The third and fourth bullets are dealt with by the “Redirect Type” field and the “Stop processing of subsequent rules” checkbox.

So your redirect item will look something like:

Specific Redirect

You’ll need to follow that pattern to create a set of rules for redirecting the old URLs that your SEO people require. You may find that using folders to organise the rules for sub-levels of the old site will help keep things neater. Folders are ignored by the logic which processes the rules, so they just help you organise rules.

The generic redirect rule

There’s only one rule needed here. It’s a little more complex to construct, however.

First of all, the Inbound Rule here needs to match everything, as we only get here if no other rule has matched. The example rules that ship with the module use (.*) for this purpose. The brackets here are required to make a “group” in Regular Expression terms – basically some characters we’re going to want to be able to fetch later.

Secondly, you need to add a Condition to qualify the Inbound Rule. We’ve already told it to “match any path” but we only want this rule to trigger if the request is made under HTTP. The data needed for that is as follows:

HTTPS Condition

That means the rule only triggers when the flag saying “is the request under HTTPS” is false.

And finally, we need the redirect:

HTTPS Redirect

The only difference here is the use of the {R:1} token in the target URL. This fetches back the group we matched above, so we can paste the original requested URL into our re-written request.
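Putting the three parts of the generic rule together, the logic looks roughly like this Python sketch. The target template https://{HTTP_HOST}/{R:1} is my assumption about what the redirect item contains, and the function is illustrative rather than anything taken from the module:

```python
import re
from typing import Optional

CATCH_ALL = re.compile(r"(.*)")                # the brackets create group 1
TARGET_TEMPLATE = "https://{HTTP_HOST}/{R:1}"  # assumed target URL


def generic_redirect(path: str, host: str, is_https: bool) -> Optional[str]:
    """Return the HTTPS target for a plain HTTP request, or None."""
    if is_https:
        return None  # the condition fails, so the rule does not trigger
    match = CATCH_ALL.match(path)
    if match is None:
        return None
    # {R:1} is replaced with whatever group 1 captured: the original path.
    target = TARGET_TEMPLATE.replace("{HTTP_HOST}", host)
    return target.replace("{R:1}", match.group(1))


if __name__ == "__main__":
    print(generic_redirect("some/page", "www.example.com", is_https=False))
    print(generic_redirect("some/page", "www.example.com", is_https=True))
```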

The ordering conundrum

With the rules set up like that, you should find that whether you request an old URL (under either protocol) or a new URL under HTTP, you end up at the new URL under HTTPS in a single redirect.

But when I’d finished testing this and deployed it to my QA infrastructure, it stopped working as expected. Suddenly I was seeing the rules firing in such a way that two redirects were required. The generic HTTPS rule would trigger first for my test, followed by the specific rule.

Cue a few hours spent with the source code from GitHub and the trusty Visual Studio debugger…

The reason, it turns out, is that the rules are fetched by a query when the module starts up, and that query is based on the item template, not the content structure. The order the items come back in isn’t really defined here, but is most likely related to the order in which they exist in the underlying database. On my dev machine I’d created them in the correct order, so by luck they were working there, but copying them across to the QA site via a package hadn’t maintained that ordering. Hence the list of rules in memory ended up in a different order to the rules in the content tree, which is always sorted by the __sortorder field.
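The effect of rule order on the number of redirects can be shown with a small simulation. This is a simplified Python model of “first matching rule wins”, written for illustration only: the Rule class, its fields and the function names are all hypothetical, and every redirect is assumed to go to HTTPS on the same host:

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    name: str
    pattern: str             # regular expression applied to the path only
    target_path: str         # path to redirect to; may contain {R:1}
    http_only: bool = False  # models the "HTTPS is off" condition


def first_match(rules: List[Rule], scheme: str, path: str) -> Optional[Tuple[str, str]]:
    """Apply the first rule that matches and return the new (scheme, path)."""
    for rule in rules:
        if rule.http_only and scheme == "https":
            continue
        match = re.match(rule.pattern, path)
        if match is None:
            continue
        new_path = rule.target_path
        if match.groups():
            new_path = new_path.replace("{R:1}", match.group(1))
        return ("https", new_path)  # stop processing subsequent rules
    return None


def count_redirects(rules: List[Rule], scheme: str, path: str, limit: int = 10) -> int:
    """Follow redirects until no rule matches and return how many were needed."""
    hops = 0
    while hops < limit:
        result = first_match(rules, scheme, path)
        if result is None:
            break
        scheme, path = result
        hops += 1
    return hops


SPECIFIC = Rule("specific", r"^the-url-to-redirect/?$", "my-new-site-url")
GENERIC = Rule("generic", r"(.*)", "{R:1}", http_only=True)

if __name__ == "__main__":
    # Intended order: one redirect. Accidental order: two.
    print(count_redirects([SPECIFIC, GENERIC], "http", "the-url-to-redirect"))  # 1
    print(count_redirects([GENERIC, SPECIFIC], "http", "the-url-to-redirect"))  # 2
```

With the generic rule first, the HTTP request for the old URL is bounced to HTTPS on the same path, and only the second request reaches the specific rule. That is the double redirect I was seeing on QA.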

I ended up solving my problem by adding “sort by __sortorder” into the redirect module code, so that each time a rule is updated the cached rule-set in memory is re-sorted. The change set is available via GitHub if you find yourself in a similar situation, and I’ve submitted a Pull Request for this and some other minor changes. So hopefully that behaviour (or an improved version of it) can end up in the Marketplace module in the future.
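The idea behind that change is simple enough to sketch in a few lines. The real fix lives in the module’s .NET code; the Python below is only a stand-in, with a hypothetical CachedRule type and the assumption that each __sortorder value has already been parsed to an integer:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CachedRule:
    name: str
    sort_order: int  # the item's __sortorder value, parsed as an integer


def resort_cache(rules: List[CachedRule]) -> List[CachedRule]:
    """Return the cached rules ordered the way the content tree shows them."""
    return sorted(rules, key=lambda rule: rule.sort_order)


if __name__ == "__main__":
    # Order as it might come back from the query on a freshly packaged site.
    fetched = [
        CachedRule("Generic HTTPS redirect", 300),
        CachedRule("Old contact page", 200),
        CachedRule("Old home page", 100),
    ]
    for rule in resort_cache(fetched):
        print(rule.sort_order, rule.name)
```

Calling something like this whenever the cache is rebuilt or a rule changes keeps the in-memory order in line with what editors see in the tree.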

One point worth noting is that because of the way Sitecore manages the values in the __sortorder field, if you put rules into folders, more than one rule can end up with the same sort order value. Sitecore restarts its numbering from scratch for each folder, so if the rules are fetched from more than one folder you can see duplication. And unsurprisingly, the outcome of the sort operation is undefined for these duplicates. If this is going to be an issue for you, you probably need to manually adjust the values in the sort order field to make sure they sort correctly. This field lives in the “Standard Fields” for Sitecore Items, so you’ll need to make sure those are visible in Content Editor in order to modify it. Alternatively, you might consider modifying the code to sort by a custom “rule priority” field that you create yourself.
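Here is a rough Python illustration of both the duplicate problem and the custom field alternative. The rule_priority field, the folder names and the values are all made up for the example. Python’s sort happens to be stable, so ties keep their fetch order here, which is exactly the order we can’t rely on:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RuleItem:
    name: str
    folder: str
    sort_order: int     # restarts in each folder, so values can collide
    rule_priority: int  # hypothetical custom field, unique across all rules


def by_sort_order(rules: List[RuleItem]) -> List[RuleItem]:
    """Sort on __sortorder alone; duplicates are left in fetch order."""
    return sorted(rules, key=lambda rule: rule.sort_order)


def by_priority(rules: List[RuleItem]) -> List[RuleItem]:
    """Sort on the explicit priority, which ignores folder structure."""
    return sorted(rules, key=lambda rule: rule.rule_priority)


FETCHED = [
    RuleItem("Generic HTTPS redirect", "Generic", 100, 1000),
    RuleItem("Old products page", "Products", 100, 10),
    RuleItem("Old news page", "News", 100, 20),
]

if __name__ == "__main__":
    print([rule.name for rule in by_sort_order(FETCHED)])
    print([rule.name for rule in by_priority(FETCHED)])
```

In this example every rule is the first item in its folder, so sorting on sort order alone leaves the generic rule at the front, while the explicit priority puts it last where it belongs.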
