Even your 404s can be dangerous…

Every website needs to be able to deal with requests for pages which don’t exist – some sort of 404 handling is a feature of pretty much every Sitecore project. But, as I discovered when sorting out an issue on a client’s site recently, it’s a bit of code which can bite if you’re not careful…

The scenario…

Out of the box, Sitecore handles page-not-found errors by redirecting the user to an error page. While this works, SEO requirements often mean a project needs the error served on the URL the user originally requested, using a bit of “error page” content from a different Sitecore item. (Returning a 404 response directly is considered better than a 302 redirect to an error page, because the missing URL itself then reports that it doesn’t exist.) It’s pretty easy to implement this sort of response to not-found errors in Sitecore using pipeline processors. The pipeline component I was looking at did the following:

  1. Detect the situation where Item resolution has failed for the current request.
    Basically, after the “item resolver” pipeline component(s) have run, is the context item still null?
  2. If so, make a web request for the configured 404 item’s page to retrieve its content.
  3. Set the response code for the current request to 404.
  4. Send the 404 page content back to the client as the response data for the current request.
  5. End the current request.
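
To make those five steps concrete, here is a minimal, framework-free sketch. It is written in Python rather than as a real Sitecore processor, and every name in it (`Response`, `handle_not_found`, `fetch_page`) is made up for illustration; none of it is Sitecore API or the code from the client's site.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    """Hypothetical stand-in for the current HTTP response."""
    status: int = 200
    body: str = ""
    ended: bool = False


def handle_not_found(
    context_item: Optional[object],
    not_found_url: str,
    fetch_page: Callable[[str], str],
    response: Response,
) -> bool:
    """Return True if the request was answered with 404 content."""
    # Step 1: if item resolution succeeded, there is nothing to do.
    if context_item is not None:
        return False
    # Step 2: make a web request for the configured 404 page's markup.
    body = fetch_page(not_found_url)
    # Step 3: mark the current response as a 404.
    response.status = 404
    # Step 4: send the fetched markup back as this response's body.
    response.body = body
    # Step 5: end the request so no later processing runs.
    response.ended = True
    return True
```

Note that step 2 is an ordinary HTTP request back into the same website, and that detail is what matters for the rest of this story.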

The code for this had appeared to be working fine for some time, but it turns out there’s a scenario where it can all go a bit pear-shaped. Well, very pear-shaped in fact. I found myself investigating the code because servers in the production cluster had stopped responding to new requests, and the IIS logs were being flooded with requests for the website’s 404 page.

After a bit of digging, it turned out that the following was happening:

  • A request was arriving on a host name that had no <site/> explicitly bound to it.
  • The request was being handled by a “catch all” <site/> whose content tree did not include the content-managed 404 page item.
  • The 404 pipeline was triggering and making its request for the 404 page on the same un-mapped domain.
  • Hence the request for the 404 page was itself generating a 404 and triggering the pipeline component.
  • And once this starts, the server gets itself stuck in a loop, becoming unresponsive to external requests…
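
The loop is easy to reproduce in a toy model. The sketch below is purely illustrative: the host names, the `SITES` table and the `MAX_DEPTH` cap are all invented, and the cap exists only so the simulation stops (the real server had no such limit).

```python
class LoopDetected(Exception):
    """Raised by the simulation once nested 404 fetches hit the cap."""


# Hypothetical content trees: the paths each <site/> can resolve.
SITES = {
    "www.example.test": {"/", "/404"},  # explicitly bound site with a 404 item
    "*": {"/"},                         # catch-all site with no 404 item
}
NOT_FOUND_PATH = "/404"
MAX_DEPTH = 5  # simulation-only safety cap


def resolve_site(host):
    """Pick the bound site for a host, or fall back to the catch-all."""
    return SITES.get(host, SITES["*"])


def handle(host, path, depth=0, log=None):
    """Naive handler: on a miss, fetch the 404 page from the same host."""
    log = [] if log is None else log
    log.append((host, path))
    if depth >= MAX_DEPTH:
        raise LoopDetected(f"{len(log)} nested requests for {path}")
    if path in resolve_site(host):
        return 200, log
    # The nested request goes through exactly the same handling again.
    handle(host, NOT_FOUND_PATH, depth + 1, log)
    return 404, log
```

Calling `handle("www.example.test", "/missing")` makes one nested request and returns a 404, whereas `handle("unmapped.example.test", "/missing")` keeps requesting `/404` until the cap raises `LoopDetected`.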

Not good, huh?

Things to think about…

So, if you’re going to write code for custom 404 handling, it would be wise to think about a few things:

  • Make sure your custom handler is very careful about the situation where the request causing a 404 is for the page configured as the site’s 404 page!
    The pipeline component should test whether the URL that caused the 404 it’s handling includes the URL of the configured 404 page itself. If it does, then something’s gone wrong with the site’s configuration, and the code should fall back to a simpler handler – maybe Sitecore’s default approach, or maybe a static file.
  • Pay attention to the <site/> context of the requests the 404 code is processing.
    You should ensure your custom 404 code only triggers for requests that have been mapped to sites which the pipeline processor is appropriate for. (As a side issue, you should also think about whether the pipeline code should respond to things like media requests.)
  • You may need to be careful about the user’s cookies.
    If the 404 content that’s being requested internally has things like “login/logout” links on it (or perhaps personalisation) then you must make sure that you are passing the current user’s cookies when fetching its markup. The commonly used Sitecore helper function for making web requests, WebUtil.ExecuteWebPage(), does not seem to include code to handle this. When cookies aren’t passed, users can be confused because the 404 content doesn’t reflect their current state.
  • You should consider whether you need to add a simple, non-content-managed 404 page.
    If the requirements for the work demand a content-managed 404 page, it may be wise to have a (simplified) static 404 page to deal with the scenario where your custom pipeline can’t find or use the content-managed one.

That way, you’re less likely to find yourself with an accidentally crashed site… 😉
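
Pulling those four points together, a guarded handler might look something like the sketch below. Again this is Python for illustration, not Sitecore code: `render_not_found`, `enabled_sites`, `fetch_page` and `STATIC_404` are hypothetical names, and comparing paths for equality is just one assumed way of spotting a request for the 404 page itself.

```python
from typing import Callable, Mapping, Optional, Set, Tuple
from urllib.parse import urlsplit

# Simple, non-content-managed fallback markup (assumed content).
STATIC_404 = "<html><body><h1>Page not found</h1></body></html>"


def is_request_for_404_page(request_url: str, not_found_path: str) -> bool:
    """True when the failing request is for the configured 404 page."""
    path = urlsplit(request_url).path.rstrip("/").lower()
    return path == not_found_path.rstrip("/").lower()


def render_not_found(
    request_url: str,
    site_name: str,
    cookies: Mapping[str, str],
    *,
    not_found_path: str,
    enabled_sites: Set[str],
    fetch_page: Callable[[str, Mapping[str, str]], str],
) -> Optional[Tuple[int, str]]:
    """Return (status, body), or None to leave the request to the default handling."""
    # Only act for sites this handler has been configured for.
    if site_name not in enabled_sites:
        return None
    # The 404 page itself is missing: never fetch it again, use the static page.
    if is_request_for_404_page(request_url, not_found_path):
        return 404, STATIC_404
    parts = urlsplit(request_url)
    target = f"{parts.scheme}://{parts.netloc}{not_found_path}"
    try:
        # Pass the user's cookies so login state and personalisation survive.
        body = fetch_page(target, cookies)
    except Exception:
        # Any failure fetching the managed page falls back to the static one.
        body = STATIC_404
    return 404, body
```

The important design choice is that the check for the 404 page's own URL happens before any internal web request is made, so a misconfigured site costs you one plain response rather than a loop.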

In reality, it’s probably best to avoid writing entirely fresh code for requirements like this. There are lots of interesting examples that you can use as a starting point for your work. While writing up this post, a search on http://sitecore.link/ (give that site a try – it’s a searchable index of loads of useful Sitecore information) pointed me to this helpful post from Mike Reynolds titled “Yet Another <httpRequestBegin> Pipeline Processor to Handle “Page Not Found” (404 Status Code) in Sitecore”. That gives example code for most of the stuff mentioned above and is a good starting point for customisation specific to your site…
