Thursday, January 15, 2009

Search Engine Friendly Error Handling

ASP.NET provides a standard way to handle errors in web applications by configuring a customErrors section in the Web.config file. The standard behaviour is to redirect the user from the failing page to an error page, which in turn shows a more or less friendly explanation of what happened.

This approach is fine for intranet web applications but is usually not appropriate for a public web app accessible to search engines.

The main problem is that when an error happens, ASP.NET returns a 302 "temporary redirect" response, redirecting the browser to the error page configured in Web.config.

If the failing page was requested by a search engine (SE) bot, the bot would index the content of the error page under the original page's URL, creating a wrong index entry.

Another problem arises when the actual error is that the requested content was not found. This scenario is quite common on modern web sites that construct content pages dynamically based on a URL. In HTTP terms, such a situation should be reported as 404 "Not Found", especially when a bot is visiting the page. But standard ASP.NET handling will once again respond with a 302 and let the bot include a nonexistent page in a search index.

What I am driving at is that correct error handling should always return the corresponding HTTP status code for an SE bot, along with appropriate content for the user. So how can we do that without writing too much code?

Actually, quite easily. Starting with version 3.5 SP1, ASP.NET has a new redirectMode attribute in the customErrors configuration section.
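
A minimal sketch of such a section, assuming a hypothetical error page at ~/ErrorPage.aspx and mode="On" (both values are illustrative assumptions, not taken from a real site):

```xml
<configuration>
  <system.web>
    <!-- ~/ErrorPage.aspx is a hypothetical path; point it at your own error page -->
    <customErrors mode="On"
                  defaultRedirect="~/ErrorPage.aspx"
                  redirectMode="ResponseRewrite" />
  </system.web>
</configuration>
```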



The new redirectMode attribute can be assigned one of two values: ResponseRedirect (the default) or ResponseRewrite. With ResponseRewrite, the request is not redirected to the error page with a 302; instead, the content of the error page is rendered under the original URL. Internally, ASP.NET does a Server.Execute of the error page instead of the usual Response.Redirect.

So this is already one third of the work. The next thing we need to do is return the correct HTTP status code. This can be achieved by adding a few lines of code to the error page, say in the Page_Load event handler:

       int httpCode = 500;
       Exception ex = Server.GetLastError();
       if (ex is HttpException)
       {
           httpCode = ((HttpException) ex).GetHttpCode();
       }
       Response.StatusCode = httpCode;


This code checks whether the error was caused by an HttpException (a standard .NET exception class). If so, it takes the HTTP code from the exception; otherwise it returns 500. Now two thirds of the work is done.

What is left is to handle exceptions in your code properly. The best practices here are:
  • If your application cannot return the requested content for any reason other than an internal problem of the application, throw an HttpException with the code 404, and your error page will handle it in a friendly way.

    throw new HttpException(404, "Not found");

  • Map your application's other specific exceptions to standard HTTP codes and return them too.
  • If your application throws an exception internally, wrap the internal exception in an HttpException with an appropriate HTTP code.

    try
    {
        ...
    }
    catch (Exception ex)
    {
        throw new HttpException(code, message, ex);
    }

That's basically it. To conclude, just a few more notes.

If the failing page had started rendering content before the error occurred, you may want to clear it on the error page before outputting an error message:

Response.Clear();

If for some reason you cannot use the new redirectMode attribute in the .config file (older framework, application specifics, etc.), just add a few lines of code to global.asax that do the same:

void Application_Error(object sender, EventArgs e)
{
    // Do something with the error, e.g. log, notify, etc.
    // errorpage holds the path to your error page.
    Server.Transfer(errorpage);
}

It may be a good idea to clear the error status on the error page after you're done with error handling:

Server.ClearError();

Now that is all.

7 comments:

  1. Great article!
    I was thinking about throwing 404s when some content is not found for some time... Now you convinced me!

  2. Nice, one minor niggle:
    Exception ex = Server.GetLastError();
    if (ex is HttpException)
    {
    httpCode = ((HttpException) ex).GetHttpCode();
    }

    Should be written using as instead:

    int httpCode = 500;
    HttpException hex = Server.GetLastError() as HttpException;
    if (hex != null)
    {
    httpCode = hex.GetHttpCode();
    }
    Response.StatusCode = httpCode;

  3. To IDisposable:

    If you use "as" the casting operation potentially may return null without raising an exception. In this case one should always check for null before accessing an object's member. (Even Visual Studio will warn you about potential null value.)

    In my code on the contrary I use "is" first to check whether the Exception object is in fact an instance of HttpException class. Therefore direct casting is totally safe and does not require additional checks.

  4. Hi!

    I tried the technique described but am having some trouble getting it to work.

    My web.config has:

    <customErrors mode="RemoteOnly" defaultRedirect="~/DefaultErrorPage.aspx" redirectMode="ResponseRewrite" />

    Then in my ErrorPage's Page_Load() I have:

    If Not Server.GetLastError() Is Nothing Then
    Dim ex As Exception = Server.GetLastError().GetBaseException
    If Not ex Is Nothing Then
    'Send email to webmaster, save in log table
    End If
    End If

    With redirectMode="ResponseRewrite" the custom error page (DefaultErrorPage.aspx) is not displayed... but if I replace the aspx by a static html then it works.

    Additionally, if I use redirectMode="ResponseRedirect" instead, the custom error page (DefaultErrorPage.aspx) is displayed, but in this case I cannot get access to the error/exception details (Server.GetLastError() is null).

    Perhaps this is one of the cases, as you mentioned, where we cannot "use the new redirectMode attribute in the .config file (older framework, application specifics, etc.)" and must use Application_Error in global.asax instead.

    Any thought on this? Thanks,

    Ricardo.

  5. Ricardo, I've seen the same behavior too. I've been trying to figure out the reason and even tried to debug the .NET code but couldn't find the problem. So the only solution I can suggest in your case is to not use ASP.NET custom error handling at all. Instead, configure your own custom error handling in global.asax. It should work; at least it works for me.

  6. Good article! Was thinking about blogging on this myself, but looks like you cover it pretty well ;)

  7. Hello Sir,

    I use that Piece of code in my web.config file code is

    For example, if I change
    http://www.mysite.com/contact.aspx to http://www.mysite.com/con.aspx

    it works fine, but when I request http://www.mysite.com/contact.html it generates a 403 error status code instead of 404.

    Please help me regarding that
