Tripping over Liskov Substitution and search

When you’re working with a “provider” model for services in your applications you get used to the assumption that everything follows the Liskov Substitution Principle and whatever provider you plug in will work in the same way. Unfortunately, for software our in the real world that’s not always entirely true. Recently I came across an example of this which helped point out a bug in some search code in Sitecore…

The scenario

A component I found myself looking at was using the ContentSearch APIs to perform some queries and then render UI based on the results. There wasn’t anything special going on. It was just finding an appropriate index, building up a query, running it and then displaying how many items matched. The relevant bit was vaguely along the lines of:

var index = fetchContextIndex(someContentItem);
var predicate = buildTheSearchCriteria(currentState);

using (IProviderSearchContext context = index.CreateSearchContext())
{
    var query = context
        .GetQueryable<SearchResultItem>()
        .Filter(predicate);

    var fullResultsSet = query.GetResults();
    var totalResults = fullResultsSet.Count();

    // Display the number of matches
}

The confusion

The code started off running against an index managed by Lucene. With the particular set of content on the server, the value of the variable totalResults came back as 97. That seemed a sensible value, as there were roughly that number of items that matched the search criteria. But later the code got migrated to a server that was using Coveo to index the same content. And once that had happened, the value of totalResults always came back as 10, despite there being more matching pages in both the content tree and in the Coveo index.

Cue some head scratching

The solution

After a bit of fun with Google and poking about with the debugger, the subtle issue revealed itself: The code above uses the fullResultsSet.Count() method to fetch the total number of index hits that the search framework found for the query. At first glance that looks fine – the fullResultsSet object exposes the IEnumerable interface – so calling Count() seems a perfectly reasonable way to get the size of the results when there’s no pagination involved in the query.

But as some of you no doubt already spotted, that’s not the documented way you’re supposed to get the total number of results for a query. As a number of Google hits point out, the property TotalSearchResults is the thing we should be using here. And that returns the correct value for both Coveo and Lucene.

If the query had included pagination, the issue would have revealed itself straight away, as that would have highlighted the different behaviours of Count() and TotalSearchResults when your query result set is bigger than the results page size. But because the code in question didn’t do that, the bug slipped through…

Why does it behave like this?

Well getting past the initial slightly petulant “just to confuse us!” response, it’s all down to implementation details…

If you look into the code for the SearchResults<TSource> you’ll see that this class exposing both the property TotalSearchResults and an IEnumerable:

Search Results Code

The code for the TotalSearchResults property is set specifically by the provider generating the results:

public int TotalSearchResults
{
	get;
	private set;
}

That value is set by the constructor, and it can be independent of the size of results page being returned for this query.

But the value of a call to Count() for this collection will be based on the enumerator that the class exposes. The implementation of IEnumerable returns an enumeration taken from the inner Hits collection:

IEnumerator<SearchHit<TSource>> IEnumerable<SearchHit<TSource>>.GetEnumerator()
{
	return this.Hits.GetEnumerator();
}

For Lucene, a query with no pagination will return all the index items matched up to the maximum defined in the config setting for “max result set size” (The ContentSearch.SearchMaxResults setting in your config files). In this case, that was more than 97 so the whole result set was returned and hence it looked like the code was working. But Coveo seems to default to a page of 10 results if you fail to specify pagination. If you think about it, that behaviour makes some sense. Lucene is running in the same process as your site, so it’s not a big issue for it to return all the result data if you don’t explicitly apply a pagination clause to your query. (You still should though!) It’s just shuffling memory about, which is fairly fast to do. However Coveo runs out-of-process (and in the worst case might be out in the cloud if you use the SAAS version) so defaulting to only returning details for the first 10 results if there is no pagination clause could help prevent performance issues from huge result sets being pushed across the network.

So take care people – Barbara Liskov might not approve, but sometimes you need to be wary about swapping out providers. There can be justifications for why behaviour isn’t always exactly the same, and those variations can lead to subtle bugs if you’re not paying attention…

And reading the documentation so you understand the right way to use the objects in question helps too 😉