Sorting for search, when you’re living in the dark ages

I’ve written before about filtering data in Lucene searches if you’re still using Sitecore 6.x. Having done more legacy work on this front over the last couple of weeks, I’ve got a couple of new things to add. Previously, the search work I’d been doing relied on the default “relevance” sort order, or on LINQ OrderBy clauses. Recently, however, I’ve needed to enable some more complicated sorting, which has led me to a few new (to me, at least) discoveries.

Doing sorts with LINQ clauses isn’t very efficient. To get a correct sort you generally have to fetch every result first, because the top few hits by relevance aren’t necessarily the top few by your sort key. Surely it’s possible to get Lucene to do this job for us before we go to the trouble of fetching all the results? At first glance that doesn’t look easy: how do you pass a sort order into your query? The IndexSearchContext class that you use to execute a search has lots of methods, but none of them mention sorting:

public class IndexSearchContext : IndexContextBase, IDisposable
{
    protected IndexSearchContext();
    public IndexSearchContext(ILuceneIndex index);

    public IndexSearcher Searcher { get; }

    protected BooleanQuery CreateBooleanQuery(BooleanQuery prototype, params object[] args);
    protected BooleanQuery CreateBooleanQuery(bool disableCoord, float boost, params object[] args);
    protected PrefixQuery CreatePrefixQuery(TermQuery termQuery);
    protected PrefixQuery CreatePrefixQuery(string name, string value, float boost);
    protected TermQuery CreateTermQuery(string name, string value, float boost);
    public void Dispose();
    public static string Escape(string query);
    public Explanation Explain(PreparedQuery query, int doc);
    protected void Initialize(ILuceneIndex index, bool close);
    protected Query InternalParse(string query);
    protected Query InternalParse(string query, string defaultField);
    public PreparedQuery Parse(string query);
    public PreparedQuery Parse(string query, ISearchContext context);
    public PreparedQuery Prepare(QueryBase query);
    protected PreparedQuery Prepare(Query query, ISearchContext context);
    public PreparedQuery Prepare(QueryBase query, ISearchContext context);
    [Obsolete("Use Search(PrepareQuery query, int n)")]
    public SearchHits Search(PreparedQuery query);
    [Obsolete("Deprecated. Use Search(Query query, int n)")]
    public SearchHits Search(Query query);
    [Obsolete("Deprecated. Use Search(QueryBase query, int n)")]
    public SearchHits Search(QueryBase query);
    [Obsolete("Use Search(string query, int n)")]
    public SearchHits Search(string query);
    public SearchHits Search(PreparedQuery query, int n);
    public SearchHits Search(Query query, int n);
    [Obsolete("Deprecated. Use Search(Query query, int n, ISearchContext context")]
    public SearchHits Search(Query query, ISearchContext context);
    public SearchHits Search(QueryBase query, int n);
    [Obsolete("Deprecated. Use Search(QueryBase query, int n, ISearchContext context)")]
    public SearchHits Search(QueryBase query, ISearchContext context);
    public SearchHits Search(string query, int n);
    [Obsolete("Use Search(string query, int n, ISearchContext context)")]
    public SearchHits Search(string query, ISearchContext context);
    public SearchHits Search(Query query, int n, ISearchContext context);
    public SearchHits Search(QueryBase query, int n, ISearchContext context);
    public SearchHits Search(string query, int n, ISearchContext context);
    protected virtual Query Translate(QueryBase query);
}

A bit of searching on Google for ideas turned up a post on every developer’s favourite website, Stack Overflow, suggesting a solution: you can subclass IndexSearchContext and add method(s) that pass a sort parameter on to Lucene:

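using Lucene.Net.Search;
using Sitecore.Search;

public class SortableIndexSearchContext : IndexSearchContext
{
    public SortableIndexSearchContext(ILuceneIndex index)
    {
        Initialize(index, true);
    }

    public SearchHits Search(Query query, Sort sort)
    {
        // Needs a Search(Query, ISearchContext, Sort) overload that passes
        // the sort on to Lucene (not shown here)
        return Search(query, SearchContext.Empty, sort);
    }
}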

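(See @techphoria414’s answer on Stack Overflow for the three-argument Search(Query, ISearchContext, Sort) overload called above, which does the actual work, and for other overloads that are possible.)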

Now that we have a way to pass a sort, what do we need to pass in? A Lucene Sort object. That consists of one or more SortField instructions, each of which sorts on a named index field. So you can create sorts with something like:

Sort singleLevelText = new Sort(new SortField("indexfieldname", SortField.STRING, false));

Sort twoLevel = new Sort(new SortField[] {
    new SortField("firstfieldname", SortField.INT, true),
    new SortField("secondfieldname", SortField.STRING, false)
});
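If it helps to picture what twoLevel does, here’s a rough standalone analogue in plain C# LINQ. No Lucene is involved, and IndexedDoc and its properties are made-up stand-ins for the two index fields. The first key is compared in descending order because its reverse flag is true, and the second key only breaks ties:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a document with two sortable index fields
public class IndexedDoc
{
    public int First { get; set; }      // plays the role of "firstfieldname" (INT)
    public string Second { get; set; }  // plays the role of "secondfieldname" (STRING)
}

public static class TwoLevelDemo
{
    // Mirrors the twoLevel Sort: First descending (reverse = true),
    // with ties broken by Second ascending (reverse = false)
    public static List<IndexedDoc> Order(IEnumerable<IndexedDoc> docs)
    {
        return docs.OrderByDescending(d => d.First)
                   .ThenBy(d => d.Second, StringComparer.Ordinal)
                   .ToList();
    }
}
```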

Each SortField needs the name of the index column, a data type, and a boolean indicating whether the sort order should be reversed. Lucene does still provide a SortField.AUTO option, where it tries to work out the type for itself, but this is marked as obsolete, so you should give an accurate data type explicitly.
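The data type matters because it changes how values compare. This is a plain C# approximation (not Lucene code, and the values are made up) of the difference between a STRING sort and an INT sort over the same raw index values:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SortTypeDemo
{
    // The same made-up raw values; index terms are stored as text
    public static readonly string[] RawValues = { "9", "10", "2" };

    // Approximates a STRING sort: compare the text character by character
    public static string[] AsText()
    {
        return RawValues.OrderBy(v => v, StringComparer.Ordinal).ToArray();
    }

    // Approximates an INT sort: parse each value and compare numerically
    public static string[] AsNumbers()
    {
        return RawValues.OrderBy(v => int.Parse(v, CultureInfo.InvariantCulture)).ToArray();
    }
}
```

Sorted as text, "10" comes before "2"; sorted as integers, it comes last.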

So now you can write code to run a sorted search along the lines of:

BooleanQuery query = generateAQuery(); // placeholder for your own query-building code
Sort sort = new Sort(new SortField("title", SortField.STRING, false));

var index = Sitecore.Search.SearchManager.GetIndex("myIndex");

using (var isc = new SortableIndexSearchContext(index))
{
    SearchHits hits = isc.Search(query, sort);

    // process the sorted results
}

No need for OrderBy() at all. And since SortField takes the name of the field to sort by as a string, it’s fairly easy to let users pick the sort in the user interface if you want variable sorts.
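If the sort comes from user input, it’s probably wise not to pass an arbitrary field name straight through to Lucene. One way to handle that is a whitelist mapping the options your UI offers to the three things a SortField needs. This is a minimal sketch in which SortKind, SortOption, SortOptions and the field names are all hypothetical:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: which Lucene sort type an option maps to
public enum SortKind { String, Int }

// Hypothetical: the three pieces of information a SortField needs
public sealed class SortOption
{
    public SortOption(string field, SortKind kind, bool reverse)
    {
        Field = field;
        Kind = kind;
        Reverse = reverse;
    }

    public string Field { get; private set; }
    public SortKind Kind { get; private set; }
    public bool Reverse { get; private set; }
}

public static class SortOptions
{
    // Whitelist of UI keys; the index field names here are made up
    private static readonly Dictionary<string, SortOption> Allowed =
        new Dictionary<string, SortOption>(StringComparer.OrdinalIgnoreCase)
        {
            { "title", new SortOption("title", SortKind.String, false) },
            { "newest", new SortOption("publishedyear", SortKind.Int, true) }
        };

    // Returns null for anything not on the list, meaning "keep relevance order"
    public static SortOption Resolve(string requested)
    {
        SortOption option;
        if (requested != null && Allowed.TryGetValue(requested, out option))
        {
            return option;
        }
        return null;
    }
}
```

Your search code would then turn the chosen option into a real SortField (SortKind.Int into SortField.INT, for example) and fall back to the default relevance order when Resolve returns null.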

But there’s one scenario where this doesn’t work as well as the OrderBy() approach: what happens if you need to sort your results by a column that contains a Sitecore ID? For example, somewhere in your data where editors fill in a MultiList or DropLink field. GUIDs can be sorted, but the resulting order is meaningless to a user, because it bears no relation to the items they point at. What you really want to sort on is some aspect of the target item. That’s easy in LINQ queries because you can use the Sitecore APIs to look up the target item and fetch one of its field values. But Lucene knows nothing about that, so what can you do?
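To make the problem concrete, here’s a standalone C# illustration with no Sitecore involved; the IDs and names are invented. Each entry stands in for a document whose DropLink field holds a target item’s ID, paired with that target’s display name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdSortDemo
{
    // Made-up targets: raw ID text stored in the field -> target's display name
    public static readonly Dictionary<string, string> Targets = new Dictionary<string, string>
    {
        { "{CCCCCCCC-0000-0000-0000-000000000000}", "Apples" },
        { "{AAAAAAAA-0000-0000-0000-000000000000}", "Bananas" },
        { "{BBBBBBBB-0000-0000-0000-000000000000}", "Cherries" }
    };

    // Sorting on the stored ID text: the order follows the GUIDs, not the names
    public static List<string> ByRawId()
    {
        return Targets.OrderBy(t => t.Key, StringComparer.Ordinal).Select(t => t.Value).ToList();
    }

    // Sorting on the looked-up value: the order an editor would expect
    public static List<string> ByTargetName()
    {
        return Targets.OrderBy(t => t.Value, StringComparer.Ordinal).Select(t => t.Value).ToList();
    }
}
```

Sorting on the raw ID gives Bananas, Cherries, Apples; sorting on the looked-up name gives Apples, Bananas, Cherries.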

One approach is to have computed entries in your index, so that Lucene has the right data to sort by because it was calculated at index time. I’ve been using the Advanced Database Crawler project for all of my legacy development work, and it provides a helpful mechanism for this situation: Dynamic Fields.

You can create a class based on BaseDynamicField that loads the right value. It needs to implement the ResolveValue() method to transform the current item into the value you want indexed:

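using Sitecore.Data;
using Sitecore.Data.Items;

public class MyDynamicField : scSearchContrib.Crawler.DynamicFields.BaseDynamicField
{
    // Replace these placeholder IDs with your own template and field IDs
    private static readonly ID correctTemplateID = new ID("{00000000-0000-0000-0000-000000000000}");
    private static readonly ID fieldID = new ID("{00000000-0000-0000-0000-000000000000}");

    public override string ResolveValue(Item item)
    {
        // Only items based on the right template get a computed value
        if(item.TemplateID == correctTemplateID)
        {
            string fieldValue = item.Fields[fieldID].Value;
            if(!string.IsNullOrWhiteSpace(fieldValue))
            {
                ID itemID;
                if(ID.TryParse(fieldValue, out itemID))
                {
                    // Look up the target in the indexed item's own database and language
                    Item otherItem = item.Database.GetItem(itemID, item.Language);
                    if(otherItem != null)
                    {
                        return otherItem.DisplayName;
                    }
                }
            }
        }

        // Anything that can't be resolved is indexed as an empty string
        return string.Empty;
    }
}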

The code starts off by checking if the current item being indexed has the correct template. Generally you only want to apply this sort of lookup to items based on specific templates – hence the test. Then the code extracts the value from the field we want to look up, checks it has a value and tries to parse it as a Sitecore ID. Assuming those tests pass we can try to load the item from the database.

Note that we use the database that the item being indexed came from. At the point this code runs, Sitecore.Context.Database is most likely Core, which is not the database your content lives in. Using the indexed item’s database ensures we look up the other item in the right place. If the load succeeds, we can extract the appropriate value for indexing. I’ve used DisplayName here, but you could use any field that’s relevant to your solution.

So the code will return either the looked-up value, or an empty string.
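The code above handles a field holding a single ID, as a DropLink does. A MultiList stores its selection as pipe-separated IDs (for example {ID1}|{ID2}), which won’t parse as a single ID, so such a field would simply be indexed as an empty string. If you need to sort on a MultiList, one option is to sort by the first target that resolves. Here’s that idea as a standalone C# sketch: the lookupName delegate is an assumption standing in for item.Database.GetItem() plus DisplayName, and “first target wins” is just one possible policy:

```csharp
using System;

public static class MultiListSortKey
{
    // Standalone model of ResolveValue() for a MultiList-style raw value.
    // Assumption: lookupName stands in for a database lookup returning the
    // target's display name, or null if the target can't be found.
    public static string Resolve(string rawValue, Func<Guid, string> lookupName)
    {
        if (string.IsNullOrEmpty(rawValue))
        {
            return string.Empty;
        }

        foreach (string part in rawValue.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries))
        {
            Guid id;
            if (Guid.TryParse(part.Trim(), out id))
            {
                string name = lookupName(id);
                if (!string.IsNullOrEmpty(name))
                {
                    return name; // first resolvable target wins
                }
            }
        }

        return string.Empty;
    }
}
```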

There’s a bit of configuration needed to add this to the index: you have to tell the crawler to run your dynamic field. This is configured in the scSearchContrib.Crawler.config file. Inside the <dynamicFields> element, add a new entry for your code:

<dynamicFields hint="raw:AddDynamicFields">
    <!-- other entries here -->
    <dynamicField type="MyNamespace.MyDynamicField,MyAssembly" name="newindexfield" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" />
</dynamicFields>

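The type attribute specifies the fully qualified type name and the assembly to load the dynamic field from. The name attribute sets the name of the new index column, and the remaining attributes configure how the field is stored and indexed. One caveat if you plan to sort on this column: Lucene can only sort reliably on a field that holds a single term per document, so a TOKENIZED field whose values contain several words may sort unpredictably or fail outright. Setting indexType to UN_TOKENIZED should avoid that.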
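A related detail: if the field is indexed UN_TOKENIZED, no analyzer touches the value, so Lucene compares the exact text and capitalisation affects the order. A simple fix is to normalise the value in ResolveValue() before returning it. This plain C# sketch uses ordinal comparison as an approximation of Lucene’s term ordering, and the names are made up:

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class SortKeyCase
{
    // Made-up display names with mixed capitalisation
    public static readonly string[] Names = { "banana", "Cherry", "apple" };

    // Exact (ordinal) comparison: ASCII upper-case letters sort before lower-case ones
    public static string[] RawOrder()
    {
        return Names.OrderBy(n => n, StringComparer.Ordinal).ToArray();
    }

    // The normalisation you might apply in ResolveValue() before returning the value
    public static string ToSortKey(string value)
    {
        return (value ?? string.Empty).ToLower(CultureInfo.InvariantCulture);
    }

    // Sorting on the normalised key while keeping the original names
    public static string[] NormalisedOrder()
    {
        return Names.OrderBy(n => ToSortKey(n), StringComparer.Ordinal).ToArray();
    }
}
```

Compared raw, “Cherry” jumps ahead of “apple” and “banana”; with the normalised key, the order is apple, banana, Cherry.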

With that in place and the index fully rebuilt, you can write sorts that use the dynamic field instead of the ID-based field, so the results come back in a meaningful order.
