Faceted Search in Sitecore 6.6

Last week I spoke at the London Sitecore Technical User Group, and discussed my experiences working on a project that had to provide a Faceted Search UI in Sitecore 6.6. As part of my example, I talked about how you can build Facets using Lucene when you don’t have access to the newer search APIs available in Sitecore 7.x, and about how you can make your search UI configurable by editors to improve their user experience. And I said I’d post my example code and explanation. So here goes:

Imagine you want to sell t-shirts from your Sitecore website. To help people find the t-shirt they want on your site, you want to provide a search interface where they can filter your products using aspects of your metadata – say Size and Colour. And you want to allow your content editors to control the Facet UI that lets people filter your products. How might you go about that?

Data templates

The first step in filtering t-shirts is to create a data template to represent a t-shirt. We can set up a simple template like so:

[Screenshot: T-shirt template]

There are two sets of fields here. The Content Data region has the normal rich text and images for displaying the t-shirt. The Facet Metadata region has the metadata that we’ll use for the filtering. I’ve created two fields here to represent colour and size – and set them up as Multilists so that editors can pick the appropriate values. The values they can choose from are set up as items elsewhere in the content tree.

[Screenshot: T-shirt metadata]

A series of values have been set up for size and colour so editors can pick from them, as well as a set of example t-shirt items for us to search over with the following code. And with that test data in place, we can think about how to search them.

If we want to allow editors to modify the set of facet filters available in a particular search, then we need a way to represent a facet inside Sitecore. Hence another data template:

[Screenshot: Facet template]

For this simple example we need only one field here. It will contain the name of the Sitecore field that we want to build a facet from. With that template we can then set up example facet definition items for each of the Colour and Size metadata fields we defined earlier. For example, the “Colours Facet” item needs the value “Colour” in its “SearchIndexField” field.

By describing facets in this way you will be able to create new facets later without needing to change the code. In a real-world example you would have other fields on these items. Things like display text for the name of the facet, or data to describe the rendering or sorting strategies for this particular facet would be common examples.

With those set up, the next thing you need is a data template to describe the set of facets to use in a particular search:

[Screenshot: Query template]

For our simple example we need only one field – a multilist to let editors pick from the set of facet definition items we defined above. We can create an example query definition item from this template, and select both our Colour and Size facet definitions. This “TShirtsQuery” item will be the data source of our UI control for the searching.

For this example, the UI control is a simple Sublayout, with an ASP.NET User Control that we’ll see code for later. The sublayout can be bound to a page and have its data source set to the item above.

Search Index

In order to run a search, you need a Lucene index to query against. For Sitecore 6.6 projects a helpful shortcut to setting up an easy to configure index is to nip over to GitHub and grab yourself a copy of the Sitecore SearchContrib project. This includes a load of useful code for speeding up search work, but for the purposes of this demo, all we need is the scSearchContrib.Crawler part of the code. Grab this code, build it against your particular versions of Sitecore and Lucene. You can then drop the binaries into your website’s bin folder and grab yourself a copy of the example search configuration file that is included in this project and drop it into your website’s config include folder.

The example config is mostly fine for this example. You need to note that it creates an index named “demo”. But it requires one change to be made. You need to find the data that defines how the Multilist fields are indexed, and change the “StorageType” attribute to “Yes”:

[Screenshot: Index configuration]

Why do you need to make this change? Well when the default configuration of Lucene here builds an index it sets up a collection of complex data structures that it uses for efficient searching. But while the index remembers the Sitecore IDs of items, it doesn’t remember any of the other data it’s been processing. For “normal” searches that’s fine – you just use the Sitecore APIs to load the items your search index matches in order to do further processing.

But here, building facets, we need to process lots of items that we won’t end up showing in our on-screen results in order to get all the facet values. (You’ll see this in the code later on.) Doing that via the Sitecore APIs would be very inefficient, because loading items is much slower than reading information from the Lucene index. So to get around that problem we change “StorageType” so that Lucene does remember the values of Multilist fields when it indexes them.

That lets us trade off a slightly larger index size against those extra Sitecore API calls for better performance.

With that config change made, you can build an index to search later…

Code

For the purposes of this simple example, the UI code is a normal ASP.NET User Control. (With the usual caveat of skipping over good patterns like error handling, Glass, etc.) However other approaches to building the UI are fine too; there’s nothing stopping you from building an equivalent to this using MVC or your favourite patterns.

The code for the UI control needs to do a series of simple operations for this to work. It needs to load the configuration, build a search query, run it, generate the data for our facets, render the facets and then render the results. Let’s walk through those in turn:

Earlier I said that the configuration was going to be passed to the control via its data source. Hence we can load the configuration item like so:

private string[] loadConfig()
{
    // The sublayout's data source points at the query definition item
    string cfgItem = (this.Parent as Sublayout).DataSource;
    Item cfg = Sitecore.Context.Database.GetItem(cfgItem);

    string facets = cfg.Fields["Facets"].Value;

    // Start with an empty array so callers can always iterate the result
    string[] config = new string[0];

    if (!string.IsNullOrEmpty(facets))
    {
        // The multilist value is a pipe-separated list of item IDs
        config = facets.Split('|')
            .Select(id => Sitecore.Context.Database.GetItem(id))
            .Select(itm => itm.Fields["SearchIndexField"].Value)
            .Select(f => f.ToLowerInvariant())
            .ToArray();
    }

    return config;
}

The data source item contains one field, so we extract its value. It’s a multilist, so the value is a string of GUIDs separated with pipe characters. To transform this data into something more usable we can use a Linq expression. (You don’t have to use Linq in your code; you can write this whichever way you prefer, and if you used an ORM like Glass you’d find a lot of this method was done for you.) We split the value on the pipe character, and then project each of those IDs into Sitecore items via the API. Each of those items will be one of our Facet definitions, so we can extract the one field it contains, leaving us with a set of field names we can build facets from. Finally these field names get transformed into lowercase, since Lucene works in lowercase most of the time.

And at the end of that we have an array containing the names of the fields we want to make facets from.
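If you want to see the shape of that transformation without a Sitecore instance to hand, here is a minimal sketch. The FacetConfigSketch class and its lookup dictionary are made up for illustration: the dictionary stands in for loading each facet definition item and reading its “SearchIndexField” value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FacetConfigSketch
{
    // rawValue: the raw multilist value, i.e. item IDs separated by pipes.
    // lookup:   hypothetical stand-in for GetItem(id).Fields["SearchIndexField"].Value.
    public static string[] ParseFacetFields(string rawValue, IDictionary<string, string> lookup)
    {
        // No facets selected: return an empty array rather than null
        if (string.IsNullOrEmpty(rawValue))
        {
            return new string[0];
        }

        return rawValue
            .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(id => lookup.ContainsKey(id))      // skip IDs that no longer resolve
            .Select(id => lookup[id])                 // ID -> field name
            .Select(name => name.ToLowerInvariant())  // Lucene field names are lowercase
            .ToArray();
    }
}
```

Unlike the real method, this sketch quietly skips IDs it can't resolve, which is one way of coping with an editor deleting a facet definition that a query item still references.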

Next is building a query. To do that we need to construct a Lucene expression tree to represent the search we’re going to run. We start from a BooleanQuery object and we add a set of required terms to that:

private readonly Guid _template = new Guid("{C5537201-AF56-473B-A614-E5DA9FB5079E}");

private BooleanQuery buildQuery(string[] facets)
{
    BooleanQuery query = new BooleanQuery();

    query.Add(new TermQuery(new Term("_template", ShortID.Encode(_template).ToLowerInvariant())), BooleanClause.Occur.MUST);

    foreach (string facet in facets)
    {
        // AllKeys can contain null entries, so guard before calling EndsWith()
        string key = Request.Form.AllKeys.FirstOrDefault(k => k != null && k.EndsWith("$" + facet));
        if (!string.IsNullOrWhiteSpace(key))
        {
            string value = Request.Form[key];
            if (!string.IsNullOrWhiteSpace(value))
            {
                query.Add(new TermQuery(new Term(facet, value)), BooleanClause.Occur.MUST);
            }
        }
    }

    return query;
}

The first term(s) to add are those which will restrict the overall set of results we’re working on. Here we’ve added a search term to make sure we only ever return items whose template is our T-shirt template. In real code you’d probably add things like language or path terms here as well.

The template’s ID is defined in the code. Note how we encode the ID of the template as a ShortID to pass it to Lucene. Lucene represents IDs as ShortIDs (basically a GUID with the hyphens and braces removed) so we need to make sure we pass them in that format (and in lowercase) to make sure we get the matches we want.
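To make the format concrete, here is a small sketch in plain .NET that produces the same shape of string. It deliberately doesn't use Sitecore's ShortID class, and ShortIdSketch is just an illustrative name. I'd expect it to match ShortID.Encode(...).ToLowerInvariant(), but treat that as an assumption to check against your own Sitecore version.

```csharp
using System;

public static class ShortIdSketch
{
    // Formats a GUID as 32 lowercase hex digits with no braces or hyphens,
    // which is the shape the index stores IDs in.
    public static string ToLowerShortId(Guid id)
    {
        // The "N" format specifier emits the digits only
        return id.ToString("N").ToLowerInvariant();
    }
}
```

Passing in the T-shirt template ID from above, {C5537201-AF56-473B-A614-E5DA9FB5079E}, gives c5537201af56473ba614e5da9fb5079e.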

The rest of this method iterates our configuration array and considers whether to add a term for each of the facets defined.

When we run this bit of code we have no UI controls defined to get selected values from. Hence we have to look in the HTML form’s postback data directly. Later on we’ll make sure the form elements have their IDs set to the names of our facets, so here we look for a form key that ends with “$” followed by the name of our current facet. (ASP.NET prefixes a control’s posted name with the IDs of its naming containers, separated by “$”.) If we find one, and if that key has a value in the postback data, then we add a term to our search. If we don’t find the key or a value, we skip over it as the user is not restricting results by this facet.
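Here is a rough sketch of that lookup on its own, using a plain NameValueCollection in place of Request.Form. The PostbackSketch class and the key names in the usage note are invented for illustration; the real keys depend on where the control sits in your page's control tree.

```csharp
using System;
using System.Collections.Specialized;
using System.Linq;

public static class PostbackSketch
{
    // Returns the posted value for a facet, or null when the user
    // hasn't chosen a value for it.
    public static string FindFacetValue(NameValueCollection form, string facet)
    {
        // Posted names look like "container$container$controlId",
        // so match on the final "$" + facet segment.
        string key = form.AllKeys.FirstOrDefault(
            k => k != null && k.EndsWith("$" + facet, StringComparison.Ordinal));

        if (key == null)
        {
            return null;
        }

        string value = form[key];
        return string.IsNullOrWhiteSpace(value) ? null : value;
    }
}
```

So with a posted key such as ctl00$main$filterRepeater$ctl00$colour, asking for the "colour" facet returns whatever value was posted, while a facet left on its "all" option (an empty value) comes back as null.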

And once we’ve been through all of the config we have a search query to run. That can be done with the following method:

private SearchResultCollection runQuery(BooleanQuery query)
{
    Index index = Sitecore.Search.SearchManager.GetIndex("demo");

    using (IndexSearchContext isc = index.CreateSearchContext())
    {
        SearchHits hits = isc.Search(query, 10000);
        return hits.FetchResults(0, 10000);
    }
}

First we ask Sitecore to give us the index named “demo” (remember the name from the config file earlier?) and from that we create a context for searching. We use that context to run our query and give us back a set of “hits”. Note the second parameter to the Search() method: it’s the maximum number of results Lucene will collect before it gives up the search. To generate the right set of facets you must ensure whatever you set this to is bigger than the maximum number of results you might have. If Lucene discards any matching results then your set of facets might be wrong.

You can then transform your hits into proper results. The two integer parameters of FetchResults() are usually used for paginating your results. However here we need all the results – so these parameters must also be set to return everything.

From the results returned we can now build the data for the facets we’ll display. To make the code a bit simpler we’ll define a quick helper data type here to contain the data for each facet:

public class FacetData
{
    public string Name { get; private set; }
    public IEnumerable<ListItem> Values { get; private set; }

    public FacetData(string name, IEnumerable<ListItem> values)
    {
        Name = name;
        Values = values;
    }
}

Each of our facets will have a name, and a list of items to display. We’re using .Net’s standard ListItem class to represent each facet value as we’ll bind these to a Dropdown list later.

So, to build our facet data we need to process the search results with:

private IEnumerable<FacetData> buildFilterData(string[] facets, SearchResultCollection results)
{
    List<FacetData> filters = new List<FacetData>();

    foreach (string facet in facets)
    {
        var data = results
                    .Select(r => r.Document.GetValues(facet))
                    .Where(v => v != null && v.Length > 0)
                    .SelectMany(v => v)
                    .Distinct()
                    .Select(v => Sitecore.Context.Database.GetItem(ShortID.Parse(v).ToID()))
                    .Select(v => new ListItem(v.DisplayName, ShortID.Encode(v.ID).ToLowerInvariant()))
                    .OrderBy(i => i.Text);

        FacetData t = new FacetData(facet, data);

        filters.Add(t);
    }

    return filters;
}

We iterate the set of facets in our configuration. For each one we use a Linq expression to process the search results into a set of facet data. For each Sitecore field name in our configuration we ask Lucene to give us back all the values it found in the results for that column. This call to GetValues() is the reason for the StorageType config change earlier. If you forget that update, then this method call will return nothing. But for us it’s going to return a collection of values. It’s a collection for each search result because Lucene gives you back the value of the multilist fields already split by the pipe character. So we discard any empty ones (The editor may not have applied any metadata to this field for this item) and we then call SelectMany(). That’s a Linq method that will flatten the IEnumerable<IEnumerable<string>> into the easier to process IEnumerable<string>. And then we can call Distinct() to ensure we have each ID only once.

Then we project the IDs into Items again, to get a set of the Size or Colour metadata items we defined earlier. And then we can project each of these into a ListItem with a sensible display name from the item, and the value set to the item's ID transformed into another lowercase ShortID. And finally we sort the items into alphabetical order.

With that data we can set up an instance of our FacetData with these values, and go around to process the next one.

And we end up with an IEnumerable<FacetData> that we can bind to our UI.
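The flatten-and-deduplicate step is the heart of that method, so here it is in isolation as a sketch you can run without Lucene or Sitecore. FacetValueSketch is an illustrative name, and each string[] stands in for what GetValues() returns for one search result (null or empty when nothing is stored for the field).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FacetValueSketch
{
    // perResultValues: one array of stored values per search result.
    public static List<string> DistinctValues(IEnumerable<string[]> perResultValues)
    {
        return perResultValues
            .Where(v => v != null && v.Length > 0)   // drop results with no metadata
            .SelectMany(v => v)                      // flatten to a single sequence
            .Distinct()                              // keep each value once
            .OrderBy(v => v, StringComparer.Ordinal) // stable order for display
            .ToList();
    }
}
```

Note that this sketch sorts the raw values, whereas the real method sorts by display name after loading the metadata items. Calling ToList() at the end also forces the query to run once up front, rather than each time the result is enumerated.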

To display the facets we need some UI to bind the data to:

<h2>Filters</h2>
<asp:Repeater runat="server" ID="filterRepeater">
    <ItemTemplate>
        <div>
            <asp:DropDownList runat="server" ID="filter" AutoPostBack="true" />
        </div>
    </ItemTemplate>
</asp:Repeater>

Here we have a simple repeater to create a set of dropdown lists with auto-postback set to true. We set up a standard data binding for them, and for each item we run the following code to do the bind:

private void filterRepeater_ItemDataBound(object sender, RepeaterItemEventArgs e)
{
    if (e.Item.ItemType == ListItemType.Item || e.Item.ItemType == ListItemType.AlternatingItem)
    {
        FacetData data = e.Item.DataItem as FacetData;

        DropDownList ddl = e.Item.FindControl("filter") as DropDownList;
        ddl.ID = data.Name;

        var vals = new List<ListItem>();
        vals.Add(new ListItem("-- All " + data.Name + " --", string.Empty));
        vals.AddRange(data.Values);

        ddl.DataSource = vals;
        ddl.DataTextField = "Text";
        ddl.DataValueField = "Value";
        ddl.DataBind();
    }
}

Two things of note here. First is that when we find the DropDownList we set its ID to match the name of the current facet so we can find it in the query-building code above. The second is that the data we generated for our facet includes all the real values, but it doesn't include a "don't filter by this" option. So we add a few lines to add this to the top of our list before we do the data bind.

Now next you'd probably want to paginate your results. I skipped over that in my demo to save time, but you could use Linq's Skip() and Take() operations, or you could refactor your code to be able to call the FetchResults() method on your search hits again instead.
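As a rough idea of the Skip() and Take() option, something like the following would do. This wasn't part of my demo; PagingSketch and its parameter names are invented for illustration, and you'd call it with the full result collection after the facets have been built.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Returns one page of results; pageIndex is zero-based.
    public static List<T> GetPage<T>(IEnumerable<T> results, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        return results
            .Skip(pageIndex * pageSize) // step over the earlier pages
            .Take(pageSize)             // keep at most one page of items
            .ToList();
    }
}
```

The important detail is the ordering: the facets have to be built from the complete set of results, and only then do you cut the set down to the page you're going to display.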

And with your page of results sorted out, you can bind that to your UI. I created the following HTML in order to display the items:

<h2>Results</h2>
<asp:Repeater runat="server" ID="resultRepeater">
    <HeaderTemplate><ul></HeaderTemplate>
    <ItemTemplate>
        <li>
            <sc:Image runat="server" id="image" MaxWidth="50" Field="Image" style="display:inline-block;" />
            <div style="display:inline-block">
                <sc:Text runat="server" id="title" Field="Title" />
                <br />
                <sc:Text runat="server" id="description" Field="Description" />
            </div>
        </li>
    </ItemTemplate>
    <FooterTemplate></ul></FooterTemplate>
</asp:Repeater>

And the code to do the binding is fairly simple:

private void resultRepeater_ItemDataBound(object sender, RepeaterItemEventArgs e)
{
    if (e.Item.ItemType == ListItemType.Item || e.Item.ItemType == ListItemType.AlternatingItem)
    {
        SearchResult sr = e.Item.DataItem as SearchResult;

        Item itm = sr.GetObject<Item>();

        var image = e.Item.FindControl("image") as Sitecore.Web.UI.WebControls.Image;
        var title = e.Item.FindControl("title") as Sitecore.Web.UI.WebControls.Text;
        var description = e.Item.FindControl("description") as Sitecore.Web.UI.WebControls.Text;

        image.Item = itm;
        title.Item = itm;
        description.Item = itm;
    }
}

Whilst earlier in the code we use the StorageType config to be able to get data back via Lucene rather than loading items, that approach doesn't work so well here. The data Lucene gives you back has been parsed and had some of the noise discarded. Hence it can discard markup in your rich text, so it's better to load the items for the set of results you're going to show on the page.

So to wrap it up, the code that binds all these methods together is as follows:

protected void Page_Load(object sender, EventArgs e)
{
    string[] facets = loadConfig();

    BooleanQuery query = buildQuery(facets);

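    // Show the generated Lucene query for debugging
    // ("qry" is a text control declared in the markup, not shown above)
    qry.Text = query.ToString();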

    SearchResultCollection results = runQuery(query);

    IEnumerable<FacetData> facetData = buildFilterData(facets, results);

    filterRepeater.DataSource = facetData;
    filterRepeater.ItemDataBound += filterRepeater_ItemDataBound;
    filterRepeater.DataBind();

    // paginate

    resultRepeater.DataSource = results;
    resultRepeater.ItemDataBound += resultRepeater_ItemDataBound;
    resultRepeater.DataBind();
}

And you can compile this code, and run it to get the configurable faceted UI:

[Screenshot: the faceted search UI]

There are dropdowns for each facet defined in the configuration, and they contain all the valid metadata options for the current results. Hence no matter what you choose, you can't get zero results. Then we're displaying the Lucene query for debug purposes - and here it has a term for the template and for each of our facets. And finally we show only the results that the set of facets match...

Downloads

If you want to grab a copy of this to play with then you can download the Sitecore content items (Add your own UI component and page to display it all) and download the C# to build yourself, combine with the SearchContrib code described above and experiment on your own test instance of Sitecore.
