Using DMS Profile Cards as search metadata

Some time back I worked on a website which made extensive use of two bits of technology:

  • Faceted search UI – to help website users filter content down to the things that interested them (more on this topic in future posts)
  • DMS Profile Cards – to help editors personalise the website to match users’ interests

When the project was originally specified, these two things were thought of as individual aspects of the project and not much thought was put into the idea of bringing them together. But as the project progressed one of my colleagues realised that editors were basically being asked to enter the same data twice in some areas of the site: once when they configured the metadata to drive the search facets, and once when they set up profile cards. We’d failed to spot the strong overlap between the data being entered in these two areas of the site.

As an experiment in “can it be done?” I tried to see if it was possible to index your DMS Profile Card data in Lucene, to allow it to be used as search facets. This never ended up in the actual project, but I thought it might be of interest in case anyone else finds themselves with a similar need.

Here’s what I learned:

When you set up a Profile card in the Marketing Center, you are assigning some preset values:

[Image: preset values on a Profile Card in the Marketing Center]

You can then go to a page and attach this card to it, or you can manually enter/override the values of a card directly into the Page Item. The fancy UI for managing the Profile Card data attached to an Item stores XML under the surface. If you assign one of your Cards to a page using the DMS UI, you’re putting XML into the “Tracking” field of the “Advanced” region of your page fields. Set Content Editor to show Standard Fields and Raw data and what you see is:

[Image: the Tracking field in Content Editor with Raw data shown]

Looking at the XML, you find something like:

<tracking>
	<profile id="{24DFF2CF-B30A-4B75-8967-2FE3DED82271}" name="Focus">
		<key name="Practical" value="5"/>
		<key name="Process" value="8"/>
		<key name="Background" value="2"/>
		<key name="Scope" value="4"/>
	</profile>
	<profile id="{BA06B827-C6F2-4748-BD75-AA178B770E83}" name="Function">
		<key name="Building Trust" value="9"/>
		<key name="Create Desire" value="2"/>
		<key name="Define Concept" value="1"/>
		<key name="Call to Action" value="6"/>
	</profile>
</tracking>

Each card you attach has a profile element, and under that is a set of key elements which store the individual values. Each value is a number representing the strength (for want of a better word) of how well that key represents the current item. When you start from a pre-defined Profile Card, the presets attribute on your profile element (not present in the example above) tells you which of the pre-defined cards was used – you can find this in the “Profile Cards” folder underneath the Marketing Center Profile Item with the ID from the id attribute.

So the Profile Card Keys do look a bit like facet metadata – they just have a strength-of-attachment value which we wouldn’t have in normal facet metadata.
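To make that structure concrete, here is a minimal sketch that reads the Tracking XML into a dictionary of profile name to key name to value, using LINQ to XML. The TrackingXmlReader class and its Read method are names I've made up for illustration; they are not part of Sitecore or the DMS API, and the sketch assumes every profile and key element carries the attributes shown above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not a Sitecore class.
public static class TrackingXmlReader
{
    // Returns profile name -> (key name -> strength value).
    // Assumes well-formed data: a missing name or value attribute,
    // or a duplicated name, makes this throw.
    public static Dictionary<string, Dictionary<string, int>> Read(string rawXml)
    {
        return XElement.Parse(rawXml)
            .Elements("profile")
            .ToDictionary(
                p => (string)p.Attribute("name"),
                p => p.Elements("key").ToDictionary(
                    k => (string)k.Attribute("name"),
                    k => (int)k.Attribute("value")));
    }
}
```

Run against the example XML above, Read(xml)["Function"]["Call to Action"] gives 6.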

For the purposes of the quick test I conducted, I decided to use the idea that a Profile Card Key is a valid facet for an Item if its value exceeds a pre-defined value. So a key with value zero might be ignored, a key with value 2 might represent a valid bit of metadata. While I’ve defined this test with a static value in the example below, a real implementation of this code would make it more configurable – either as a global value that can be edited, or perhaps configured per Profile Card or similar. But with that in place, when we do our search indexing we can compare each Key with this value, and only add the ones which pass to Lucene’s index.
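As a rough idea of what the more configurable version might look like, the sketch below keeps a global default threshold and allows an override per profile. FacetThresholds, SetForProfile and IsFacet are all hypothetical names of my own, and where the values would really come from (a config file, a Sitecore item) is left open.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Sitecore or the ADC.
public class FacetThresholds
{
    private readonly int _defaultThreshold;
    private readonly Dictionary<string, int> _perProfile =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    public FacetThresholds(int defaultThreshold)
    {
        _defaultThreshold = defaultThreshold;
    }

    // Override the global threshold for a single profile (e.g. "Function").
    public void SetForProfile(string profileName, int threshold)
    {
        _perProfile[profileName] = threshold;
    }

    // A key counts as a facet when its value is strictly greater than
    // the threshold that applies to its profile.
    public bool IsFacet(string profileName, int keyValue)
    {
        int threshold;
        if (!_perProfile.TryGetValue(profileName, out threshold))
        {
            threshold = _defaultThreshold;
        }
        return keyValue > threshold;
    }
}
```

With a default of 1 and an override of 5 for "Function", IsFacet("Focus", 2) is true while IsFacet("Function", 2) is false.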

The next step, then, is to get this data into our search index.

The project I was working on when I investigated this was using Sitecore 6.5. Before Sitecore 7.0, the easiest way to approach custom search configuration was to make use of the Advanced Database Crawler open source project. Grab a copy of this, build it against your particular version of Sitecore and then drop its binaries into your website’s bin folder. For a real solution you’d create your own custom search configuration to meet your project’s needs, but for the purposes of this example you can just add the scSearchContrib.Crawler.config.example file (renamed of course) to your App_Config/Include folder to get a new index called “demo”.

(NB: If you try to replicate this under Sitecore 7 you will find you no longer need the Advanced Database Crawler, as most of that work has been absorbed into the main Sitecore codebase. However you will also find some classes have changed names, so the example code and config below may need some adjustment to work – I’ve not tried this though)

What we need to do here is to perform some computation on the raw field data before it is handed to Lucene. We can configure the search indexer to run our code when it encounters the Profile Card data field on an item fairly simply.

First of all, we need the base of our custom search indexing code. This needs to inherit from FieldCrawlerBase, which is defined by the Advanced Database Crawler:

public class TrackingFieldCrawler : FieldCrawlerBase
{
    public TrackingFieldCrawler(Field field) : base(field)
    {
    }

    public override string GetValue()
    {
        // Placeholder so that the class compiles; the real body is filled in later
        return string.Empty;
    }
}

We’ll come back to the details of what this has to do in a minute – but first we need to patch this into the configuration so that it gets called. You need to add two bits of configuration to your indexing config. First find the fieldCrawlers element in the search config file, and add this element to its children:

<fieldCrawler type="YourNamespace.TrackingFieldCrawler,YourBinary" fieldType="Tracking" />

Remember to fill in the .NET type descriptor there so that it points to the real namespace and binary that hold your TrackingFieldCrawler class. This tells the indexer “when you see a field that matches the type ‘Tracking’, pass it to the TrackingFieldCrawler class” so that our custom code gets run over the Profile Card data. Next you need to find the fieldTypes element and add the following as a child of it:

<fieldType name="Tracking" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f"/>

This tells Lucene how to treat the data our custom class generates. You may wish to modify this a bit to match your own solution, but for the purposes of this example we’re saying:

  • boost: This field is no more important than any other field.
  • vectorType: Don’t store term vectors – we don’t need to be able to find out where in the data a specific term was found.
  • indexType: Break up the string into words to index it.
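  • storageType: Whether to store the raw data in the index as well as indexing it. The example above uses NO, so the data can be searched but not read back; set it to YES if you need to get it back later. (This is relevant to a future blog post about building faceted search UI in Sitecore 6.x)
  • name: The Sitecore field type this configuration applies to. It matches the fieldType value on the fieldCrawler element above.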

With the index config in place, we can now look at what to do in the code. And it’s fairly simple:

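// Needs: using System.Text; using System.Xml.Linq;
public override string GetValue()
{
    // No Tracking data means no Profile Cards, so there is nothing to index
    if (string.IsNullOrWhiteSpace(this._field.Value))
    {
        return string.Empty;
    }

    StringBuilder sb = new StringBuilder();

    XElement tracking = XElement.Parse(this._field.Value);

    foreach (XElement profile in tracking.Elements("profile"))
    {
        // The cast gives null (rather than throwing) if the attribute is missing
        sb.Append((string)profile.Attribute("name"));
        sb.Append(" ");
        foreach (XElement key in profile.Elements("key"))
        {
            // A missing or non-numeric value leaves val at zero, so the key is skipped
            int val = 0;
            int.TryParse((string)key.Attribute("value"), out val);
            if (val > 1)
            {
                sb.Append((string)key.Attribute("name"));
                // Separate each key so adjacent names are not run together into one word
                sb.Append(" ");
            }
        }
    }

    return sb.ToString().Trim();
}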

The GetValue() method is called every time the Advanced Database Crawler finds the Tracking field we configured above. If the field we were supplied is empty, then there are no Profile Cards to process and we can return an empty string.

Otherwise we parse the XML stored in the field, and iterate through it. For each profile element we find, we add its name to a string and then check its keys. If the value of the key is greater than some pre-defined value (I use 1 here as a shortcut to keep the code simple – it would be configurable in the real solution) we add the name of the key to the string as well.

When it finishes, the code returns the string we generated: A space-separated list of the different keys that are attached to this Item. (This is not necessarily the best format to use – just an easy one to demonstrate that the concept works) Lucene then breaks that into tokens, and stores it in the index ready for querying.
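To see what that string looks like for the example XML shown earlier, here is a standalone sketch of the same idea that can be run outside Sitecore. TrackingIndexText and Build are hypothetical names of my own, the threshold is passed in as a parameter rather than hard-coded, and every name is joined with a single space.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical standalone helper for illustration; not the crawler class itself.
public static class TrackingIndexText
{
    // Builds the space-separated text for a raw Tracking value:
    // each profile name, followed by its keys whose value exceeds the threshold.
    public static string Build(string rawXml, int threshold)
    {
        var terms = new List<string>();
        foreach (XElement profile in XElement.Parse(rawXml).Elements("profile"))
        {
            terms.Add((string)profile.Attribute("name"));
            terms.AddRange(
                profile.Elements("key")
                       // A missing value attribute casts to null, which never passes;
                       // a non-numeric value would throw here.
                       .Where(k => (int?)k.Attribute("value") > threshold)
                       .Select(k => (string)k.Attribute("name")));
        }
        return string.Join(" ", terms.ToArray());
    }
}
```

For the example XML and a threshold of 1, this gives "Focus Practical Process Background Scope Function Building Trust Create Desire Call to Action". "Define Concept" is dropped because its value is 1. Note that multi-word keys such as "Building Trust" become separate words once the field is tokenised, which is one reason this format may not be the best choice.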

And now you can write a Lucene Term Query to match any Item which contains a specific DMS Profile Card Key. This will work with ordinary text searches, but it can really come into its own when used with a faceted search UI – a topic I plan to come back to in future posts.
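As a sketch of what such a query might look like with Lucene.Net, the helper below builds a TermQuery for a single key name. Two things here are assumptions on my part rather than anything I verified: that the Tracking data ends up in an index field called "__tracking", and that the analyzer lower-cased the tokens at index time. Check your own index (for example with Luke) before relying on either. ProfileKeyQueries and ForKey are hypothetical names.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper for illustration only.
public static class ProfileKeyQueries
{
    // Assumption: the name the Tracking field is indexed under; verify against your index.
    public const string AssumedFieldName = "__tracking";

    // Builds a query matching any Item whose indexed Tracking text contains the given word.
    public static Query ForKey(string keyName)
    {
        // A TermQuery is not analysed, so the text must match an indexed token exactly.
        // Assumption: tokens were lower-cased at index time, so lower-case here to match.
        return new TermQuery(new Term(AssumedFieldName, keyName.ToLowerInvariant()));
    }
}
```

Because a TermQuery matches one token, ForKey("Practical") is a sensible query but ForKey("Building Trust") would not match anything if the field has been split into words. Multi-word keys would need a phrase query, or a different format for the indexed string.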


PS: After I posted this, Martin Davies on Twitter pointed me at a blog post discussing the use of a similar approach with Sitecore v7. If this post was interesting, you may wish to compare and contrast with: Using DMS in your Sitecore 7 Search by Ian Graham.
