Publishing restrictions and search

I had to deal with a bug report in some Sitecore 6.6 / Advanced Database Crawler search code recently, relating to items with publishing restrictions not disappearing from search results until another publish occurred. It struck me that there’s not much written about how publishing restrictions interact with search, so I figured I should take a bit of time to write down what I’d found while sorting the bug.

How publishing restrictions work

There’s a fair bit written about publishing restrictions in general. The basic idea is that you can define a date and time range during which either the whole item, or a specific version of it, is valid for publishing. The settings are made via a dialog in Page Editor:

[Screenshot: the Restrictions dialog]

Clicking the “Change” button on the “Publish” ribbon opens the dialog. The editor can choose between the “version” and “item” tabs depending on which sort of restriction they wish to set. In both cases, the editor can choose whether the item should ever be published (with the checkbox), or set a date/time range for publishing with the date and time selectors.

How this interacts with search

Lucene indexes items as a result of events which update the Sitecore databases. This can be when you click “save” in the Content Editor (to index the Master version) or when a publish operation copies the data to the Web database (or potentially another publishing target).

If you set “do not publish” on an item via the Publishing Restrictions dialog then no version of it reaches the Web database, so you won’t get an index entry for that item there. Items with future publishing dates also appear to be ignored by the indexer.

The items that are added to an index can also be affected by the indexing configuration. For example, you may set up an index to receive items from only one database.

When indexing is applied, the restriction fields get indexed into the following Lucene fields:

  • Version, Publishable From: __valid from
  • Version, Publishable To: __valid to
  • Item, Publishable From: __publish
  • Item, Publishable To: __unpublish

Another important thing to remember is that the default configuration of indexing does not index the times set in publishing restrictions. For example, if you set restrictions like so:

[Screenshot: publishing restrictions set with both dates and times]

the indexed data looks like:

[Screenshot: the indexed values, showing dates with no time component]

No times go into Lucene – so you can only filter search results on a “per day” basis. It’s also worth noting that if you don’t set a value for a publishing restriction then the value added to the index is “00010101” – which is 01/01/0001 in Lucene’s format.
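To make the “per day” point concrete, here is a minimal sketch that mimics the day-resolution value using plain .NET formatting rather than Lucene itself. The IndexDateSketch class and ToIndexDay() method are hypothetical names for illustration only. It assumes the indexed value is simply the date formatted as yyyyMMdd, and it ignores any time zone handling the real indexer may apply.

```csharp
using System;
using System.Globalization;

public static class IndexDateSketch
{
    // Hypothetical helper: formats a date the way a day-resolution
    // index value looks (yyyyMMdd). The time of day is discarded.
    public static string ToIndexDay(DateTime value)
    {
        return value.ToString("yyyyMMdd", CultureInfo.InvariantCulture);
    }
}
```

Calling ToIndexDay() with 17 May 2013 at 09:00 and with 17 May 2013 at 23:59 gives “20130517” both times, so the two restrictions cannot be told apart in the index. Passing DateTime.MinValue gives “00010101”, the “no date” value.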

Filtering results on Publishing Restriction

If you need code to filter out results that have passed their unpublish date, you can add extra clauses to your search query. Something along these lines:

using System;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class PublishingRestrictionQueryExtensions
{
    private const string _unpublishedField = "__unpublish";
    private const string _validToField = "__valid to";

    // "No date" as it appears in the index, and an arbitrary far-future upper bound.
    private static readonly string _noDate = DateTools.DateToString(new DateTime(1, 1, 1), DateTools.Resolution.DAY);
    private static readonly string _futureDate = DateTools.DateToString(new DateTime(2100, 1, 1), DateTools.Resolution.DAY);

    public static void AddPublishingRestrictionsTerm(this BooleanQuery query)
    {
        string today = DateTools.DateToString(DateTime.Now, DateTools.Resolution.DAY);

        BooleanQuery clause = new BooleanQuery();

        // clause for __unpublish: either no date set, or a date after today
        BooleanQuery unpubTerm = new BooleanQuery();
        unpubTerm.Add(new TermQuery(new Term(_unpublishedField, _noDate)), BooleanClause.Occur.SHOULD);
        unpubTerm.Add(new TermRangeQuery(_unpublishedField, today, _futureDate, false, true), BooleanClause.Occur.SHOULD);
        clause.Add(unpubTerm, BooleanClause.Occur.MUST);

        // clause for __valid to: same pattern, different field
        BooleanQuery validToTerm = new BooleanQuery();
        validToTerm.Add(new TermQuery(new Term(_validToField, _noDate)), BooleanClause.Occur.SHOULD);
        validToTerm.Add(new TermRangeQuery(_validToField, today, _futureDate, false, true), BooleanClause.Occur.SHOULD);
        clause.Add(validToTerm, BooleanClause.Occur.MUST);

        // both field clauses must match for a result to be returned
        query.Add(clause, BooleanClause.Occur.MUST);
    }
}

The _noDate and _futureDate fields declare two values that will be used for comparisons later. As mentioned before, Lucene stores “no date” as 01/01/0001, and we can format that appropriately with the DateTools.DateToString() method. The future date is an arbitrary value in the far future.

In the AddPublishingRestrictionsTerm() method, we add two new clauses to the search. Both must evaluate as true for a result to be returned. Both clauses follow the same code pattern, but refer to different index fields. To cover both Item and Version expiries, we need to look at both the __unpublish and __valid to fields.

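For each of these we test two things. Firstly, is the value of the field equal to the “no date” value? Secondly, is the value of the field after today, and no later than our “future” date? If either of these is true, the field does not rule the item out. Note that the lower bound of the range is exclusive, so an item whose restriction ends today is already filtered out.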
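The same two tests can be written as a plain predicate over the indexed strings, which may help when reasoning about edge cases without a Lucene index to hand. This is only a sketch: RestrictionCheckSketch, FieldAllows() and IsVisible() are hypothetical names, and it assumes the field values are yyyyMMdd strings that Lucene compares as text, which is why an ordinal string comparison stands in for the range query here.

```csharp
using System;

public static class RestrictionCheckSketch
{
    private const string NoDate = "00010101";      // "no date" value in the index
    private const string FutureDate = "21000101";  // arbitrary far-future upper bound

    // True when the field holds "no date", or a day strictly after today
    // and no later than FutureDate (lower bound exclusive, upper inclusive).
    public static bool FieldAllows(string fieldValue, string today)
    {
        if (fieldValue == NoDate) return true;
        return string.CompareOrdinal(fieldValue, today) > 0
            && string.CompareOrdinal(fieldValue, FutureDate) <= 0;
    }

    // Both the item-level (__unpublish) and version-level (__valid to)
    // values must allow the item for it to appear in results.
    public static bool IsVisible(string unpublish, string validTo, string today)
    {
        return FieldAllows(unpublish, today) && FieldAllows(validTo, today);
    }
}
```

With today as “20130517”, an item with no restrictions (“00010101” in both fields) is visible, one with an __unpublish value of “20130601” is visible, and one with “20130516” or “20130517” is not.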

Applying these clauses should mean that once an item or version’s publishing restrictions expire, it will no longer be included in search results.
