v10.1’s new database update strategy

One of the interesting things that’s arrived with Sitecore v10.1 is a new approach to how items get updated when you change versions. This change is aimed at containerised deployments, and I’m in the middle of a containerised project. So I figured I should take a look…

What’s new?

If you fire up a copy of Sitecore v10.1 and look in its SQL content databases, you won’t see quite what you’re used to. Try connecting SQL Management Studio to the master database and selecting the item rows with a query like this:

SELECT * FROM [Sitecore.Master].[dbo].[Items]

You’ll see a surprisingly small result set: just two rows.

Two rows? What happened to all the item records? Because when you look in the content tree, it all looks pretty normal.

There’s certainly more than two items there, so what’s going on? How come there appear to be items in the content tree, when the database looks empty?

So what’s going on?

What you’re seeing here is actually a clever approach Sitecore are taking to improve the upgrade process. And it may also make packaging your custom modules into containerised projects easier in the future. The reason the database looks empty is that we now have a new “hybrid” data provider for our databases. When you look at Content Editor, you’re actually seeing a merge of two things: the rows that actually exist in the database, and serialised items which are read from data files on disk.

In a vanilla copy of Sitecore v10.1 you’ll find the following new folders and files in the website’s data folder:

App_Data\items\core\items.core.dat
App_Data\items\master\items.master.dat
App_Data\items\web\items.web.dat

And if you look into the config for this release, you’ll see that your database definitions have changed a bit. For example, the master database’s configuration now includes this:

<database id="master" singleInstance="true" type="Sitecore.Data.DefaultDatabase, Sitecore.Kernel" role:require="Standalone or Reporting or Processing or ContentManagement">
  <param desc="name">$(id)</param>
  ...
  <dataProviders hint="list:AddDataProvider">
    <dataProvider type="Sitecore.Data.DataProviders.CompositeDataProvider, Sitecore.Kernel">
      <param desc="readOnlyDataProviders" hint="list">
        <protobufItems type="Sitecore.Data.DataProviders.ReadOnly.Protobuf.ProtobufDataProvider, Sitecore.Kernel">
          <filePaths hint="list">
            <filePath>$(dataFolder)/items/$(id)</filePath>
          </filePaths>
        </protobufItems>
      </param>
      ...
    </dataProvider>
  </dataProviders>
  ...
</database>

The database has a new data provider type – CompositeDataProvider. Its job is to provide a merge between what’s in those disk files and what’s in the database. You can see that under the <dataProvider/> element there is some XML describing where to look for the files, based on the id of the database.

The key point about the merge operation is that if you edit an item that’s stored on disk, the resulting changes get saved in the database. And from then on it’s the database’s copy of the item that gets presented back to you in the UI – so the database takes priority. As an example, if I go to /sitecore/content and click the “unprotect” button in the UI, that item gets updated, and hence appears in the database because of this change.
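You can check this from SQL directly. After the edit, a query like the following should return a row for that item (the GUID here is the well-known ID of /sitecore/content, and the column list is trimmed for readability):

```sql
-- Find the edited item's row in the master database
SELECT [ID], [Name], [Updated]
FROM [Sitecore.Master].[dbo].[Items]
WHERE [ID] = '0DE95AE4-41AB-4D01-9EB0-67441B7C2450'
```

Before the edit, the same query comes back empty, because at that point the item only exists in the ProtoBuf files.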

Edited to add: As brought up in the comments, the other side-effect of this model is that you cannot delete items which exist in resource files. You get (an amusingly typo’d) error dialog saying “Some of selected items are in resources and cannot be deleted”.

Peeking under the hood

The files on disk are binary-serialised item data. As implied by the config above, the data is serialised using Google’s ProtoBuf format – in this case via the ProtoBuf.Net library for .NET.

In fact, you can have a peek inside these files if you fancy. As a very simple experiment, I tried extracting the tree structure of what’s inside the “web” file. Starting from a .NET Framework v4.8 console app project, I added the NuGet package for Sitecore.Kernel v10.1 – that gives you both the ProtoBuf.Net library and the internal types for representing the data.

The code starts by taking the path of one of the binary files, opening it as a simple stream and then running the ProtoBuf deserialiser over it. The top-level type for this data is the new Sitecore.Data.DataProviders.ReadOnly.Protobuf.Data.ItemsData class, which is specified as the type parameter to the deserialiser. And once we’ve got that data, we can look for an item whose parent is the empty GUID. That will be the root of the tree, and we can start processing the data for display from there:

public void Process(string filePath)
{
    using (var fs = File.OpenRead(filePath))
    {
        // Deserialise the whole resource file into Sitecore's ItemsData structure
        _data = Serializer.Deserialize<ItemsData>(fs);

        // The root of the tree is the one item definition whose parent is the empty guid
        var root = _data.Definitions.Where(i => i.Value.ParentID == Guid.Empty).Select(i => i.Value).FirstOrDefault();

        Process(root);
    }
}

There are a few fields in that ItemsData class, and you can see it’s decorated with serialisation attributes to control how the ProtoBuf encoding works:

[ProtoContract]
public class ItemsData
{
    [ProtoMember(1)]
    public ItemDefinitions Definitions { get; set; }
    [ProtoMember(2)]
    public ItemsSharedData SharedData { get; set; }
    [ProtoMember(3)]
    public ItemsLanguagesData LanguageData { get; set; }
}

And from there you can recursively look down the tree by following the ParentID pointers:

public void Process(ItemRecord item, int indent = 0)
{
    // Print the item's name, indented to show its depth in the tree
    var space = new string(' ', indent);
    Console.WriteLine(space + item.Name);

    // Recurse into any items whose ParentID points at this item
    var children = _data.Definitions.Where(i => i.Value.ParentID == item.ID).Select(i => i.Value);
    foreach (var child in children)
    {
        Process(child, indent + 1);
    }
}

For each item, we print its name with the right number of leading spaces for the current indent level, and then process each of its children. If we wrap those methods up in a class and run that against one of the disk files:

var v = new Viewer();
v.Process(@"D:\myfolder\app_data\items\web\items.web.dat");

Then we see something that looks like the content tree.

And with some more effort (and decoding the logic via ILSpy) we could extract item versions and field values from this data too.

So how does this help us?

In the past, whenever we did an upgrade between Sitecore versions there was a “.update” file we needed to install. It would apply disk file changes and item changes to the Sitecore content tree (plus some, but often not all, of the required content changes). That was an OK (if sometimes frustrating) approach when we were dealing with traditional deployments to VMs or PaaS App Services.

But in the world of containers we no longer need a package to give us the code and config changes for new Sitecore versions – those all come wrapped up in Sitecore’s Docker images, and we layer our custom code and config on top of them for a release. We do still need to do something else to apply the database changes, though.

That could be a “job” image. When you run one of these images it performs a task and then stops – so you could put the database upgrade tasks into an image like this and let it run once against your SQL databases. That’s pretty much the approach provided for the initial setup of Solr and your databases with v10.0, so you could use this tactic to do upgrades too.

But Sitecore are trying a different tactic for the upgrade process now. Rather than having to make arrangements to run a job image once for each upgrade, they’ve moved these items into disk files which are automatically changed when you deploy the new Sitecore images. So any upgrade-related changes are magically merged into your content tree just by changing to the newer CM/CD images.

And there’s the potential for another advantage here. The logic that loads up the data from the disk files has a “merge” operation built in. So you could have multiple files in your “master” or “web” folders which all get loaded up on start-up. And that makes for an interesting approach to delivering modules.

Historically we’ve always had to install packages (or deserialise items) to push the content items for our modules into a deployment. But if we serialise them into ProtoBuf files, the item data could be merged into our Docker images without needing to touch databases. That fits really well with the patterns for building custom images for deployments – you could get a “Sitecore PowerShell” image which contains the code, config and ProtoBuf data to add the software to a basic Sitecore image, and your custom image build just has to use that image as a source and copy in the files.
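As a rough sketch of how such an image build might look (the image names and the module’s internal folder layout here are hypothetical – only the App_Data\items destination matches the vanilla v10.1 layout):

```dockerfile
# Hypothetical module image containing the module's code, config and ProtoBuf item data
FROM mycompany/sitecore-powershell-assets:1.0 AS module

# Hypothetical Sitecore CM base image tag - use the one for your topology and version
FROM scr.sitecore.com/sxp/sitecore-xp0-cm:10.1

# Merge the module's serialised items in alongside Sitecore's own resource files
COPY --from=module C:/module/items/master/ C:/inetpub/wwwroot/App_Data/items/master/
```

On start-up, the CompositeDataProvider would then pick up the extra file in that folder and merge its items into the content tree, with no package installation step.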

So this approach looks like it could help us a lot in the future, and I find myself wondering if it might be a replacement for the TDS / Unicorn aspects of client builds eventually…

Edited to add: On Twitter, Kamruz Jaman points out something I’d entirely missed from the description above: Since we suddenly have many fewer rows in our content databases, publishing operations are much faster. There’s no need to ever publish an item that’s in the ProtoBuf data – because that’s on both your CM and CD boxes. So publish only has to read the database rows that are left. Hence a “republish all” operation now has vastly less data to process – and is going to finish much faster.

3 thoughts on “v10.1’s new database update strategy”

  1. It sounds like an interesting feature. However, a few questions spring to mind. First of all, what happens if we want to remove some of these items? Since they are stored on disk, how can we delete them? You say that if the items are changed they will be moved to the database and take precedence, but as far as I know deleted items are removed from the database, which would lead right back to the items being read from disk. Second question would be how this will affect startup times in the future if Sitecore starts placing a large amount of items in the files – especially since this looks like content that any respectable developer would delete first before even starting to build a new solution. Lastly, you mention that this could be used to distribute modules in the future. However, since local changes will move the items to the database and prevent the files from being read from disk again, how would it be possible to ensure that changes to items are “pushed” to the solution and forced to override local obsolete items? Here an installation package still has the ability to override local items. I do think that this is a neat feature but I am sceptical as to whether Sitecore really planned for this to be used by external developers.

    • Deleting is an interesting edge case, Bo – these are “Sitecore owned” items, so I think for the most part we wouldn’t want to delete them. But I guess there may be occasions when there is something in this set of data we might choose to remove. In that scenario, the answer is I don’t really know what happens – I’ll have to give it a try when I have some spare time…

      In terms of start-up, I’ve not measured this specifically. But we’ve swapped “reading database rows” for “finding things in memory” – so I doubt it’s going to get slower. But there’s probably a memory trade-off for this.

      For future deployment – I’m thinking of this being used for “developer owned” items. Rendering definitions, data templates etc. Things I would not expect to be edited on QA or Production. So this shouldn’t really be an issue? But I suspect we’ll end up with a PowerShell command or a UI button for “ensure this item reverts back to the ProtoBuf version” if it turns out this is an issue?

      • I tried deleting an item not in the db – you get an error: “Some of selected items are in resources and cannot be deleted”. (Their typo…) So you’re stuck if you find yourself in that situation.
