Fun with broken SQL Replication

Recently I’ve spent some time debugging some issues with a client site. One problem that came up during this work (which, sadly wasn’t actually the problem I was looking for) was a set of errors from SQL Replication appearing in the Sitecore logs. In keeping with my previous ranting about keeping your Sitecore logs as error free as possible I spent some time working out how to solve this.

Since I’d not looked at databases at this sort of level for some considerable time, I’m writing down some notes for my future self, next time I have to worry about this sort of thing…

The symptoms

The client in question had two database server clusters set up in separate data centres. For DR purposes, data was being pushed from #1 to #2 with a one-way replication. The replication had originally been configured for a subset of tables in Sitecore’s Core database. It should have been pushing data from the ASP.Net Membership tables, but not pushing data for things like the ClientData table.

But there were fairly regular exceptions appearing in the log, with epic stack traces like this:

3836 06:50:18 ERROR Application error.
Exception: System.Exception
Message: Table '[dbo].[ClientData]' into which you are trying to insert, update, or delete data is currently being upgraded or initialized for merge replication. On the publisher data modifications are disallowed until the upgrade completes and snapshot has successfully run. On subscriber data modifications are disallowed until the upgrade completes or the initial snapshot has been successfully applied and it has synchronized with the publisher.
The transaction ended in the trigger. The batch has been aborted.
Source: Sitecore.Kernel
   at Sitecore.Data.DataProviders.Sql.DataProviderCommand.ExecuteNonQuery()
   at Sitecore.Data.DataProviders.Sql.SqlDataApi.<>c__DisplayClass15.<Execute>b__14()
   at Sitecore.Data.DataProviders.NullRetryer.Execute[T](Func`1 action, Action recover)
   at Sitecore.Configuration.ClientDataStore.Touch(String key)
   at Sitecore.Configuration.ClientDataStore.Entry_EntryRemoved(Object sender, EntryRemovedEventArgs args)
   at Sitecore.MainUtil.RaiseEvent[T](EventHandler`1 subscribers, Object sender, T eventArgs)
   at Sitecore.Caching.Cache.Remove(Object key)
   at Sitecore.Caching.Cache.Add(Object key, CacheEntry entry)
   at Sitecore.Caching.Cache.Add(String key, Object data, Int64 dataLength, TimeSpan slidingExpiration)
   at Sitecore.Configuration.ClientDataStore.AddDataToCache(String key, Hashtable data, Int64 size)
   at Sitecore.Configuration.ClientDataStore.GetCurrentValues(String key)
   at Sitecore.Configuration.ClientDataStore.GetValue(String name)
   at Sitecore.Security.Authentication.FormsAuthenticationHelper.GetAuthenticationData(String key)
   at Sitecore.SecurityModel.UserRuntimeSettings.Load()
   at Sitecore.Security.Accounts.User.get_RuntimeSettings()
  at Sitecore.Security.Accounts.User.get_IsAdministrator()
   at Sitecore.Data.Managers.ItemProvider.ApplySecurity(ItemList children, SecurityCheck securityCheck)
   at Sitecore.Data.Managers.ItemProvider.GetChildren(Item item, SecurityCheck securityCheck, ChildListOptions options)
   at Sitecore.Data.Managers.PipelineBasedItemProvider.ExecuteAndReturnResult[TArgs,TResult](String pipelineName, String pipelineDomain, Func`1 pipelineArgsCreator, Func`1 fallbackResult)
   at Sitecore.Data.Managers.PipelineBasedItemProvider.GetChildren(Item item, SecurityCheck securityCheck)
   at Sitecore.Collections.ChildList.Populate(ChildListOptions options)
   at Sitecore.Collections.ChildList..ctor(Item ownerItem, ChildListOptions options)
   at Sitecore.Data.ItemResolvers.ItemPathResolver.GetChild(Item item, String itemName)
   at Sitecore.Data.ItemResolvers.ItemPathResolver.DoResolveItem(List`1 pathParts, Item root)
   at Sitecore.Resources.Media.MediaRequest.GetMediaPath(String localPath)
   at Sitecore.Resources.Media.MediaRequest.get_MediaUri()
   at Sitecore.Resources.Media.MediaProvider.ParseMediaRequest(HttpRequest request)
   at Sitecore.Analytics.RobotDetection.Media.MediaRequestSessionModule.IsSessionRequired()
   at Sitecore.Analytics.Media.MediaRequestSessionModule.ContextPostMapRequestHandler(Object sender, EventArgs e)
   at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

That’s basically saying that (in this case) a media item request has failed because SQL Server refused to honour an attempt to write some data.

Sorting this out

Spending a bit of time with my good friend Google, came across blog posts and forum questions talking about this error outside the context of Sitecore. These links suggest that the error is caused when some data from the setup of a replication gets left lying around because the replication wasn’t configured successfully.

Digging back through the history for this project, it turns out that there was a data centre outage in the past, and the replication was broken when the infrastructure was restored. So that tends to suggest that the posts above might be the right thing here.

Those posts suggest you can investigate if this is the issue by looking at the results of running a query like this on your affected database to tell you what triggers are defined:

SELECT
       o.name as 'TableName', t.* 
FROM
       sys.triggers t
       LEFT JOIN sys.objects o ON t.parent_id = o.object_id

Looking at the source database of the broken replication, I saw:

Bingo. A load of “MSmerge_disabledml” triggers are still defined on tables that shouldn’t be replicated anyway. According to Google, these get created during the process of configuring replication but should be tidied up if the replication setup succeeds. At some point one of the attempts to restore replication after the outage seems to have left these lying about.

The “right” fix for this is to remove the replication configuration, ensure all those triggers are removed, and then re-configure the replication correctly. But that would have required downtime of the site. So to get a fix for the Sitecore errors in place quickly (which can then be tidied up later, at a more convenient time) I started by disabling the unwanted triggers.

You can’t write directly to the sys.triggers table to do that, because it belongs to SQL Server. But you can use the “DISABLE TRIGGER” command instead. When using the appropriate database you can run something like:

DISABLE TRIGGER MSmerge_disabledml_0152EE111E9F4B0D9DA847F08E0B6248 ON ClientData

for each of the triggers that need disabling. You should then see the “is_disabled” column change from zero to one in the sys.triggers table. (The trigger name there has to match the name in the second column of the query above)

Doing that for all the unwanted triggers gets rid of the errors in the Sitecore log, and lets things go back to normal.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s