Be careful when you secure your HTTPS ciphers

One of the big themes in IT security in recent times has been the successful attacks black-hats have launched against the infrastructure of cryptography. As we all come to rely on encrypted communications more and more, the vulnerabilities in old protocols and ciphers have become more of a problem for us developers and administrators. Vulnerabilities like DROWN and POODLE are just two examples of a trend which means we all now have to worry about how our crypto is configured before we allow the internet to see a server.

But whenever you tie down security more tightly you risk causing problems when software relies on the thing you’ve just disabled…

I spent some time recently investigating why certain aspects of the Coveo for Sitecore search framework were broken on a client’s server, and the answer ended up being directly related to crypto security. Here’s what happened:

Finding a bug…

My team and I had been working on some new features for a client’s site. Having passed our internal QA phase, we deployed the changes onto their UAT servers so they could run their own tests. But when we did this, we noticed that one key feature of the new pages was broken – any page containing a particular search-driven component threw an exception. With custom errors turned off we saw the message “The request was aborted: Could not create SSL/TLS secure channel”, which originated from Coveo code:

[Screenshot: Coveo YSOD]

[WebException: The request was aborted: Could not create SSL/TLS secure channel.]
   System.Net.HttpWebRequest.GetResponse() +1740
   System.ServiceModel.Channels.HttpChannelRequest.WaitForReply(TimeSpan timeout) +75

[SecurityNegotiationException: Could not establish secure channel for SSL/TLS with authority 'coveo:52810'.]
   System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg) +14375718
   System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type) +388
   Coveo.Framework.CoveoSearchService.CoveoSearchService.ExecuteQuery(SearchSession Session, QueryParams Params) +0
   Coveo.Framework.Connection.ClientSessionWrapper.ExecuteQuery(QueryParams p_QueryParams) +211
   Coveo.SearchProvider.LinqToCoveoIndex`1.ExecuteQuery(QueryParams p_Query, IEnumerable`1 p_QueriedTypes) +905
   Coveo.SearchProvider.LinqToCoveoIndex`1.Execute(CoveoCompositeQuery p_Query) +547
   ASP.searchtest_aspx.Page_Load(Object sender, EventArgs e) +735
   System.Web.UI.Control.OnLoad(EventArgs e) +109
   System.Web.UI.Control.LoadRecursive() +68
   System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +4498

This site was set up with Coveo installed on a separate server, and Sitecore’s default configuration had been updated to remove all the legacy Lucene indexes that Coveo provides replacements for.

Clicking around the public site, we found that all of the other code using search was working. After some thinking we realised that code using Coveo’s REST API was fine, but anything using Sitecore’s ContentSearch API to pass queries to Coveo was broken.

While we were pondering this, one of my colleagues remembered another issue that had been raised previously and parked. Some UI components in Content Editor were also giving a similar error message. For example, if you browsed to a content item and looked at the “Semantics” field (under the “Tagging” group, made visible by showing “Standard Fields”), you saw an error saying “Field control has failed to render: Could not establish secure channel for SSL/TLS with authority 'coveo:52810'”:

[Screenshot: Content Editor Error]

Opening up the Coveo diagnostics gave us this:

[Screenshot: Coveo Diagnostics]

(Obviously, the name of the server running Coveo will be different in your environment.)

Could not establish secure channel for SSL/TLS with authority 'coveo:52810'.
 
System.ServiceModel.Security.SecurityNegotiationException: Could not establish secure channel for SSL/TLS with authority 'coveo:52810'. ---> System.Net.WebException: The request was aborted: Could not create SSL/TLS secure channel.
   at System.Net.HttpWebRequest.GetResponse()
   at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
   --- End of inner exception stack trace ---

Server stack trace: 
   at System.ServiceModel.Channels.HttpChannelUtilities.ProcessGetResponseWebException(WebException webException, HttpWebRequest request, HttpAbortReason abortReason)
   at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
   at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

Exception rethrown at [0]: 
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at Coveo.Framework.CoveoSearchService.CoveoSearchService.GetSearchProviderFields(SearchSession Session)
   at Coveo.SearchProvider.Applications.StateVerifier.VerifySearchService()
   at Coveo.SearchProvider.Applications.StateVerifier.<>c__DisplayClass10.<GetSearchServiceState>b__f()
   at Coveo.SearchProvider.Applications.BaseVerifier.VerifyComponent(Func`1 p_VerifyMethod, String p_ComponentName)

That’s the same error that was shown in Content Editor – just formatted a bit better.

What didn’t help us…

So my colleagues and I spent some time trying to work out what had gone wrong.

We investigated quite a few things without success, including:

  • Had Windows’ TLS infrastructure been disabled somehow?
    We checked which protocols were enabled, and while SSL had been turned off, TLS was still allowed.
  • Had a firewall rule blocked connections to this port?
    Since we don’t directly manage the infrastructure being used by the client, it was possible that a change there had caused the issue, but tests ruled this out.
  • Had the SSL Certificate that Coveo used for these connections expired?
    The certificate that Coveo uses is self-signed, so it can trigger warnings in your browser, but re-generating it did not resolve the issue.
  • Had a Coveo for Sitecore configuration file been messed up somehow?
    We checked all the configuration mentioned in Coveo’s install instructions, but were unable to spot an issue with the settings.
  • Had the Coveo CES server been broken in some way?
    The admin UI worked, queries worked via the REST API and the only log errors were Java’s equivalent of the stack traces above. Hence this seemed unlikely, but was harder to rule out.

Turns out the answer was on the second page…

We knew that another colleague (who was on holiday – so not available to help with this) had spent some time securing the web servers, and had used a tool called “IIS Crypto” to update the server’s settings. That was why one of the first things we checked was whether TLS had been disabled. However, none of the people looking at this issue had much experience with this sort of configuration, so we weren’t entirely sure what we were doing… But as we got more desperate and did more research, we fired up that tool again. This time around we noticed that as well as allowing you to control which protocols are enabled, it also has another page of settings for which cipher suites can be used:

[Screenshot: IIS Crypto]

And when we compared a working server with the broken one, we noticed that the lists of enabled suites were different.
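
We did that comparison by eye, but it is easy to script if you copy the enabled suite names from each server into a list. The following is a minimal Python sketch of that idea. The function name and the suite names are illustrative assumptions, not the actual lists from our servers.

```python
def diff_cipher_suites(working, broken):
    """Compare two lists of enabled cipher suite names.

    Returns (missing, extra): suites enabled only on the working server,
    and suites enabled only on the broken one, each sorted by name.
    """
    working_set, broken_set = set(working), set(broken)
    missing = sorted(working_set - broken_set)
    extra = sorted(broken_set - working_set)
    return missing, extra


# Illustrative lists only; substitute the names your own tooling reports.
working_server = [
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
    "TLS_RSA_WITH_AES_256_CBC_SHA",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]
broken_server = [
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
]

missing, extra = diff_cipher_suites(working_server, broken_server)
print("Enabled on working server only:", missing)
print("Enabled on broken server only:", extra)
```

Anything in the first list is a candidate for the suite that a dependent service needs, which narrows down what to re-enable.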

Resetting the enabled cipher suites settings to match the working server and rebooting the box got rid of the Coveo Diagnostics error:

[Screenshot: Coveo Fixed]

And it fixed our site too, as well as the errors in the Sitecore UI:

[Screenshot: Sitecore Fixed]

Wild celebration ensued…

In conclusion…

So over-enthusiastic securing of your ciphers can cause you some pain.

If you’re going to use tools like IIS Crypto to tie down the acceptable encryption your servers will use, you need to make sure you test everything that uses HTTPS in your solution. Underlying services are as likely to break as front-end features and the issues caused can be hidden for some time. If we’d realised the Sitecore UI error was important when it was first spotted, we could have prevented a lot of the pain here. That issue had been present since the security configuration was hardened on this server, but nobody realised its significance – it was only broken front-end code that raised the priority for a fix.
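
One way to make "test everything that uses HTTPS" repeatable is a small handshake probe run after each hardening change. The following is a rough Python sketch, with some caveats. The helper names are hypothetical, and the endpoint in the usage comment simply reuses the 'coveo:52810' authority from the errors above. Python negotiates TLS through OpenSSL rather than Windows' Schannel, so a successful handshake here shows what the server is willing to negotiate but does not prove that a .NET client on the hardened box can connect. Recent Python versions also refuse older protocol versions by default.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0, verify=True):
    """Attempt a TLS handshake and report what was negotiated.

    Returns a dict with 'ok' plus either 'protocol' and 'cipher',
    or an 'error' string describing why the handshake failed.
    """
    context = ssl.create_default_context()
    if not verify:
        # Self-signed certificates fail verification, so allow the
        # check to be skipped for internal hosts you already trust.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                name, _protocol, _bits = tls.cipher()
                return {"ok": True, "protocol": tls.version(), "cipher": name}
    except (ssl.SSLError, OSError) as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


def format_result(host, port, result):
    """Render one probe result as a single report line."""
    if result["ok"]:
        return f"OK    {host}:{port}  {result['protocol']}  {result['cipher']}"
    return f"FAIL  {host}:{port}  {result['error']}"


def check_all(endpoints):
    """Probe each (host, port, verify) tuple and return the report lines."""
    return [
        format_result(host, port, probe_tls(host, port, verify=verify))
        for host, port, verify in endpoints
    ]


# Example usage (hypothetical endpoint list; adjust for your environment):
# for line in check_all([("coveo", 52810, False)]):
#     print(line)
```

OpenSSL reports cipher names in its own format, so they will not match the names shown in IIS Crypto character for character. For the client-side failure described in this post, the more faithful check is still to exercise the real application (or the Coveo diagnostics page) from the hardened server itself.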

Based on this experience, clicking the “best practice” button in the IIS Crypto tool should give you a secure site which still works, without the need to worry about individual settings. However, that may depend on the precise setup of your site and its servers, so you need to be quite careful about what else you disable.

I also note that IIS Crypto can apply different security to “server” cryptography and “client” cryptography. When you’re securing your servers to try and prevent malicious third parties from using crypto attacks against the sites you host, then it’s the server-side stuff you need to secure. But what was broken for us here was the client behaviour of our web servers making calls to the Coveo server. That suggests there may be good reasons why you want these two areas of configuration to be different.

It’s also important to pay particular attention to what you’re disabling if your only access to the server is via RDP (for example, if you’re using servers hosted in the cloud). I know this from personal experience, as in writing this post I managed to disable ciphers RDP relies on and break access to the VM I had created to get screenshots… Thank goodness Azure makes VMs easy to create and destroy. That’s also a timely reminder of why we should never do configuration experiments on real client servers!

PS: I want to offer thanks to Active Commerce king and all-round helpful bloke Nick Wesselman for the advice he gave me on this issue via the Sitecore Slack site. His suggestion of why it was broken was right – it just took me a while to realise how it had happened… Cheers Nick.
