Experimenting with a SolrCloud container for Sitecore

I’ve got a project on the cards that I’d like to use Docker containers for, but we’re talking about using SolrCloud for search. Right now, there isn’t a SolrCloud container in the Sitecore community container repo. So I started thinking about what it would take to make one.

Big picture

At their core Solr and SolrCloud are the same software, with some different configuration settings and data storage. So a key part of getting a single-node instance of SolrCloud to run for a developer is adding an extra command-line parameter when you start it up. More complex, however, is creating the indexes Sitecore will need. Ordinary Solr uses Cores, and you can create them easily just by copying files. But SolrCloud uses Collections – which are made up of both cores and data stored by ZooKeeper. Because the data is split out, you can’t easily just drop files to make new collections – you need to use some sort of API.

As an aside, if you’re new to SolrCloud you might want to watch (or read) my Symposium presentation about getting started with SolrCloud and deploying it to production. That explains more about why we have collections, and how they help you.

What this means is to make SolrCloud work, we’re going to need to replace the process that the existing scripts use to create the default set of indexes. I’ve already spent a load of time on some scripts that can automate creating SolrCloud collections as part of the presentation linked to above – so those seem like a good starting point…

First step: PowerShell

The first thing I noticed was that the base image for the existing Solr container is the standard Microsoft Nanoserver image. That’s small – but it doesn’t include PowerShell. While I think it would be possible to do this setup using batch files, it would be a lot of effort to re-write my SolrCloud scripts to avoid PowerShell. So the easy answer for me is to find a base image that does have the scripting engine included. Microsoft offer a PowerShell-on-top-of-Nanoserver image which seems ideal for this purpose. Swapping over is easy: just change the arguments passed into the Dockerfile that builds the base for the Solr container. That lives in the “build.json” file for the Java Runtime image that Solr sits on top of:

{
  "tags": [
    {
      "tag": "sitecore-openjdk:8-nanoserver-${nanoserver_version}",
      "build-options": [
        "--build-arg BUILD_IMAGE=mcr.microsoft.com/windows/servercore:${windowsservercore_version}",
        "--build-arg BASE_IMAGE=mcr.microsoft.com/powershell:nanoserver-${nanoserver_version}"
      ]
    }
  ],
  "sources": []
}

So now when that container builds, PowerShell will be available for use.

Don’t create cores…

In the existing scripts, the core creation is in two parts. The Dockerfile for the Solr image creates the set of empty core files. And then the entrypoint script that runs when the container starts can copy those files into the Solr folders if no cores exist. So the first step in removing this behaviour is to strip out the bit of the Dockerfile that’s creating these base files.

In the Dockerfile, the change is to remove this bit:

RUN New-Item -Path 'C:\\clean' -ItemType Directory | Out-Null; `
    Copy-Item -Path 'C:\\solr\\server\\solr\\*' -Destination 'C:\\clean' -Force -Recurse; `
#    $env:CORE_NAMES -split ',' | ForEach-Object { `
#        $name = $_.Trim(); `
#        $schema = @{$true=('C:\\temp\\{0}' -f $env:MANAGED_SCHEMA_XDB_NAME);$false=('C:\\temp\\{0}' -f $env:MANAGED_SCHEMA_DEFAULT_NAME)}[$name -like '*xdb*']; `
#        Copy-Item -Path 'C:\\clean\\configsets\\_default\\conf' -Destination ('C:\\clean\\{0}\\conf' -f $name) -Recurse -Force; `
#        Copy-Item -Path $schema -Destination ('C:\\clean\\{0}\\conf\\managed-schema' -f $name); `
#        Set-Content -Path ('C:\\clean\\{0}\\core.properties' -f $name) -Value ('name={0}{1}config=solrconfig.xml{1}update.autoCreateFields=false{1}dataDir=data' -f $name, [Environment]::NewLine); `
#        New-Item -Path ('C:\\clean\\{0}\\data' -f $name) -ItemType Directory | Out-Null; `
#    }; `
    Remove-Item -Path 'C:\\clean\\README.txt'; `
    Remove-Item -Path 'C:\\clean\\configsets' -Recurse;

(It’s commented out here, because it’s easier to show the change that way – this can be deleted)

Removing the start-up behaviour that copies these files is easy – because the next step is going to replace the entrypoint anyway…

Firing up SolrCloud

The simplest way to get Solr to start up in SolrCloud mode for a developer is to add “-c” to the command line for running it. That could be done in the entrypoint file that already exists – but that’s a batch file, and we need PowerShell here. So instead, let’s change the Dockerfile to have a PowerShell entrypoint. If we name this file “Boot.ps1” to match the pattern, it can get copied in. There’s a second PowerShell script that includes the business for creating collections later. Plus we need to make the container fire up PowerShell and run the new boot script with the appropriate parameters. That all happens at the end of the Dockerfile:

...snip...

EXPOSE 8983

COPY Boot.ps1 .
COPY MakeCollections.ps1 .

CMD ["c:\\program files\\powershell\\pwsh.exe", "-f", "Boot.ps1", "c:\\solr", "8983", "c:\\clean", "c:\\data"]

The Boot.ps1 file needs to do some of the same stuff that the old batch file did, and we’ll extend it. First up, it needs to receive the parameters:

param(
	[string]$solrPath,
	[string]$solrPort,
	[string]$installPath,
	[string]$dataPath
)

The “copy files” behaviour from the original script can stay, but in PowerShell flavour – and it’s doing something with lock files too:

$solrConfig = "$dataPath\solr.xml"
if(Test-Path $solrConfig)
{
	Write-Host "### Config exists!"
}
else
{
	Write-Host "### Config does not exist, creating..."
	Copy-Item "$installPath\*" "$dataPath" -Force -Recurse
}

Write-Host "### Preparing Solr cores..."

Push-Location $dataPath
if(Test-Path "write.lock")
{
	Write-Host "### Removing write.lock"
	Remove-Item "write.lock" -Force
}
Pop-Location

And finally, fire up Solr in Cloud mode:

Write-Host "### Starting Solr..."

& "$solrPath\bin\solr.cmd" start -port $solrPort -f -c

And then we need collections…

So earlier, the Dockerfile copied in a second PowerShell script. That is largely my original collection creation script from my SolrCloud scripting. But I’ve made a couple of changes. First up, the Sitecore containers are set up without SSL for Solr – so all the API endpoints in the script need to change from “https://” to “http://”. That’s a quick search and replace operation. And secondly, it needs to include the logic to decide what to do when it’s run.

The logic starts by waiting for Solr to be running:

# wait for it to start
Write-Host "### Waiting on $solrPort..."
Wait-ForSolrToStart "localhost" $solrPort
Write-Host "### Started..."
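
The Wait-ForSolrToStart function comes from the original SolrCloud scripts and isn't reproduced here. As a rough sketch of the idea only, a polling loop along these lines would do a similar job. Both function names below are hypothetical, and using the Collections API LIST call as the readiness check is an assumption:

```powershell
# Hypothetical helper: polls a probe until it succeeds or the timeout expires.
# This is NOT the Wait-ForSolrToStart function from the original scripts.
function Wait-ForProbe {
    param(
        [scriptblock]$Probe,          # should return $true once the service is ready
        [int]$TimeoutSeconds = 120,
        [int]$DelaySeconds = 2
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        try {
            if (& $Probe) { return $true }
        }
        catch {
            # Connection refused and similar errors just mean "not ready yet"
        }
        Start-Sleep -Seconds $DelaySeconds
    }
    return $false
}

# Assumed readiness check: Solr counts as started once the
# Collections API answers with HTTP 200.
function Test-SolrResponding {
    param([string]$SolrHost, [string]$Port)

    $url = "http://${SolrHost}:$Port/solr/admin/collections?action=LIST"
    $response = Invoke-WebRequest -UseBasicParsing -Uri $url
    return ($response.StatusCode -eq 200)
}
```

It could be called as `Wait-ForProbe -Probe { Test-SolrResponding "localhost" $solrPort }`. Passing the check in as a script block keeps the retry loop separate from the HTTP call, so the loop can be exercised without a running Solr.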

Because SolrCloud uses both disk and ZooKeeper for storing data, we need to be able to call API endpoints to set up collections, so the rest of the script has to wait for Solr to be running. Then we need to check to see if there’s any work to do. This should really get refactored out to a function, but it asks the API how many collections exist at present:

# check for collections - /solr/admin/collections?action=LIST&wt=json
$url = "http://localhost:$solrPort/solr/admin/collections?action=LIST"
$response = Invoke-WebRequest -UseBasicParsing -Uri $url
$obj = $response.Content | ConvertFrom-Json
$collectionCount = $obj.Collections.Length
Write-Host "Collection count: $collectionCount"
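
One possible shape for that refactoring is sketched below. The function names are made up for illustration, and the JSON parsing is split from the HTTP call so it can be checked against canned responses:

```powershell
# Hypothetical helper: pulls the collection count out of the JSON text
# returned by the Collections API LIST action.
function Get-CollectionCountFromJson {
    param([string]$Json)

    $obj = $Json | ConvertFrom-Json
    if ($null -eq $obj.collections) { return 0 }
    # @() keeps the count correct even if a single item gets unwrapped
    return @($obj.collections).Count
}

# Hypothetical wrapper: asks a running Solr how many collections it has.
function Get-SolrCollectionCount {
    param([string]$SolrHost, [string]$Port)

    $url = "http://${SolrHost}:$Port/solr/admin/collections?action=LIST"
    $response = Invoke-WebRequest -UseBasicParsing -Uri $url
    return Get-CollectionCountFromJson $response.Content
}
```

With that in place, the check would reduce to something like `$collectionCount = Get-SolrCollectionCount "localhost" $solrPort`.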

If there are no collections, then we need to create them. Otherwise, we can assume it’s already been done and skip this bit. If we do have to create them, then that calls the function I wrote when I was originally automating SolrCloud. This also needs a bit of further hacking, as the set of collections and aliases created is fixed right now and it should be configured by data passed in from the json config data – but it does the job for a demo:

if($collectionCount -eq 0)
{
    Write-Host "Need to create"
    Configure-SolrCollection "c:\clean" "localhost" "$solrPort" 1 1 1 ""
}
else
{
    Write-Host "Already exists"
}
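
Configure-SolrCollection itself isn't shown in this post. To give a feel for the sort of API call involved, here is a minimal sketch of building a Collections API CREATE request. The function name and the example collection and config set names are hypothetical. The query parameters (`name`, `numShards`, `replicationFactor`, `collection.configName`) are standard Collections API ones, and the named config set is assumed to have been uploaded to ZooKeeper already:

```powershell
# Hypothetical helper: builds the URL for a Collections API CREATE call.
# It does not reflect how Configure-SolrCollection works internally.
function New-CreateCollectionUrl {
    param(
        [string]$SolrHost,
        [string]$Port,
        [string]$Name,        # collection to create
        [string]$ConfigName,  # config set assumed to exist in ZooKeeper
        [int]$Shards = 1,
        [int]$Replicas = 1
    )

    $query = "action=CREATE&name=$([uri]::EscapeDataString($Name))" +
             "&numShards=$Shards&replicationFactor=$Replicas" +
             "&collection.configName=$([uri]::EscapeDataString($ConfigName))"

    return "http://${SolrHost}:$Port/solr/admin/collections?$query"
}
```

The resulting URL can be passed to `Invoke-WebRequest -UseBasicParsing -Uri`, in the same way as the LIST call earlier.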

The last step is making sure this script runs. This bit was a bit of a head-scratcher. The script needs to run in parallel with Solr, but Solr has to run with the “-f” flag, which keeps it in the foreground until it stops. So the collection creation script has to be launched before Solr is started, but must not block the startup. And it also needs to be able to send its output back to the Docker console, if we’re running connected. After a bit of hackery I settled on PowerShell’s Start-Process command. That runs the script in parallel, but doesn’t wait for it to end. So “Boot.ps1” can be updated:

Write-Host "### Starting Solr..."

Start-Process "c:\\program files\\powershell\\pwsh.exe" -ArgumentList "-f",".\MakeCollections.ps1 $solrPort"

& "$solrPath\bin\solr.cmd" start -port $solrPort -f -c

It starts the process to make collections in the background, and then starts Solr in the foreground – allowing both parts to run in parallel.

Conclusions

So we can now build an image for SolrCloud and starting the container gives you a copy of SolrCloud with some collections created. Success!

And the data folder now includes the core data and the ZooKeeper data:

(The prefix on the collection names here was part of my testing – it wouldn’t be necessary in a real deployment of this)

But I’m aware there’s a load more to do here:

  • The collections and aliases need creating based on configuration being passed into the scripts. There’s already a “CORE_NAMES” variable, but it’s not used by the code above right now.
  • The collections are probably getting created with the wrong core config schema right now. I didn’t change what was being used from my original scripts, which were for Sitecore v9.1 – that probably needs updating. There’s some core config data in the docker script repo – but it doesn’t appear to be the right data for using with the SolrCloud APIs, so there’s some further work to do there.
  • The collection setup is slower than the creation of cores from before. Maybe that means the other containers need to wait for it to be finished before they run, to avoid errors because the collections don’t exist initially? I’ve not tested a full startup of Sitecore yet.
  • And the other containers need config changes to use SolrCloud as well. Connection strings need to be updated to add the “;SolrCloud=true” parameter to them for a start.
  • I need to work out how to integrate these changes into the main docker scripts properly – should SolrCloud be an option from the initial build script? Something that gets built by default in parallel? Or something added on manually? Not sure at the moment.
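
For the first of those items, a starting point might be to turn the comma-separated “CORE_NAMES” value into a list of collection names, using the same split-and-trim approach as the original Dockerfile. This is only a sketch, and the function name is hypothetical:

```powershell
# Hypothetical helper: turns a comma-separated list (the CORE_NAMES format)
# into a clean list of collection names, dropping blanks and stray spaces.
function Get-CollectionNames {
    param([string]$CoreNames)

    return @($CoreNames -split ',' |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -ne '' })
}
```

A loop such as `foreach ($name in Get-CollectionNames $env:CORE_NAMES) { ... }` could then create each collection in turn, rather than relying on a fixed set.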

So I need to find some more time to work on this – but it’s a promising start…
