Dealing with .tar.gz files on Windows Server

A couple of times recently, I’ve found myself needing to deploy files that come wrapped in a .tar.gz archive onto servers. On your desktop that’s not too much of a problem – you just run the installer for your preferred 3rd party tool, or maybe use the new Unixy shell and you get on with it. But on client servers security can be higher and you don’t always get the option to run any old installer. So I needed an alternative…

Looking for inspiration, I did a bit of googling and came across a thread on Stack Overflow which suggested there is a PowerShell extension for handling .tar files. It’s based on 7-Zip’s libraries – but it doesn’t require installing the full 7-Zip toolset, and it can be fetched directly from the public PowerShell module feed (the PowerShell Gallery).

That seemed like a good starting point – but the code in the answers was going to need a bit of work. So I’ve taken that as a basis and produced my own script to deal with .tar.gz archives.

The main script

To be useful, the script is going to need to receive a file to extract, and a folder to put the results into. That’s easily done by declaring a couple of mandatory parameters:

[cmdletbinding(SupportsShouldProcess=$True)]
param(
    [Parameter(Mandatory=$True)]
    [string]$FileToExtract,

    [Parameter(Mandatory=$True)]
    [string]$TargetFolder,

    [int]$BufferSize = 1024
)

The cmdletbinding() attribute at the top also declares that this script will honour the “-WhatIf” parameter. And the third, optional parameter sets the size of the file IO buffer used when expanding the gzip stream – more on that later.
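As a minimal sketch of what SupportsShouldProcess buys you: any action wrapped in a ShouldProcess check is described rather than performed when -WhatIf is passed. The function name Remove-Scratch and the temporary file are made up for illustration and are not part of the script.

```powershell
# Hypothetical helper, only to illustrate the -WhatIf pattern used throughout the script
function Remove-Scratch {
    [CmdletBinding(SupportsShouldProcess=$True)]
    param(
        [Parameter(Mandatory=$True)]
        [string]$Path
    )

    # Under -WhatIf, ShouldProcess prints a "What if:" message and returns $false
    if ($PSCmdlet.ShouldProcess($Path, 'Remove scratch file')) {
        Remove-Item $Path
    }
}

# Create a throwaway file, then ask what would happen without actually deleting it
$scratch = [System.IO.Path]::GetTempFileName()
Remove-Scratch -Path $scratch -WhatIf
$stillThere = Test-Path $scratch   # the file survives the -WhatIf call

# Real clean-up of the throwaway file
Remove-Item $scratch
```

The same check appears in each of the functions below, so one -WhatIf on the outer script flows through to every step.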

The logic of the script is fairly simple. It starts with some basic validation of the file it’s going to process. First it can test the file actually exists:

if(!(Test-Path $FileToExtract))
{
    throw "Source file '$FileToExtract' does not exist"
}

And then it can test that it has the right extension:

if(!$FileToExtract.EndsWith(".tar.gz", "CurrentCultureIgnoreCase"))
{
    throw "Source file '$FileToExtract' does not have a .tar.gz extension"
}

Once it’s happy, the overall processing is broken up into six operations. The first two make sure that the source file has an absolute path, before working out the right name for the .tar file that will be hiding inside the .tar.gz file.

Once that’s done, the real work is to expand the GZip data to get the .tar file. That can be done in native .Net code that I’ll get to in a sec. Then it has to make sure the extension for handling .tar files is installed – by grabbing a copy of the “7Zip4PowerShell” PowerShell module if it’s not available already. That can then be called to extract the data, before finally deleting the temporary .tar file:

$FileToExtract = Resolve-Path $FileToExtract
$tarFile = Calculate-TarFileName $FileToExtract

Expand-GZip $FileToExtract $tarFile $BufferSize
Ensure-7Zip
Extract-Tar $tarFile $TargetFolder

if ($PSCmdlet.ShouldProcess($tarFile,'Remove temporary tar file')) {
    Remove-Item $tarFile
}

What’s in all those functions? Read on…

Expanding the GZip stuff

The first step is working out the name for the output from the GZip file – which is the tar file. That’s pretty trivial, as it just means stripping the final “.gz” off the input filename:

function Calculate-TarFileName {
    param(
        [Parameter(Mandatory=$true)]
        [string] $targzFile
    )

    $targzFile.Substring(0, $targzFile.LastIndexOf('.'))
}
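For illustration (the path below is made up), this is the transformation that function performs. The second form is an alternative I’d assume works just as well here: a regex replace that only strips a trailing “.gz”.

```powershell
# Illustrative path only
$source = 'C:\Deploy\package.tar.gz'

# Same logic as Calculate-TarFileName: keep everything before the last dot
$tarName = $source.Substring(0, $source.LastIndexOf('.'))

# Alternative: remove a trailing ".gz" (the -replace operator is case-insensitive by default)
$altName = $source -replace '\.gz$', ''

$tarName
$altName
```

Both variables end up holding the path with the final “.gz” removed, so the .tar file lands next to the original archive.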

Expanding the GZip file takes a bit more work. This is largely cribbed from one of the answers on the Stack Overflow thread referenced above – but with some enhancements. Firstly it now understands -WhatIf and won’t actually generate the new file in that scenario. And secondly it adds some code to enable a progress bar. Other than that, it’s basically just processing the GZipStream using standard .Net code:

function Expand-GZip {
    [cmdletbinding(SupportsShouldProcess=$True)]
    param(
        [Parameter(Mandatory=$true)]
        [string]$infile,
        [Parameter(Mandatory=$true)]
        [string]$outFile,
        [int]$bufferSize = 1024
    )
    $fileSize = Original-GzipFileSize $inFile
    $processed = 0

    if ($PSCmdlet.ShouldProcess($infile,"Expand gzip stream")) {
        # $input is an automatic variable in PowerShell, so the streams get their own names
        $inStream = New-Object System.IO.FileStream $inFile, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)
        $outStream = New-Object System.IO.FileStream $outFile, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
        $gzipStream = New-Object System.IO.Compression.GzipStream $inStream, ([IO.Compression.CompressionMode]::Decompress)

        try {
            $buffer = New-Object byte[]($bufferSize)
            while($true){
                # Guard against a zero or bogus stored size, and never report more than 100%
                $pc = 0
                if ($fileSize -gt 0) {
                    $pc = [Math]::Min(100, [Math]::Floor(($processed / $fileSize) * 100))
                }
                Write-Progress "Extracting tar from gzip" -PercentComplete $pc

                $read = $gzipStream.Read($buffer, 0, $bufferSize)
                if ($read -le 0)
                {
                    Write-Progress "Extracting tar from gzip" -Completed
                    break
                }

                $processed = $processed + $read
                $outStream.Write($buffer, 0, $read)
            }
        }
        finally {
            # Always release the file handles, even if decompression fails part-way
            $gzipStream.Close()
            $outStream.Close()
            $inStream.Close()
        }
    }
}

Making a useful progress bar involves knowing the size of the final stream, however. But Google to the rescue again here, as it pointed me towards this CodeProject posting that describes the C# code to achieve this. Turns out you just need to verify that it’s actually a GZip stream (looking at the first three bytes) and then read the last four bytes, which hold the original length as a 32-bit integer. Strictly, that field is the uncompressed size modulo 2^32, so it only matches the true length for files under 4GB:

function Original-GzipFileSize {
    param(
        [Parameter(Mandatory=$true)]
        [string] $gzipFile
    )
    
    $fs = New-Object System.IO.FileStream $gzipFile, ([IO.FileMode]::Open), ([IO.FileAccess]::Read), ([IO.FileShare]::Read)

    try
    {
        $fh = New-Object byte[](3)
        $fs.Read($fh, 0, 3) | Out-Null
        # If the magic numbers are 31 and 139 and the compression method is 8 (deflate) then this is a file to process
        if ($fh[0] -eq 31 -and $fh[1] -eq 139 -and $fh[2] -eq 8) 
        {
            $ba = New-Object byte[](4)
            $fs.Seek(-4, [System.IO.SeekOrigin]::End) | Out-Null
            $fs.Read($ba, 0, 4) | Out-Null
                
            # The size field is unsigned, so read it as UInt32 to avoid negative values above 2GB
            return [System.BitConverter]::ToUInt32($ba, 0)
        }
        else
        {
            throw "File '$gzipFile' does not have the correct gzip header"
        }
    }
    finally
    {
        $fs.Close()
    }
}
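To see that trailer in action, here is a small self-contained sketch (not part of the script) that compresses a known number of bytes into a temporary file and then reads the header bytes and the final four bytes straight back. The payload and variable names are made up for the example.

```powershell
# Build a small gzip file in a temp location, then read the size back from its last four bytes
$payload = [System.Text.Encoding]::ASCII.GetBytes('hello gzip' * 100)   # 1000 bytes of input
$tmp = [System.IO.Path]::GetTempFileName()

try {
    $fs = [System.IO.File]::Create($tmp)
    $gz = New-Object System.IO.Compression.GZipStream $fs, ([IO.Compression.CompressionMode]::Compress)
    $gz.Write($payload, 0, $payload.Length)
    $gz.Close()   # closing writes the gzip trailer and closes the underlying file stream

    $bytes = [System.IO.File]::ReadAllBytes($tmp)

    # Same header check as Original-GzipFileSize: two magic bytes plus the deflate method id
    $looksLikeGzip = ($bytes[0] -eq 31 -and $bytes[1] -eq 139 -and $bytes[2] -eq 8)

    # The last four bytes are the uncompressed size
    $isize = [System.BitConverter]::ToUInt32($bytes, $bytes.Length - 4)

    "Header ok: $looksLikeGzip; input was $($payload.Length) bytes; trailer says $isize"
}
finally {
    Remove-Item $tmp
}
```

The size reported from the trailer should match the number of bytes written in, which is what makes it usable as the denominator for the progress bar.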

So the Expand-GZip function can use that to work out a percentage completion as it iterates through blocks of the stream…

I mentioned before that the overall script has an option for the block size used here. It defaults to 1KB because that’s what was in the code I cribbed, but you can pass a bigger block size to trade a bit more memory for speed.

Once that’s complete, the initial .tar.gz file will have a .tar file alongside it.

Dealing with the .tar file

The original Stack Overflow thread included an answer that suggested the “7Zip4PowerShell” module for PowerShell would be the simplest approach here. This code breaks it up into two tasks – one to make sure the module is available locally to use, and the other to actually use it.

The thread talks about two approaches to using that module. One where you manually copy the required files locally and the script picks those up and uses them, and one where it asks the Install-Package cmdlet to fetch it from the public feed. For laughs I decided to combine the two, as I could see scenarios where both might be useful.

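So if you put the files for the package in the “7Zip4Powershell” folder next to the script, it’ll spot them and use this local copy. (The default path is relative, so strictly the folder needs to be in the directory you run the script from.) To get those files you can run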

Save-Module -Name 7Zip4Powershell -Path .

from a prompt. This approach will be most useful on highly-secured servers where the admins want to vet all the files your work uses before you do anything with them. You can hand over both this script and the files for that module for investigation, and they just need copying to the server for use.

Alternatively, if that folder does not exist, it’ll pull the module from the public feed. That’s the “zero effort” approach when you’re allowed to use it. You don’t need to do any extra work – it’ll just pull in the right code if it’s not already installed for you. And as before, the logic is wrapped up to support -WhatIf.

function Ensure-7Zip {
    [cmdletbinding(SupportsShouldProcess=$True)]
    param(
        [string]$pathToModule = ".\7Zip4Powershell\1.9.0\7Zip4PowerShell.psd1"
    )

    if (-not (Get-Command Expand-7Zip -ErrorAction Ignore)) {
        if(Test-Path $pathToModule)
        {
            if ($PSCmdlet.ShouldProcess($pathToModule,"Install 7Zip module from local path")) {
                Write-Progress -Activity "Installing the 7Zip4PowerShell module" "Using local module" -PercentComplete 50
                Import-Module $pathToModule
                Write-Progress -Activity "Installing the 7Zip4PowerShell module" "Using local module" -Completed
            }
        }
        else
        {
            if ($PSCmdlet.ShouldProcess("PowerShell feed",'Install 7Zip module')) {
                Write-Progress  -Activity "Installing the 7Zip4PowerShell module" "Using public feed" -PercentComplete 50
                $progressPreference = 'silentlyContinue'
                Install-Package -Scope CurrentUser -Force 7Zip4PowerShell > $null
                $progressPreference = 'Continue'
                Write-Progress  -Activity "Installing the 7Zip4PowerShell module" "Using public feed" -Completed
            }
        }
    }
}

At the moment this code is hard-coded to the current version of this module (1.9.0) – making that more flexible is on my backlog, if I get a chance…
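One possible way to lift that restriction, as a rough sketch rather than anything in the actual script: look for the highest version-numbered subfolder that Save-Module created and build the manifest path from that. The function name Find-Local7ZipManifest and its parameter are hypothetical.

```powershell
# Hypothetical helper: returns the path to the newest locally saved manifest, or nothing if none is found
function Find-Local7ZipManifest {
    param(
        [string]$moduleRoot = ".\7Zip4Powershell"
    )

    Get-ChildItem -Path $moduleRoot -Directory -ErrorAction Ignore |
        # Keep only folders whose names parse as version numbers (e.g. 1.9.0)
        Where-Object { $_.Name -as [version] } |
        # Sort as versions, not strings, so that 1.13.0 ranks above 1.9.0
        Sort-Object { [version]$_.Name } -Descending |
        Select-Object -First 1 |
        ForEach-Object { Join-Path $_.FullName '7Zip4PowerShell.psd1' }
}

# Possible use inside Ensure-7Zip, in place of the fixed default path:
#   $pathToModule = Find-Local7ZipManifest
#   if ($pathToModule -and (Test-Path $pathToModule)) { Import-Module $pathToModule }
```

Because the function emits nothing when the folder is missing, the existing fallback to the public feed would still apply.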

Once the module is installed, the actual command to run the extraction is easy:

function Extract-Tar {
    [cmdletbinding(SupportsShouldProcess=$True)]
    param(
        [Parameter(Mandatory=$true)]
        [string] $tarFile,
        [Parameter(Mandatory=$true)]
        [string] $dest
    )

    if ($PSCmdlet.ShouldProcess($tarFile,"Expand tar file")) {
        Expand-7Zip $tarFile $dest
    }
}

All that adds on top of the 7Zip command is the “-WhatIf” logic, as that doesn’t seem to be supported by default.

In action…

So with that installed, you can extract files to your heart’s content:
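A typical call might look like the following. The script file name Expand-TarGz.ps1 and the paths are placeholders of mine, so substitute whatever you saved the script as.

```powershell
# Script name and paths are placeholders - substitute your own
.\Expand-TarGz.ps1 -FileToExtract .\package.tar.gz -TargetFolder C:\Deploy\package

# Same again, with a 64KB buffer for the gzip stage instead of the 1KB default
.\Expand-TarGz.ps1 -FileToExtract .\package.tar.gz -TargetFolder C:\Deploy\package -BufferSize 65536
```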

And if you pass -WhatIf then it tells you what it would do, but does nothing:
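For example, assuming the script was saved as Expand-TarGz.ps1 (a placeholder name, as are the paths):

```powershell
# Script name and paths are placeholders - substitute your own
.\Expand-TarGz.ps1 -FileToExtract .\package.tar.gz -TargetFolder C:\Deploy\package -WhatIf
```

Each step guarded by a ShouldProcess check then reports a line beginning “What if:” instead of doing the work, and no .tar file or extracted output is written.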

I’ve put the full code up in a gist if you think it would be useful for you…

One thought on “Dealing with .tar.gz files on Windows Server”

  1. I faced the exact same thing last week – where I was trying to identify how to extract the Gzip and Tar files. I kind of went the similar way – where I used 7zip to unzip it from PowerShell.
    This was when I was setting up ZooKeeper for my Solr Cloud.
