GZip and Tar Samples
How to use SharpZipLib to work with GZip and Tar files
GZip and Tar files are commonly encountered together. These samples cover handling them both individually and combined.
Table of Contents on this page
Extract the file within a GZip
Simple full extract from a Tar archive
Simple full extract from a TGZ or .Tar.GZip archive
Extract from a Tar with full control
Create a TGZ (.tar.gz)
Create a TAR or TGZ with control over filenames and data source
Updating files within a .tgz
Create a new instance of GZipInputStream, passing in a stream (of any kind) containing the gzip data, then read from it until the end of the stream. This straightforward example extracts the contents of a gzip file and writes them to a disk file in the nominated directory.
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.GZip;
/// <summary>
/// Extracts the file contained within a GZip to the target dir.
/// A GZip can contain only one file, which by default is named the same as the GZip except
/// without the extension.
/// </summary>
public void ExtractGZipSample(string gzipFileName, string targetDir)
{
// Use a 4K buffer. Any larger is a waste.
byte[] dataBuffer = new byte[4096];
using (System.IO.Stream fs = new FileStream(gzipFileName, FileMode.Open, FileAccess.Read))
{
using (GZipInputStream gzipStream = new GZipInputStream(fs))
{
// Change this to your needs
string fnOut = Path.Combine(targetDir, Path.GetFileNameWithoutExtension(gzipFileName));
using (FileStream fsOut = File.Create(fnOut))
{
StreamUtils.Copy(gzipStream, fsOut, dataBuffer);
}
}
}
}
VB
Imports System
Imports System.IO
Imports ICSharpCode.SharpZipLib.Core
Imports ICSharpCode.SharpZipLib.GZip
' Extracts the file contained within a GZip to the target dir.
' A GZip can contain only one file, which by default is named the same as the GZip except
' without the extension.
'
Public Sub ExtractGZipSample(gzipFileName As String, targetDir As String)
' Use a 4K buffer. Any larger is a waste.
Dim dataBuffer As Byte() = New Byte(4095) {}
Using fs As System.IO.Stream = New FileStream(gzipFileName, FileMode.Open, FileAccess.Read)
Using gzipStream As New GZipInputStream(fs)
' Change this to your needs
Dim fnOut As String = Path.Combine(targetDir, Path.GetFileNameWithoutExtension(gzipFileName))
Using fsOut As FileStream = File.Create(fnOut)
StreamUtils.Copy(gzipStream, fsOut, dataBuffer)
End Using
End Using
End Using
End Sub
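The sample above reads from a FileStream, but the text notes that GZipInputStream accepts a stream of any kind. The following is a minimal sketch of the same idea done entirely in memory. The class name `GZipMemorySample` and its two method names are made up for illustration and are not part of SharpZipLib; only the stream classes and `StreamUtils.Copy` come from the library.

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Core;
using ICSharpCode.SharpZipLib.GZip;

// Hypothetical helper class, for illustration only.
public static class GZipMemorySample
{
    // Compresses a byte array into GZip format without touching the disk.
    public static byte[] Compress(byte[] data)
    {
        using (MemoryStream msOut = new MemoryStream())
        {
            using (GZipOutputStream gzipOut = new GZipOutputStream(msOut))
            {
                gzipOut.Write(data, 0, data.Length);
            }
            // Disposing the gzip stream writes the trailer and closes msOut;
            // MemoryStream.ToArray still works on a closed MemoryStream.
            return msOut.ToArray();
        }
    }

    // Decompresses GZip data held in a byte array.
    public static byte[] Decompress(byte[] gzipData)
    {
        byte[] dataBuffer = new byte[4096];
        using (MemoryStream msIn = new MemoryStream(gzipData))
        using (GZipInputStream gzipIn = new GZipInputStream(msIn))
        using (MemoryStream msOut = new MemoryStream())
        {
            StreamUtils.Copy(gzipIn, msOut, dataBuffer);
            return msOut.ToArray();
        }
    }
}
```

Calling `Decompress(Compress(bytes))` should give back the original bytes, which is a quick way to check that the library is wired up correctly before working with files.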
A Tar file or archive is essentially a simple concatenation of multiple files. If you only need to extract all the contents of the tar to a folder path with no conditionals or name transformations, this easy example may be all you need.
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Tar;
public void ExtractTar(String tarFileName, String destFolder)
{
Stream inStream = File.OpenRead(tarFileName);
TarArchive tarArchive = TarArchive.CreateInputTarArchive(inStream);
tarArchive.ExtractContents(destFolder);
tarArchive.Close();
inStream.Close();
}
VB
Imports System
Imports System.IO
Imports ICSharpCode.SharpZipLib.Tar
Public Sub ExtractTar(tarFileName As String, destFolder As String)
Dim inStream As Stream = File.OpenRead(tarFileName)
Dim tarArchive As TarArchive = TarArchive.CreateInputTarArchive(inStream)
tarArchive.ExtractContents(destFolder)
tarArchive.Close()
inStream.Close()
End Sub
A Unix TGZ provides concatenation of multiple files (tar) with compression (gzip). This sample illustrates the automatic extraction capabilities of the library. The folder structure of the Tar archive is preserved, within the nominated target directory.
using System;
using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;
// example: ExtractTGZ(@"c:\temp\test.tar.gz", @"C:\DestinationFolder")
public void ExtractTGZ(String gzArchiveName, String destFolder)
{
Stream inStream = File.OpenRead(gzArchiveName);
Stream gzipStream = new GZipInputStream(inStream);
TarArchive tarArchive = TarArchive.CreateInputTarArchive(gzipStream);
tarArchive.ExtractContents(destFolder);
tarArchive.Close();
gzipStream.Close();
inStream.Close();
}
VB
Imports System
Imports System.IO
Imports ICSharpCode.SharpZipLib.GZip
Imports ICSharpCode.SharpZipLib.Tar
' for example: ExtractTGZ("c:\temp\test.tar.gz", "C:\DestinationFolder")
Public Sub ExtractTGZ(ByVal gzArchiveName As String, ByVal destFolder As String)
Dim inStream As Stream = File.OpenRead(gzArchiveName)
Dim gzipStream As Stream = New GZipInputStream(inStream)
Dim tarArchive As TarArchive = TarArchive.CreateInputTarArchive(gzipStream)
tarArchive.ExtractContents(destFolder)
tarArchive.Close()
gzipStream.Close()
inStream.Close()
End Sub
In contrast to the sample above, this sample walks through the tar one entry at a time, extracting each entry to the nominated folder and allowing individual entries to be skipped or renamed. Updated: it also handles ASCII translation, fixes a problem when a TAR entry filename begins with a path root such as "\", and sets the file date/time.
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Tar;
/// <summary>
/// Iterates through each file entry within the supplied tar,
/// extracting them to the nominated folder.
/// </summary>
public void ExtractTarByEntry(string tarFileName, string targetDir, bool asciiTranslate)
{
using (FileStream fsIn = new FileStream(tarFileName, FileMode.Open, FileAccess.Read))
{
TarInputStream tarIn = new TarInputStream(fsIn);
TarEntry tarEntry;
while ((tarEntry = tarIn.GetNextEntry()) != null)
{
if (tarEntry.IsDirectory)
continue;
// Converts the unix forward slashes in the filenames to windows backslashes
string name = tarEntry.Name.Replace('/', Path.DirectorySeparatorChar);
// Remove any root e.g. '\' because a PathRooted filename defeats Path.Combine
if (Path.IsPathRooted(name))
name = name.Substring(Path.GetPathRoot(name).Length);
// Apply further name transformations here as necessary
string outName = Path.Combine(targetDir, name);
string directoryName = Path.GetDirectoryName(outName);
// Does nothing if directory exists
Directory.CreateDirectory(directoryName);
FileStream outStr = new FileStream(outName, FileMode.Create);
if (asciiTranslate)
CopyWithAsciiTranslate(tarIn, outStr);
else
tarIn.CopyEntryContents(outStr);
outStr.Close();
// Set the modification date/time. This approach seems to solve timezone issues.
DateTime myDt = DateTime.SpecifyKind(tarEntry.ModTime, DateTimeKind.Utc);
File.SetLastWriteTime(outName, myDt);
}
tarIn.Close();
}
}
private void CopyWithAsciiTranslate(TarInputStream tarIn, Stream outStream)
{
byte[] buffer = new byte[4096];
bool isAscii = true;
bool cr = false;
int numRead = tarIn.Read(buffer, 0, buffer.Length);
int maxCheck = Math.Min(200, numRead);
for (int i = 0; i < maxCheck; i++)
{
byte b = buffer[i];
if (b < 8 || (b > 13 && b < 32) || b == 255)
{
isAscii = false;
break;
}
}
while (numRead > 0)
{
if (isAscii)
{
// Convert LF without CR to CRLF. Handle CRLF split over buffers.
for (int i = 0; i < numRead; i++)
{
byte b = buffer[i]; // assuming plain Ascii and not UTF-16
if (b == 10 && !cr) // LF without CR
outStream.WriteByte(13);
cr = (b == 13);
outStream.WriteByte(b);
}
}
else
outStream.Write(buffer, 0, numRead);
numRead = tarIn.Read(buffer, 0, buffer.Length);
}
}
VB
Imports System
Imports System.IO
Imports ICSharpCode.SharpZipLib.Tar
' Iterates through each file entry within the supplied tar,
' extracting them to the nominated folder.
'
Public Sub ExtractTarByEntry(tarFileName As String, targetDir As String, asciiTranslate As Boolean)
Using fsIn As New FileStream(tarFileName, FileMode.Open, FileAccess.Read)
' The TarInputStream reads a UNIX tar archive as an InputStream.
'
Dim tarIn As New TarInputStream(fsIn)
Dim tarEntry As TarEntry = tarIn.GetNextEntry()
While tarEntry IsNot Nothing
If Not tarEntry.IsDirectory Then
' Converts the unix forward slashes in the filenames to windows backslashes
'
Dim name As String = tarEntry.Name.Replace("/"C, Path.DirectorySeparatorChar)
' Remove any root e.g. "\" because a rooted filename defeats Path.Combine
If Path.IsPathRooted(name) Then
name = name.Substring(Path.GetPathRoot(name).Length)
End If
' Apply further name transformations here as necessary
Dim outName As String = Path.Combine(targetDir, name)
Dim directoryName As String = Path.GetDirectoryName(outName)
' Does nothing if directory exists
Directory.CreateDirectory(directoryName)
Dim outStr As New FileStream(outName, FileMode.Create)
If asciiTranslate Then
CopyWithAsciiTranslate(tarIn, outStr)
Else
tarIn.CopyEntryContents(outStr)
End If
outStr.Close()
' Set the modification date/time. This approach seems to solve timezone issues.
Dim myDt As DateTime = DateTime.SpecifyKind(tarEntry.ModTime, DateTimeKind.Utc)
File.SetLastWriteTime(outName, myDt)
End If
' Advance to the next entry (this also skips directory entries)
tarEntry = tarIn.GetNextEntry()
End While
tarIn.Close()
End Using
End Sub
Private Sub CopyWithAsciiTranslate(tarIn As TarInputStream, outStream As Stream)
Dim buffer As Byte() = New Byte(4095) {}
Dim isAscii As Boolean = True
Dim cr As Boolean = False
Dim numRead As Integer = tarIn.Read(buffer, 0, buffer.Length)
Dim maxCheck As Integer = Math.Min(200, numRead)
For i As Integer = 0 To maxCheck - 1
Dim b As Byte = buffer(i)
If b < 8 OrElse (b > 13 AndAlso b < 32) OrElse b = 255 Then
isAscii = False
Exit For
End If
Next
While numRead > 0
If isAscii Then
' Convert LF without CR to CRLF. Handle CRLF split over buffers.
For i As Integer = 0 To numRead - 1
Dim b As Byte = buffer(i) ' assuming plain Ascii and not UTF-16
If b = 10 AndAlso Not cr Then ' LF without CR
outStream.WriteByte(13)
End If
cr = (b = 13)
outStream.WriteByte(b)
Next
Else
outStream.Write(buffer, 0, numRead)
End If
numRead = tarIn.Read(buffer, 0, buffer.Length)
End While
End Sub
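The by-entry sample above mentions skipping and renaming entries but leaves them as a comment. The following is a minimal sketch of one way to do both with two caller-supplied delegates. The class `TarFilterSample`, the method `ExtractFiltered` and the `include` and `rename` parameters are hypothetical names introduced here; the single-argument `TarInputStream` constructor follows the samples on this page, and some library versions may prefer an overload that also takes an encoding.

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Tar;

// Hypothetical helper class, for illustration only.
public static class TarFilterSample
{
    // Extracts only the entries accepted by 'include', writing each one under
    // the name returned by 'rename'. Returns the number of files written.
    // Note: disposing the TarInputStream also closes tarStream.
    public static int ExtractFiltered(Stream tarStream, string targetDir,
        Func<TarEntry, bool> include, Func<string, string> rename)
    {
        int extracted = 0;
        using (TarInputStream tarIn = new TarInputStream(tarStream))
        {
            TarEntry tarEntry;
            while ((tarEntry = tarIn.GetNextEntry()) != null)
            {
                // Skipping is just "continue": the next GetNextEntry call
                // moves past any data of the current entry that was not read.
                if (tarEntry.IsDirectory || !include(tarEntry))
                    continue;

                // Rename first, then normalise separators and strip any root,
                // exactly as the sample above does.
                string name = rename(tarEntry.Name).Replace('/', Path.DirectorySeparatorChar);
                if (Path.IsPathRooted(name))
                    name = name.Substring(Path.GetPathRoot(name).Length);

                string outName = Path.Combine(targetDir, name);
                Directory.CreateDirectory(Path.GetDirectoryName(outName));
                using (FileStream outStr = File.Create(outName))
                {
                    tarIn.CopyEntryContents(outStr);
                }
                extracted++;
            }
        }
        return extracted;
    }
}
```

For example, passing `e => e.Name.EndsWith(".txt")` as `include` and `n => "renamed-" + n` as `rename` would extract only the text files and prefix each output name.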
This shows how to create a tar archive and gzip that at the same time. This example recurses down a directory structure adding all the files.
For more advanced options giving control over filenames and data source, see the next example.
using System;
using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;
// example: CreateTarGZ(@"c:\temp\gzip-test.tar.gz", @"c:\data");
private void CreateTarGZ(string tgzFilename, string sourceDirectory)
{
Stream outStream = File.Create(tgzFilename);
Stream gzoStream = new GZipOutputStream(outStream);
TarArchive tarArchive = TarArchive.CreateOutputTarArchive(gzoStream);
// Note that the RootPath is currently case sensitive and must be forward slashes e.g. "c:/temp"
// and must not end with a slash, otherwise cuts off first char of filename
// This is scheduled for fix in next release
tarArchive.RootPath = sourceDirectory.Replace('\\', '/');
if (tarArchive.RootPath.EndsWith("/"))
tarArchive.RootPath = tarArchive.RootPath.Remove(tarArchive.RootPath.Length - 1);
AddDirectoryFilesToTar(tarArchive, sourceDirectory, true);
tarArchive.Close();
}
private void AddDirectoryFilesToTar(TarArchive tarArchive, string sourceDirectory, bool recurse)
{
// Optionally, write an entry for the directory itself.
// Specify false for recursion here if we will add the directory's files individually.
TarEntry tarEntry = TarEntry.CreateEntryFromFile(sourceDirectory);
tarArchive.WriteEntry(tarEntry, false);
// Write each file to the tar.
string[] filenames = Directory.GetFiles(sourceDirectory);
foreach (string filename in filenames)
{
tarEntry = TarEntry.CreateEntryFromFile(filename);
tarArchive.WriteEntry(tarEntry, true);
}
if (recurse)
{
string[] directories = Directory.GetDirectories(sourceDirectory);
foreach (string directory in directories)
AddDirectoryFilesToTar(tarArchive, directory, recurse);
}
}
VB
Imports System
Imports System.IO
Imports ICSharpCode.SharpZipLib.GZip
Imports ICSharpCode.SharpZipLib.Tar
' Calling example: CreateTarGZ("c:\temp\gzip-test.tar.gz", "c:\data")
Private Sub CreateTarGZ(tgzFilename As String, sourceDirectory As String)
Dim outStream As Stream = File.Create(tgzFilename)
Dim gzoStream As Stream = New GZipOutputStream(outStream)
Dim tarArchive__1 As TarArchive = TarArchive.CreateOutputTarArchive(gzoStream)
' Note that the RootPath is currently case sensitive and must be forward slashes e.g. "c:/temp"
' and must not end with a slash, otherwise cuts off first char of filename
' This is scheduled for fix in next release
tarArchive__1.RootPath = sourceDirectory.Replace("\"C, "/"C)
If tarArchive__1.RootPath.EndsWith("/") Then
tarArchive__1.RootPath = tarArchive__1.RootPath.Remove(tarArchive__1.RootPath.Length - 1)
End If
AddDirectoryFilesToTar(tarArchive__1, sourceDirectory, True)
tarArchive__1.Close()
End Sub
Private Sub AddDirectoryFilesToTar(tarArchive As TarArchive, sourceDirectory As String, recurse As Boolean)
' Optionally, write an entry for the directory itself.
' Specify false for recursion here if we will add the directory's files individually.
'
Dim tarEntry__1 As TarEntry = TarEntry.CreateEntryFromFile(sourceDirectory)
tarArchive.WriteEntry(tarEntry__1, False)
' Write each file to the tar.
'
Dim filenames As String() = Directory.GetFiles(sourceDirectory)
For Each filename As String In filenames
tarEntry__1 = TarEntry.CreateEntryFromFile(filename)
tarArchive.WriteEntry(tarEntry__1, True)
Next
If recurse Then
Dim directories As String() = Directory.GetDirectories(sourceDirectory)
For Each directory__2 As String In directories
AddDirectoryFilesToTar(tarArchive, directory__2, recurse)
Next
End If
End Sub
This shows how to create a TAR or TAR.GZ archive, using manual creation of entries and copying data to output. This sample shows the processing of files in a directory, and recursing down the directory structure.
To illustrate how to create TAR entries from any stream data, in this example we use the following construct: (Note that the type is the abstract Stream class.)
Stream inputStream = File.OpenRead(filename);
You can replace this with a Stream sourced in any other way - for example a MemoryStream (it does not have to be a File stream).
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Tar;
public void TarCreateFromStream()
{
// Create an output stream. Does not have to be disk, could be MemoryStream etc.
string tarOutFn = @"c:\temp\test.tar";
Stream outStream = File.Create(tarOutFn);
// If you wish to create a .Tar.GZ (.tgz):
// - set the filename above to a ".tar.gz",
// - create a GZipOutputStream here
// - change "new TarOutputStream(outStream)" to "new TarOutputStream(gzoStream)"
// (this also needs "using ICSharpCode.SharpZipLib.GZip;")
// GZipOutputStream gzoStream = new GZipOutputStream(outStream);
// gzoStream.SetLevel(3); // 1 - 9, 1 is best speed, 9 is best compression
TarOutputStream tarOutputStream = new TarOutputStream(outStream);
CreateTarManually(tarOutputStream, @"c:\temp\debug");
// Closing the archive also closes the underlying stream.
// If you don't want this (e.g. writing to memorystream), set tarOutputStream.IsStreamOwner = false
tarOutputStream.Close();
}
private void CreateTarManually(TarOutputStream tarOutputStream, string sourceDirectory)
{
// Optionally, write an entry for the directory itself.
TarEntry tarEntry = TarEntry.CreateEntryFromFile(sourceDirectory);
tarOutputStream.PutNextEntry(tarEntry);
// Write each file to the tar.
string[] filenames = Directory.GetFiles(sourceDirectory);
foreach (string filename in filenames)
{
// You might replace these 3 lines with your own stream code
using (Stream inputStream = File.OpenRead(filename))
{
string tarName = filename.Substring(3); // strip off "C:\"
long fileSize = inputStream.Length;
// Create a tar entry named as appropriate. You can set the name to anything,
// but avoid names starting with drive or UNC.
TarEntry entry = TarEntry.CreateTarEntry(tarName);
// Must set the size; otherwise TarOutputStream fails when more data is written than the entry's declared size.
entry.Size = fileSize;
// Add the entry to the tar stream, before writing the data.
tarOutputStream.PutNextEntry(entry);
// this is copied from TarArchive.WriteEntryCore
byte[] localBuffer = new byte[32 * 1024];
while (true)
{
int numRead = inputStream.Read(localBuffer, 0, localBuffer.Length);
if (numRead <= 0)
break;
tarOutputStream.Write(localBuffer, 0, numRead);
}
}
tarOutputStream.CloseEntry();
}
// Recurse. Delete this if unwanted.
string[] directories = Directory.GetDirectories(sourceDirectory);
foreach (string directory in directories)
CreateTarManually(tarOutputStream, directory);
}
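As noted above, the input does not have to be a file stream. The following is a minimal sketch that builds a tar entirely in memory, with the entry data coming from a MemoryStream and `IsStreamOwner` set to false so the output stream stays usable after the tar stream is closed. The class `TarFromMemorySample` and the method `CreateTarFromText` are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using ICSharpCode.SharpZipLib.Tar;

// Hypothetical helper class, for illustration only.
public static class TarFromMemorySample
{
    // Builds a tar archive in memory containing a single entry whose
    // data is read from a MemoryStream rather than from a file.
    public static byte[] CreateTarFromText(string entryName, string text)
    {
        using (MemoryStream outStream = new MemoryStream())
        {
            TarOutputStream tarOutputStream = new TarOutputStream(outStream);
            // Keep outStream open when the tar stream is closed.
            tarOutputStream.IsStreamOwner = false;

            using (Stream inputStream = new MemoryStream(Encoding.UTF8.GetBytes(text)))
            {
                TarEntry entry = TarEntry.CreateTarEntry(entryName);
                // The size must be known before the data is written.
                entry.Size = inputStream.Length;
                tarOutputStream.PutNextEntry(entry);

                byte[] localBuffer = new byte[32 * 1024];
                int numRead;
                while ((numRead = inputStream.Read(localBuffer, 0, localBuffer.Length)) > 0)
                    tarOutputStream.Write(localBuffer, 0, numRead);

                tarOutputStream.CloseEntry();
            }

            // Closing the tar stream writes the end-of-archive blocks.
            tarOutputStream.Close();
            return outStream.ToArray();
        }
    }
}
```

Because the entry size has to be set before the data is written, a source stream that cannot report its `Length` would need to be buffered first.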
The Unix .tgz or .tar.gz format is roughly the equivalent of a Zip archive on Windows, but the combination does not allow files within the archive to be added or replaced directly. This is because all the files are first concatenated into a single file (tar), which is then compressed as a unit (gzip).
Updating items within a .tgz therefore requires decompressing it back to the original tar, creating a new tar from the old one plus the changes, and recompressing the whole archive.
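The rewrite described above can be done in a single streaming pass, without writing an intermediate tar to disk. The following is a minimal sketch under stated assumptions, not a library feature. The class `TgzUpdateSample` and the method `ReplaceEntry` are hypothetical names, and the sketch assumes that a `TarEntry` read from the input can be passed unchanged to `PutNextEntry` on the output. It replaces one existing entry; adding a new file would mean writing one more entry before the output is closed.

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

// Hypothetical helper class, for illustration only.
public static class TgzUpdateSample
{
    // Copies every entry from tgzIn to tgzOut, substituting newData for the
    // file entry named entryName. Returns true if an entry was replaced.
    // Both streams are closed when the method returns.
    public static bool ReplaceEntry(Stream tgzIn, Stream tgzOut, string entryName, byte[] newData)
    {
        bool replaced = false;
        // Each tar stream owns its gzip stream, which owns the raw stream,
        // so disposing the outermost wrapper closes the whole chain.
        using (TarInputStream tarIn = new TarInputStream(new GZipInputStream(tgzIn)))
        using (TarOutputStream tarOut = new TarOutputStream(new GZipOutputStream(tgzOut)))
        {
            TarEntry tarEntry;
            while ((tarEntry = tarIn.GetNextEntry()) != null)
            {
                if (!tarEntry.IsDirectory && tarEntry.Name == entryName)
                {
                    // Write the replacement instead of the old data.
                    TarEntry newEntry = TarEntry.CreateTarEntry(entryName);
                    newEntry.Size = newData.Length;
                    tarOut.PutNextEntry(newEntry);
                    tarOut.Write(newData, 0, newData.Length);
                    replaced = true;
                }
                else
                {
                    // Reuse the header read from the old archive,
                    // then stream the entry data across unchanged.
                    tarOut.PutNextEntry(tarEntry);
                    tarIn.CopyEntryContents(tarOut);
                }
                tarOut.CloseEntry();
            }
        }
        return replaced;
    }
}
```

With file streams, the output should go to a temporary file that replaces the original only after the method returns successfully, since the input is still being read while the output is written.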