Broken GZip header since version 1.2 #393

SamuelWeibel · 2019-11-01T09:59:10Z

Steps to reproduce

Create a .tar.gz file using the test code provided at the bottom of this issue.
Running with SharpZipLib 1.1.0, the file starts (as expected) with the magic header 0x1f, 0x8b
Running with SharpZipLib 1.2.0, the file contains additional bytes before the magic header 0x1f, 0x8b

Expected behavior

The GZip file should always start with the magic header 0x1f, 0x8b.
Otherwise it can't be opened with most gz compatible applications.

Actual behavior

The GZip file starts with the following bytes (up to and including the magic header):

02 08 20 80 00 02 08 20
80 00 02 08 20 80 00 02
08 20 80 00 02 08 20 80
00 02 08 20 80 00 1F 8B

If you strip away the "garbage" (first 30 bytes), the file seems to be correct.

Version of SharpZipLib

1.2.0
Running under .NET Core 3.0 (final)

Obtained from

Package installed using NuGet

Test code

Remark: this is stripped down production code, so please don't judge it ;-)

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;
using Xunit;

namespace ReproSharpZipLib
{
    public class TestTarGzip
    {
        [Fact]
        public async Task Compress()
        {
            var memory = new MemoryStream();
            await using (var gzip = new GZipOutputStream(memory) { IsStreamOwner = false })
            await using (var tar = new TarOutputStream(gzip))
            {
                var bytes = Encoding.UTF8.GetBytes("This is the first file");
                tar.PutNextEntry(this.CreateEntry("first.txt", bytes.Length));
                tar.Write(bytes);
                tar.CloseEntry();

                await this.AddSomeFilesAsync(tar, "one");
                await this.AddSomeFilesAsync(tar, "two");
            }

            var tgzBytes = memory.ToArray();
            Assert.True(tgzBytes.Length > 2);
            Assert.Equal(0x1F, tgzBytes[0]);
            Assert.Equal(0x8B, tgzBytes[1]);
        }

        private async Task AddSomeFilesAsync(TarOutputStream tar, string name)
        {
            var encoding = new UTF8Encoding(false);

            tar.PutNextEntry(this.CreateEntry(name));

            for (int i = 0; i < 20; i++)
            {
                var text = $"Hello World {i}";
                var size = encoding.GetByteCount(text);
                tar.PutNextEntry(this.CreateEntry($"{name}/{i:D8}.txt", size));
                var writer = new StreamWriter(tar, encoding);
                await writer.WriteAsync(text);
                await writer.FlushAsync();
                tar.CloseEntry();

            }
        }

        private TarEntry CreateEntry(string name, int size = -1)
        {
            return new TarEntry(new TarHeader
            {
                LinkName = "",
                Name = name,
                Mode = size < 0 ? 1003 : 33216,
                TypeFlag = size < 0 ? TarHeader.LF_DIR : TarHeader.LF_NORMAL,
                Size = Math.Max(0, size),
                ModTime = DateTime.Now,
                DevMajor = 0,
                DevMinor = 0
            });
        }
    }
}

The text was updated successfully, but these errors were encountered:

Numpsy · 2019-11-01T13:12:49Z

same as #382 ?

SamuelWeibel · 2019-11-01T14:33:54Z

Looks like it.

I'm wondering if we have two issues:

The GZip issue mentioned in Can't untar result file #382 and
the problem that TarOutputStream flushes GZipOutputStream without any data (is this actually true?! - your reproduction on October 1st seems to indicate that's what is happening).

Numpsy · 2019-11-04T12:15:13Z

That's what seemed to be happening with TarArchive (there seem to be situations where disposing the archive causes it to flush the output stream that hasn't yet been written to, and to write all the data into the stream after that), not sure if using TarOutputStream directly is exactyl the same ror a variation.

piksel · 2019-12-30T23:47:00Z

This is probably because of the forced flush of the deflate engine. It's not supposed to work with an empty buffer. I think we need to rethink the DeflaterOutputStream flushing. What we are doing now is forcing the state machine to run even though it's not requesting data. The purpose was to make sure all buffers were flushed when the files were closed, but if it's not in a state that accepts more data, it's not doing anything useful anyway...

Numpsy · 2020-01-01T12:37:42Z

The way that calling flush on the DeflatorOutputStream seems to write data to the actual output even if there is no data to be deflated doesn't seem great, but making the subclassed streams pay more attention to their own flushing (e.g. #406/#390) still seems reasonable?

Numpsy · 2020-10-09T11:00:20Z

should be fixed in 1.3.0?

piksel · 2020-11-10T19:09:45Z

It should. Please create a new ticket if this is still a problem.

piksel mentioned this issue Jan 10, 2020

Extract archive less then 10Kb. #409

Closed

piksel closed this as completed Nov 10, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Broken GZip header since version 1.2 #393

Broken GZip header since version 1.2 #393

SamuelWeibel commented Nov 1, 2019 •

edited

Loading

Numpsy commented Nov 1, 2019

SamuelWeibel commented Nov 1, 2019

Numpsy commented Nov 4, 2019

piksel commented Dec 30, 2019

Numpsy commented Jan 1, 2020

Numpsy commented Oct 9, 2020

piksel commented Nov 10, 2020

Broken GZip header since version 1.2 #393

Broken GZip header since version 1.2 #393

Comments

SamuelWeibel commented Nov 1, 2019 • edited Loading

Steps to reproduce

Expected behavior

Actual behavior

Version of SharpZipLib

Obtained from

Test code

Numpsy commented Nov 1, 2019

SamuelWeibel commented Nov 1, 2019

Numpsy commented Nov 4, 2019

piksel commented Dec 30, 2019

Numpsy commented Jan 1, 2020

Numpsy commented Oct 9, 2020

piksel commented Nov 10, 2020

SamuelWeibel commented Nov 1, 2019 •

edited

Loading