Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken GZip header since version 1.2 #393

Closed
SamuelWeibel opened this issue Nov 1, 2019 · 7 comments
Closed

Broken GZip header since version 1.2 #393

SamuelWeibel opened this issue Nov 1, 2019 · 7 comments

Comments

@SamuelWeibel
Copy link

SamuelWeibel commented Nov 1, 2019

Steps to reproduce

  1. Create a .tar.gz file using the test code provided at the bottom of this issue.
  2. Running with SharpZipLib 1.1.0, the file starts (as expected) with the magic header 0x1f, 0x8b
  3. Running with SharpZipLib 1.2.0, the file contains additional bytes before the magic header 0x1f, 0x8b

Expected behavior

The GZip file should always start with the magic header 0x1f, 0x8b.
Otherwise it can't be opened with most gz compatible applications.

Actual behavior

The GZip file starts with the following bytes (up to and including the magic header):

02 08 20 80 00 02 08 20
80 00 02 08 20 80 00 02
08 20 80 00 02 08 20 80
00 02 08 20 80 00 1F 8B

If you strip away the "garbage" (first 30 bytes), the file seems to be correct.

Version of SharpZipLib

1.2.0
Running under .NET Core 3.0 (final)

Obtained from

  • Package installed using NuGet

Test code

Remark: this is stripped down production code, so please don't judge it ;-)

using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;
using Xunit;

namespace ReproSharpZipLib
{
    public class TestTarGzip
    {
        [Fact]
        public async Task Compress()
        {
            var memory = new MemoryStream();
            await using (var gzip = new GZipOutputStream(memory) { IsStreamOwner = false })
            await using (var tar = new TarOutputStream(gzip))
            {
                var bytes = Encoding.UTF8.GetBytes("This is the first file");
                tar.PutNextEntry(this.CreateEntry("first.txt", bytes.Length));
                tar.Write(bytes);
                tar.CloseEntry();

                await this.AddSomeFilesAsync(tar, "one");
                await this.AddSomeFilesAsync(tar, "two");
            }

            var tgzBytes = memory.ToArray();
            Assert.True(tgzBytes.Length > 2);
            Assert.Equal(0x1F, tgzBytes[0]);
            Assert.Equal(0x8B, tgzBytes[1]);
        }

        private async Task AddSomeFilesAsync(TarOutputStream tar, string name)
        {
            var encoding = new UTF8Encoding(false);

            tar.PutNextEntry(this.CreateEntry(name));

            for (int i = 0; i < 20; i++)
            {
                var text = $"Hello World {i}";
                var size = encoding.GetByteCount(text);
                tar.PutNextEntry(this.CreateEntry($"{name}/{i:D8}.txt", size));
                var writer = new StreamWriter(tar, encoding);
                await writer.WriteAsync(text);
                await writer.FlushAsync();
                tar.CloseEntry();

            }
        }

        private TarEntry CreateEntry(string name, int size = -1)
        {
            return new TarEntry(new TarHeader
            {
                LinkName = "",
                Name = name,
                Mode = size < 0 ? 1003 : 33216,
                TypeFlag = size < 0 ? TarHeader.LF_DIR : TarHeader.LF_NORMAL,
                Size = Math.Max(0, size),
                ModTime = DateTime.Now,
                DevMajor = 0,
                DevMinor = 0
            });
        }
    }
}
@Numpsy
Copy link
Contributor

Numpsy commented Nov 1, 2019

same as #382 ?

@SamuelWeibel
Copy link
Author

Looks like it.

I'm wondering if we have two issues:

  1. The GZip issue mentioned in Can't untar result file #382 and
  2. the problem that TarOutputStream flushes GZipOutputStream without any data (is this actually true?! - your reproduction on October 1st seems to indicate that's what is happening).

@Numpsy
Copy link
Contributor

Numpsy commented Nov 4, 2019

That's what seemed to be happening with TarArchive (there seem to be situations where disposing the archive causes it to flush the output stream that hasn't yet been written to, and to write all the data into the stream after that), not sure if using TarOutputStream directly is exactyl the same ror a variation.

@piksel
Copy link
Member

piksel commented Dec 30, 2019

This is probably because of the forced flush of the deflate engine. It's not supposed to work with an empty buffer. I think we need to rethink the DeflaterOutputStream flushing. What we are doing now is forcing the state machine to run even though it's not requesting data. The purpose was to make sure all buffers were flushed when the files were closed, but if it's not in a state that accepts more data, it's not doing anything useful anyway...

@Numpsy
Copy link
Contributor

Numpsy commented Jan 1, 2020

The way that calling flush on the DeflatorOutputStream seems to write data to the actual output even if there is no data to be deflated doesn't seem great, but making the subclassed streams pay more attention to their own flushing (e.g. #406/#390) still seems reasonable?

@Numpsy
Copy link
Contributor

Numpsy commented Oct 9, 2020

should be fixed in 1.3.0?

@piksel
Copy link
Member

piksel commented Nov 10, 2020

It should. Please create a new ticket if this is still a problem.

@piksel piksel closed this as completed Nov 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants