Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid UnixFileMode value when using the TarFile API #77096

Closed
WeihanLi opened this issue Oct 16, 2022 · 19 comments · Fixed by #77151
Closed

Invalid UnixFileMode value when using the TarFile API #77096

WeihanLi opened this issue Oct 16, 2022 · 19 comments · Fixed by #77151

Comments

@WeihanLi
Copy link
Contributor

WeihanLi commented Oct 16, 2022

Description

I'm trying to exact a tar file into a folder, and the tar file may contains folders and files, when I try to use the TarFile.ExtractToDirectory API, I got the error like follows, maybe I'm wrong on using the API

Exception when invoke command System.ArgumentException: Invalid UnixFileMode value. (Parameter 'unixCreateMode')
   at System.IO.Directory.CreateDirectoryCore(String path, UnixFileMode unixCreateMode)
   at System.Formats.Tar.TarHelpers.CreateDirectory(String fullPath, Nullable`1 mode, SortedDictionary`2 pendingModes)
   at System.Formats.Tar.TarEntry.ExtractRelativeToDirectoryAsync(String destinationDirectoryPath, Boolean overwrite, SortedDictionary`2 pendingModes, CancellationToken cancellationToken)
   at System.Formats.Tar.TarFile.ExtractToDirectoryInternalAsync(Stream source, String destinationDirectoryPath, Boolean overwriteFiles, Boolean leaveOpen, CancellationToken cancellationToken)
   at System.Formats.Tar.TarFile.ExtractToDirectoryInternalAsync(Stream source, String destinationDirectoryPath, Boolean overwriteFiles, Boolean leaveOpen, CancellationToken cancellationToken)

Reproduction Steps

var tmpDirPath = Path.Combine(TmpPath, Guid.NewGuid().ToString("N"));
await var fs = File.OpenRead(@"xx.tar.gz");
await using var decompressStream = new GZipStream(fs, CompressionMode.Decompress);
await TarFile.ExtractToDirectoryAsync(decompressStream, tmpDirPath, true);

Expected behavior

Work as normal

Actual behavior

Exception as before

Regression?

No response

Known Workarounds

No response

Configuration

.NET 7 RC2
CentOS 7

Other information

Works well on Windows but not on CentOS

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 16, 2022
@ghost
Copy link

ghost commented Oct 16, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

I'm trying to exact a tar file into a folder, and the tar file may contains folders and files, when I try to use the TarFile.ExtractToDirectory API, I got the error like follows, maybe I'm wrong on using the API

Exception when invoke command System.ArgumentException: Invalid UnixFileMode value. (Parameter 'unixCreateMode')
   at System.IO.Directory.CreateDirectoryCore(String path, UnixFileMode unixCreateMode)
   at System.Formats.Tar.TarHelpers.CreateDirectory(String fullPath, Nullable`1 mode, SortedDictionary`2 pendingModes)
   at System.Formats.Tar.TarEntry.ExtractRelativeToDirectoryAsync(String destinationDirectoryPath, Boolean overwrite, SortedDictionary`2 pendingModes, CancellationToken cancellationToken)
   at System.Formats.Tar.TarFile.ExtractToDirectoryInternalAsync(Stream source, String destinationDirectoryPath, Boolean overwriteFiles, Boolean leaveOpen, CancellationToken cancellationToken)
   at System.Formats.Tar.TarFile.ExtractToDirectoryInternalAsync(Stream source, String destinationDirectoryPath, Boolean overwriteFiles, Boolean leaveOpen, CancellationToken cancellationToken)

Reproduction Steps

var tmpDirPath = Path.Combine(TmpPath, Guid.NewGuid().ToString("N"));
await var fs = File.OpenRead(@"xx.tar.gz");
await using var decompressStream = new GZipStream(fs, CompressionMode.Decompress);
await TarFile.ExtractToDirectoryAsync(decompressStream, tmpDirPath, true);

Expected behavior

Work as normal

Actual behavior

Exception as before

Regression?

No response

Known Workarounds

No response

Configuration

.NET 7 RC2
CentOS 7

Other information

No response

Author: WeihanLi
Assignees: -
Labels:

area-System.IO

Milestone: -

@danmoseley
Copy link
Member

Is it possible to share a tar file that reproduces this?

@WeihanLi
Copy link
Contributor Author

data.tar.gz

Is it possible to share a tar file that reproduces this?

Sure, attached, thanks for the help

@danmoseley
Copy link
Member

Do you want to attach a debugger to find out what the unexpected UnixFileMode value is? Perhaps ideally it would be in the message.

@WeihanLi
Copy link
Contributor Author

Sorry that it's a little hard for me to debug it, I had only Windows to debug, and it works well on Windows. And from the call stack, the unexpected UnixFileMode value may come from the inside since I'm not passing any UnixFileModel value

@danmoseley
Copy link
Member

No problem, with the repro file they can easily do it.

@am11
Copy link
Member

am11 commented Oct 17, 2022

Here, the directory mode is 16877, but we should just discard the top four bits.

This patch fixes the issue (and passes all the existing tests):

--- a/src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarEntry.cs
+++ b/src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarEntry.cs

         /// <remarks>The value in this field has no effect on Windows platforms.</remarks>
         public UnixFileMode Mode
         {
-            get => (UnixFileMode)_header._mode;
+            get => (UnixFileMode)(_header._mode  & 0xFFF); // mask to keep lower 12 bits and clear out the rest
             set
             {
-                if ((int)value is < 0 or > 4095) // 4095 in decimal is 7777 in octal
+                if ((int)value is < 0 or > 0xFFF) // 0xFFF (4095 in decimal) is 7777 in octal
                 {

@adamsitnik
Copy link
Member

@am11 would you like to send a PR?

@am11
Copy link
Member

am11 commented Oct 17, 2022

@adamsitnik, we are using the same mask in the writer:

entry._header._mode = status.Mode & 4095; // First 12 bits
(term First maybe confusing, but it is the least significant 12 bits)

It seems like other tools are writing mode with higher values, but we are constraining it. So I am not sure if we want to revisit it broadly or conservatively fix this bug. @carlossanlop and/or @tmds might also have opinions on this.

@tmds
Copy link
Member

tmds commented Oct 17, 2022

The patch for UnixFileMode property looks good to me.

It seems like other tools are writing mode with higher values, but we are constraining it.

Probably the spec says this should store st_mode which is UnixFileMode + file type (similar to zip ExternalAttributes).
We should update the TarWriter so it matches the spec.

@am11
Copy link
Member

am11 commented Oct 18, 2022

@WeihanLi could you provide more info on how did you create this tar.gz file? Please create a smaller one, with one or two directories and a hello world file; max size < 100 kb and if possible, send it via a PR to https://github.com/dotnet/runtime-assets.

I extracted your .tar.gz using command line tar -xzf data.tar.gz and recreated a new tar.gz from the extracted content, then used your code to extract the new tar.gz file. It worked with both GNU as well as BSD flavors. There are number of existing tests extracting such compressed/uncompressed tarballs with directory structure exercising this code.

Also, if we remove GZipStream call from your code and extract the .tar from .tar.gz using gzip -d data.tar.gz, then use your code to extract uncompressed .tar with plain FileStream, it works without throwing any exceptions. The underlying issue seems to be related to interaction between gzip stream and tar reader. @jozkee does it ring a bell?

@jozkee
Copy link
Member

jozkee commented Oct 18, 2022

The underlying issue seems to be related to interaction between gzip stream and tar reader.

The code splits in some places based on the CanSeekability of a Stream. We recently fixed #74242, that was included on RC2.

@am11
Copy link
Member

am11 commented Oct 18, 2022

I am testing with main branch. :)

@tmds
Copy link
Member

tmds commented Oct 18, 2022

which is UnixFileMode + file type

16877 is 040755.
755 is UnixFileMode, and the 4 is from S_IFDIR = 040000.

how did you create this tar.gz file?

Good question.

None of the other tools we're using seem to include the file type bits.

The underlying issue seems to be related to interaction between gzip stream and tar reader.

I don't see this. I get the same exception when doing

using var fs = File.OpenRead("data.tar");
await TarFile.ExtractToDirectoryAsync(fs, tmpDirPath, true);

@WeihanLi
Copy link
Contributor Author

could you provide more info on how did you create this tar.gz file?

Hi @am11 , I got this file from the docker image layer, I'm trying to pull the docker image with the docker registry API
https://docs.docker.com/registry/spec/api/#pulling-a-layer
and here's my code to pull the docker image: https://github.com/WeihanLi/nocker/blob/7d8b6ead7e356e483358f4785e6bb182d43d2e2b/Nocker.cs#L85

@am11
Copy link
Member

am11 commented Oct 18, 2022

I don't see this. I get the same exception when doing

My bad. I had a modification in Directory.Unix.cs, which was skipping the exception. Yes it is there on the main without GZipStream.

I'm trying to pull the docker image with the docker registry API

Thanks. Perhaps it is archive/tar golang package which generates such a tarball? I was able to reproduce it with a 20K image:

$ docker pull hello-world
$ docker save hello-world -o hello-world.tar # uncompressed when saving a local image

# extract with your code without GZipStream step

I'll open a PR in runtime-assets repo first so we can add test with the fix in this repo.

@tmds
Copy link
Member

tmds commented Oct 18, 2022

Perhaps it is archive/tar golang package which generates such a tarball?

Can you create an issue in their bugtracker?

Since .NET writer filters out those bits, it seems we think the proper thing is to not include them.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 18, 2022
@am11 am11 removed the untriaged New issue has not been triaged by the area owner label Oct 18, 2022
@am11
Copy link
Member

am11 commented Oct 18, 2022

Since .NET writer filters out those bits, it seems we think the proper thing is to not include them.

TarReader can be permissive since GNU and BSD tar(1) handle this case well, but they do not include file type bits in the header when creating the tar.

Can you create an issue in their bugtracker?

I am not sure if it belongs to producer of the package (https://github.com/golang/go) or its consumer (https://github.com/moby/moby).

@tmds
Copy link
Member

tmds commented Oct 19, 2022

I am not sure if it belongs to producer of the package (https://github.com/golang/go) or its consumer (https://github.com/moby/moby).

The issue will be fixed in the next version of Docker: moby/moby#44083.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 20, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants