-
-
Notifications
You must be signed in to change notification settings - Fork 390
Add support for storing symlinks in tar and zip archives #92
Conversation
Nice! Looking forward to going through this, unless another collaborator beats me to it. |
Adding my notes from testing using the archiver cli Windows to Windows, Zip format -> Symlinks are not preserved
Linux to Linux, Zip format -> Symlinks are preserved (retains relative path) |
Thanks for your comprehensive tests, @johnarok! I'm travelling right now and only have access to a Macbook for the next 2 weeks, so can't do any Windows testing myself. I assume some of the failures are due to not normalizing the symlink paths to always use forward slashes for the path separator. I added a commit that calls If you have the time, could you redo your tests with the updated PR? I'm somewhat surprised that the appveyor tests didn't see the windows-to-windows failures you were seeing; did the tests pass for you on Windows? That would mean that the tests are not comprehensive enough. The only explanation I could see is if git didn't re-create the symlinks in the testdata directory. |
Hi! @jandubois , so the windows tests indeed pass, the reason being a git clone on windows ( I retested by manually creating the symbolic link for the
Below is what I observed this time Windows to Windows, Zip format -> Symlinks are not preserved (they appear as normal files)
Windows to Windows, Tar format -> Symlinks are preserved (breaks symlink with incorrect path)
Windows to Linux, Tar format -> Symlinks are preserved (breaks symlink with incorrect path)
Linux to Linux, Zip format -> Symlinks are preserved (retains relative path) |
Hi @johnarok! I think I will need access to my Windows VMs to investigate this further, so there won't be any further activity from me for the next 7-10 days. I'll get back to you once I actually have run this on Windows myself. Just for clarification, what are the expectations for older Windows versions that don't have proper symlink support? Or for versions that require admin permissions to create symlinks when the tests are being run as a normal user? |
Hi @jandubois, I did some cursory research on the state of symlinks for windows in other projects, links here. What are the expectations for older Windows versions that don't have proper symlink support? Or for versions that require admin permissions to create symlinks when the tests are being run as a normal user? Also, retested it with 1.11.2 go version and the results did not change much. I have uploaded the artifacts here for reference. |
:-/ Well this complicated, isn't it. Thanks for the work, both of you! I'd like to see this get to a state where it can be merged, but my PR #99 will have to come first, and it does not address the symlink issues, which means that this PR will have to be rebased -- fortunately, I think this PR should be easier to rebase. FWIW the new code in #99 will allow the user to decide whether they want to continue on errors, so it's OK if there are errors, it doesn't have to stop the operation. As for older versions of Windows -- I say drop support. I'm not thrilled with supporting old OSes. Especially when it makes things simpler. So, I might be a little hands-off on this PR -- but don't let me slow you down. Go ahead and both of you do what you think is best and let's merge it in after my rewrite is finalized. In the meantime, you might want to rebase this fork off my |
Okay, the rewrite is done! If you want to update this PR, go for it. |
Also implement extraction of symlinks from zip archives.
@mholt I've updated this PR on top of latest |
Sounds good. Keep us posted! :) |
@johnarok I cannot reproduce your test failures. I'm also somewhat confused that your re-testing with the updated PR still showed a symlink with backslashes in them; that should have been fixed by that time. Anyways, with the current state of the PR I used a Windows 10 x64 VM with the latest Windows updates, and I've enabled "Developer Mode" to be able to create symlinks without getting UAC prompts. I installed these 2 packages to get https://github.com/git-for-windows/git/releases/download/v2.19.1.windows.1/Git-2.19.1-64-bit.exe I enabled
All tests pass (I've temporarily patched
I created 2 test archives:
Both archives could be unpacked with I could also unpack them on OS X with What did not work was unpacking them on Windows using the corresponding MinGW tools. I assume they don't support symlinks, or at least not by default. I believe their limitation is irrelevant to this PR. I've also created the same tarball and zip files using To me this means that the code is supporting symlinks on Windows as intended. @johnarok Please provide more detailed instructions to reproduce your failures, if you believe they still need to be addressed! The only open question in my mind is if the drive specified should be stripped from a full path symlink, i.e. store Unrelated observation: When using
This does not happen with the zip file created on OS X, and doesn't happen with either zip file when using
I haven't looked further into it, as it is completely unrelated to the symlink code. It should probably get its own Github issue, but maybe you can double-check this when verifying this latest PR version? |
Out of curiosity I just disabled "Developer Mode" and ran the tests again:
That seems fine to me as well... |
@jandubois, thank you for validating, I did not use the developer mode. I will pull the latest and test again to confirm - hopefully next week after the holidays. |
Something to consider: should how symlinks are handled be configurable? It's possible now. Maybe some users want to preserve them, while others want to follow/dereference them... |
@jandubois I have to admit, testing with the latest pull is working perfect for all combinations both with
On preferences, I prefer default to be the current behavior, fail when proper access is not available to create symlinks. |
@johnarok I've added the appveyor config you requested. Tests are still passing, but I can't tell if appveyor actually does support symlinks or not (they may be running a version of git that ignores that option). My thoughts on making symlink support configurable:
I agree with @johnarok that the current behavior implemented by this PR is the desirable default. Only concern is that previously creating ZIP archives would have followed symlinks (I think, I haven't tested), and maintaining backwards compatibility is a good thing. For TAR archives the code before this PR used to store the full path to the original file as a symlink, which is totally broken and not worth preserving. My preferred way forward would be to switch the version to 4.0 and change the default mode for ZIP archive creation. Adding an option for following symlinks is independent functionality and should be requested and/or implemented in a separate issue/pr. |
@jandubois I think I agree with almost all of what you've proposed! The
As you can see there, the only references to those functions are in writing In short, I'd love to see this package brought up to speed with a standard/consistent way of handling symlinks (and hard links maybe? doesn't have to be in this PR) for Zip and Tar types (and extracting them from Rar, I suppose). So far, I think we've decided:
Not too concerned about backwards compatibility. I think we pretty much broke symlink support with v3, so I don't think that skipping or adding it would be breaking either way. We might be able to keep it in the v3 tree. So, with that said, how should we proceed on this PR? What else needs to be done? |
This PR uses I'm not sure if preserving hard links is an important goal. It is not supported by ZIP at all, afaik. Even with tarballs it is a problematic feature, e.g. you cannot selectively extract a later hardlink unless you have extracted the file under the first hardlink name already. Hardlink support may also be platform specific; I don't know if there is module that provides a device/inode view of Windows filesystems. So to me it feels somewhat at odds with the project value of
Either way, supporting hardlinks would be completely different from softlinks, so should be a separate issue. Personally I feel it is not worth the effort.
If you are fine with making the current behavior the default, then I think this PR should be ready for merging, and a new Github issue should be opened to add an option to dereference symbolic links while creating archives. I'm personally not interested in that feature, so probably wouldn't want to implement it: the change itself is rather straight-forward, but I'm not sure how you would want to re-arrange the tests to accommodate testing different option settings. Once the test framework for this is in place, I would be happy to add the conditionals to the actual code itself though. 😄 |
Excellent. You're right, hard links are probably not a priority at all, and maybe should even be removed entirely from the code. As for this PR currently, I will begin a review now. Just to be sure I understand clearly going into this: this PR archives symlinks as symlinks, and does not follow them. (Right?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple initial questions from my high-level pass through the code. Tests look good. Most things look good, I just have a couple questions about code organization and understanding symlink support in zip files.
zip.go
Outdated
@@ -186,6 +186,15 @@ func (z *Zip) extractNext(to string) error { | |||
if !ok { | |||
return fmt.Errorf("expected header to be zip.FileHeader but was %T", f.Header) | |||
} | |||
if (header.FileInfo().Mode() & os.ModeSymlink) != 0 { | |||
buffer := make([]byte, header.FileInfo().Size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be very different from how it works in tar archives; while I know that this PR writes the target filepath as the symlink's contents, is it safe to assume that every zip archiver will do that? Is there risk here of reading in a potentially huge file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure; I simply went by the symlink code in archive/zip test: https://github.com/golang/go/blob/b0a53d2/src/archive/zip/writer_test.go#L54-L59
I believe this is what Info-ZIP refers to as the "generic symlink support" in the later versions. Earlier versions (and PKZIP) had platform specific extensions for symlinks (so symlinks archived on one platform could not be extracted on another one), but I think those have been abandoned.
I'm not sure if this is necessary, but I can add a check for some reasonable maximum path length, if you prefer. If you want that, what should the limit be? 2^16
seems to be way beyond what I would expect to see for real pathnames, but small enough not to cause memory issues...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I suppose this is OK then (no need for size check at this time), but I would probably want to move this logic to the z.extractFile
method, if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put the logic into extractNext
because it already had done the type assertion on header
, but you are right, it must be in extractFile
, otherwise Extract
will not be able to extract symlinks. I'll do this (and duplicate the header
assertion into extractFile
).
Unfortunately this points out that test coverage of the Extract
method is lacking. 😦
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I think we can move one chunk of code into another function for a little better organization.
zip.go
Outdated
@@ -186,6 +186,15 @@ func (z *Zip) extractNext(to string) error { | |||
if !ok { | |||
return fmt.Errorf("expected header to be zip.FileHeader but was %T", f.Header) | |||
} | |||
if (header.FileInfo().Mode() & os.ModeSymlink) != 0 { | |||
buffer := make([]byte, header.FileInfo().Size()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I suppose this is OK then (no need for size check at this time), but I would probably want to move this logic to the z.extractFile
method, if possible.
This is necessary so that calls from Extract() to extractFile() will handle symlinks properly as well.
tar.go
Outdated
@@ -338,7 +340,16 @@ func (t *Tar) Write(f File) error { | |||
return fmt.Errorf("missing file name") | |||
} | |||
|
|||
hdr, err := tar.FileInfoHeader(f, f.Name()) | |||
filename := f.Name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, previous commits in this change had left the filename
(previously called linkTarget
) empty for non-symlinks, which I think was probably a bug. Should be fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it was intentional, the value is only used for symlinks (which is why I renamed it to linkTarget
to make it more obvious), so the empty string in the non-symlink case would be appropriate: https://github.com/golang/go/blob/159797a/src/archive/tar/common.go#L646-L648
So I would prefer not to set it to a value that doesn't make any sense (this was the original bug, that the link was set to the source filename and not the link target). It is just confusing when you read the source and wonder why the field is set to this value.
But as I said, the value is unused for non-symlinks, so it doesn't really matter at runtime, but I would prefer to undo this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right. Pushed a commit that fixes that. Thanks. Am ready to merge if you are!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am ready to merge if you are!
Sure, go ahead and let's close a bunch of open issues! 😄
I assume you will take care of filing issues for things discovered in this PR, if you feel they should be addressed at some point:
-
Add option to dereference symlinks before adding to archive
-
Add test coverage for
Extract
method -
(maybe) Remove vestigial hardlink support
-
(maybe) Investigate weird permissions issue with ZIP files created on Windows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I think those are all suitable for other issues and PRs. Thanks again!
Thanks for the work! I took the liberty of finishing up a few minor changes. Tests are passing; let me know what you think, and if it works for you on your symlink-containing archives. |
Also implement extraction of symlinks from zip archives.
This PR also adds a relative symlink in the testdata directory. It passes all tests on OS X, but has not been tested on Windows.
This PR is a superset of changes from the following issues and PRs:
Fixes #21
Fixes #31
Fixes #60
Fixes #74