Fix bug in merged tars with big file names on Python 3.8 #204
When adding tarballs as input for other tarballs (i.e. `pkg_tar` that depends on another `pkg_tar`), every file with a name that exceeds 98 characters will not land in the new root directory (`package_dir` attribute), but rather on the actual root.

Consider the following example:

    pkg_tar(
        name = "foo",
        package_dir = "/foo",
        strip_prefix = ".",
        srcs = [
            ...
        ],
        deps = [":big-names"],
    )

    pkg_tar(
        name = "big-names",
        strip_prefix = ".",
        srcs = [
            "ninety-eight-characters-seems-to-be-the-maximum-limit-for-merged-tarballs-in-the-rules_pkg-for-bzl",
            "ninety-eight-characters-seems-to-be-the-maximum-limit-for-merged-tarballs-in-the-rules_pkg-for-bzl-",
        ],
    )

The foo tarball will have the following structure:

    .
    ├── foo
    │   └── ninety-eight-characters-seems-to-be-the-maximum-limit-for-merged-tarballs-in-the-rules_pkg-for-bzl
    └── ninety-eight-characters-seems-to-be-the-maximum-limit-for-merged-tarballs-in-the-rules_pkg-for-bzl-

Filenames with 98 characters or less get included in the right directory, but the larger ones go to the root. A deep directory structure behaves the same way:

    .
    ├── foo
    │   └── ninety/eight/characters/seems/to/be/the/maximum/limit/for/merged/tarballs/in/the/rules_pkg/for/bzl
    └── ninety/eight/characters/seems/to/be/the/maximum/limit/for/merged/tarballs/in/the/rules_pkg/for/bzl-

And the size of the package_dir has no influence: "/f", "/foo" or "/foo/bar/baz/quux" behaves exactly the same way.
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here.
Huh, pipeline passed. I get a broken test on ArchLinux, bazel 3.3.0- (@non-git).
It's also worth mentioning that if I change the format to GNU_FORMAT, I don't get the error:

    diff --git a/pkg/archive.py b/pkg/archive.py
    index f15b425..a2de674 100644
    --- a/pkg/archive.py
    +++ b/pkg/archive.py
    @@ -146,7 +146,7 @@ class TarFileWriter(object):
           # Instead, we manually re-implement gzopen from tarfile.py and set mtime.
           self.fileobj = gzip.GzipFile(
               filename=name, mode='w', compresslevel=9, mtime=self.default_mtime)
    -    self.tar = tarfile.open(name=name, mode=mode, fileobj=self.fileobj)
    +    self.tar = tarfile.open(name=name, mode=mode, fileobj=self.fileobj, format=tarfile.GNU_FORMAT)
         self.members = set([])
         self.directories = set([])
OK, found the error. I'm running Python 3.8, and according to the docs: "Changed in version 3.8: The default format for new archives was changed to PAX_FORMAT from GNU_FORMAT". If I force
Before Python 3.8, the default tarfile format was GNU_FORMAT. In 3.8 it changed to PAX_FORMAT, which is also supposed to support long file names. There seems to be a bug in it, exposed by the test in my previous commit. This commit ensures that long file names still work on Python >= 3.8 by explicitly requesting GNU_FORMAT when writing.

See #203
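The merge described in this thread can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the code in `pkg/archive.py`: the helper names `make_tar`, `merge_under_prefix`, and `list_names` are hypothetical, and archives are kept in memory for brevity. It copies every member of an inner tarball into a new one under a prefix, writing the output with an explicit `tarfile.GNU_FORMAT` as the commit does.

```python
import io
import tarfile


def make_tar(names, fmt):
    """Build an in-memory tar holding one empty file per name (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name))
    return buf.getvalue()


def merge_under_prefix(inner_bytes, prefix, fmt=tarfile.GNU_FORMAT):
    """Copy every member of an inner tar into a new tar under `prefix`.

    The output format is passed explicitly so the result does not depend
    on the interpreter's default format.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r") as src:
        with tarfile.open(fileobj=out, mode="w", format=fmt) as dst:
            for info in src:
                # Grab the payload before renaming the member.
                payload = src.extractfile(info) if info.isfile() else None
                info.name = prefix + "/" + info.name
                dst.addfile(info, payload)
    return out.getvalue()


def list_names(tar_bytes):
    """Return member names of an in-memory tar, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        return tar.getnames()
```

Calling `list_names(merge_under_prefix(make_tar(["short", "x" * 120], tarfile.PAX_FORMAT), "foo"))` is a quick way to check that both the short and the long name end up under `foo/`, whichever format the inner tarball was written in.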
@andrewalker I think you need to sign the CLA to get this merged. We're looking forward to it, as this one-liner keeps the behaviour of rules_pkg from breaking horribly depending on your Python version.
Hey, sorry I haven't signed the CLA yet. I was going to sign it as part of my company (so the corporate version), but that got a bit delayed in internal bureaucracy. I see #217 was merged; is there still interest in merging the test?
I would like to have some tests to prove behavior. I am behind on this myself. I want to understand what appears contradictory.

So, what I need is more tests beyond what you have in this CL. And I don't have Python 3.8 on our test machines yet, so anything I do is a stab in the dark. Nor do I have a Windows machine in my home office, so interoperability there is hard to test. If you have no objections, I would take your test case and use it in a larger set of CLs.
This only happens on Python 3.8. The reason is the default format changed from `tarfile.GNU_FORMAT` to `tarfile.PAX_FORMAT`. It's also possible to reproduce this by changing the format in archive.py to PAX_FORMAT explicitly.

To fix it, I'm explicitly requesting GNU_FORMAT even on 3.8, so that it continues to work as before.
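To see which default a given interpreter uses, and to pin the format regardless of it, something like the following may help. It is a minimal sketch: `default_format_name` and `open_gnu_writer` are hypothetical names introduced here, while `tarfile.DEFAULT_FORMAT` and the `format` argument to `tarfile.open` are part of the standard library.

```python
import io
import tarfile

# Map the module's format constants to readable names.
_FORMAT_NAMES = {
    tarfile.USTAR_FORMAT: "USTAR_FORMAT",
    tarfile.GNU_FORMAT: "GNU_FORMAT",
    tarfile.PAX_FORMAT: "PAX_FORMAT",
}


def default_format_name():
    """Name of the format this interpreter uses when none is requested."""
    return _FORMAT_NAMES[tarfile.DEFAULT_FORMAT]


def open_gnu_writer(fileobj):
    """Open a tar writer that uses GNU_FORMAT whatever the default is."""
    return tarfile.open(fileobj=fileobj, mode="w", format=tarfile.GNU_FORMAT)


if __name__ == "__main__":
    print("default:", default_format_name())
    with open_gnu_writer(io.BytesIO()) as tar:
        print("writer :", _FORMAT_NAMES[tar.format])
```

Passing the format at the call site keeps the output stable across Python versions, which is the same idea as the change to `archive.py`.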