Duplicate file paths not accepted by pkg_tar #849
Duplicate path entries are made possible within tar archives, as per feature request described in bazelbuild#849. RELNOTES: Duplicate path entries supported within tar archives
Compatibility with gnutar is not a goal we have been aiming for. We've done a lot of work to explicitly detect duplicate files, because they are generally a sign that you didn't lay out the inputs correctly. Can you provide a more concrete example of what you are trying to do with laying down multiple tarballs?
What I am trying to do is combine two tar files, where one of them is regarded as having overwrite priority for all path conflicts. Before going into an example, my understanding of
I will try to illustrate the problem here. For some diversity in tar utilities, I also exemplify
Example "conflicting" input archives
Concatenate with gnutar
Propagate concatenated archive through pkg_tarBUILD
Commands
Compare unpacked results of gnutar and pkg_tar
When unpacking the archive created by bsdtar concatenate and unpack behavior
While
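The concatenate-then-unpack behavior illustrated above can be reproduced without any tar binary. The following is a minimal sketch using Python's standard `tarfile` module; `make_archive` and `concatenate` are hypothetical helpers, not part of pkg_tar or GNU tar. It copies every member of a base archive and then an overlay archive into one archive, keeping conflicting paths as duplicate entries, roughly what `tar --concatenate` produces.

```python
import io
import tarfile


def make_archive(files):
    """Build an in-memory tar archive from a {path: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def concatenate(*archives):
    """Copy the members of each archive, in order, into a single archive.

    Conflicting paths are not deduplicated: they remain as separate entries,
    with entries from later archives placed after those from earlier ones.
    """
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as dst:
        for blob in archives:
            with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as src:
                for member in src.getmembers():
                    # Regular files need their data stream; directories,
                    # symlinks, etc. are header-only.
                    data = src.extractfile(member) if member.isfile() else None
                    dst.addfile(member, data)
    return out.getvalue()


base = make_archive({"etc/app.conf": b"base\n", "bin/tool": b"v1\n"})
overlay = make_archive({"etc/app.conf": b"overlay\n"})
merged = concatenate(base, overlay)

with tarfile.open(fileobj=io.BytesIO(merged), mode="r") as tar:
    print(tar.getnames())  # ['etc/app.conf', 'bin/tool', 'etc/app.conf']
    # getmember() resolves a duplicated name to its last occurrence.
    print(tar.extractfile(tar.getmember("etc/app.conf")).read())  # b'overlay\n'
```

The `tarfile` documentation states that when a member occurs more than once, `getmember()` treats its last occurrence as the most up-to-date version, the same "last entry wins" reading described here for unpacking with GNU tar.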
Thanks for those examples. How about this proposed API Add
With that behavior, all existing tests should pass without change. A new test could easily be added. A possible future idea is an option to allow
FWIW... this is the second time in a month that no-sort-list has come up in a context like this. buildozer needs fixing to shift that declaration to the rule author, not the end user. But that is beyond the scope here.
Yes, that sounds like a good proposal, as it supports this new behavior and doesn't risk breaking the existing API. I don't know if it's a sensible use case for duplicates to be assigned to
FWIW, this behavior originates from tape drive technology and its specification, e.g. tape archive duplicates.
SGTM, except that my proposal about testing is not sufficient. You need to be able to specify that a file appears N times, and possibly what the content of each occurrence is.
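One way to express "a file appears N times, with this content each time" in a test is to compare the ordered list of contents recorded for a path. A minimal sketch with Python's `tarfile`, assuming the test can read the produced archive as bytes; `build`, `contents_of` and `assert_occurrences` are hypothetical names, not an existing rules_pkg test utility.

```python
import io
import tarfile


def build(entries):
    """Write an in-memory archive from ordered (path, bytes) pairs.

    Repeated paths are written as separate entries; tarfile does not
    reject duplicates.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in entries:
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def contents_of(archive_bytes, path):
    """Return the data of every regular-file entry named `path`, in archive order."""
    found = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.name == path and member.isfile():
                found.append(tar.extractfile(member).read())
    return found


def assert_occurrences(archive_bytes, path, expected):
    """Fail unless `path` appears exactly len(expected) times with these contents."""
    actual = contents_of(archive_bytes, path)
    if actual != expected:
        raise AssertionError(f"{path}: expected {expected!r}, got {actual!r}")


archive = build([("a.txt", b"one"), ("b.txt", b"x"), ("a.txt", b"two")])
assert_occurrences(archive, "a.txt", [b"one", b"two"])  # appears twice, in order
assert_occurrences(archive, "b.txt", [b"x"])            # appears once
```

Comparing an ordered list checks the occurrence count and the per-occurrence content at once, and also pins down which entry comes last.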
Well... if we want to get pedantic, I actually used tape drives before tar, and the pain of dealing with that never leaves you. In those times, you usually had no file name at all on tapes. It was just data. More advanced schemes used a label at the beginning, specifying the name and sequence number (so you knew if it was tape 3 of a very large data set). The duplicate file name within the same physical unit was pretty much tar-specific, because most other backup schemes would rarely append into the existing set of files. Instead, they would skip to the end of the written tape and append a new set of data. That resulted in something like file1 EOF file2 EOF ... fileN EOF EOF fileX EOF, fileY EOF ... EOF EOF EOF. tar improved tape utilization by collapsing each series of file EOF groups into a single file,file,file EOF group. You could put multiple tar outputs on a tape, but trying to back up over that last EOF and append into the previous one was a bit risky. I like to remind people that MapReduce/Flume/Hadoop/Spark ... are not a recent invention. People dealt with data bigger than memory using tape drives to read/process/sort/merge back in the 1950s.
Duplicate path entries are made possible within tar archives as discussed in feature request bazelbuild#849. This includes an interaction with create parents, where the only logical scenario which would require inference of a parent directory is when one does not already exist. This is because allowance of duplicates is only useful when explicit paths are declared. RELNOTES: Duplicate path entries supported within tar archives
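The create-parents interaction described in the commit message can be modeled as: infer a parent directory entry only when no entry for that directory exists yet. This is a hypothetical sketch over a plain list of archive paths (directories marked by a trailing "/"), not the pkg_tar implementation; `with_inferred_parents` is an illustrative name.

```python
import posixpath


def with_inferred_parents(paths):
    """Return `paths` with missing parent directories inserted before first use.

    A parent is inferred only when no entry for it (explicit anywhere in the
    list, or already inferred) exists, so a duplicated file path never causes
    its parent directory to be emitted twice.
    """
    seen_dirs = {p.rstrip("/") for p in paths if p.endswith("/")}
    result = []
    for path in paths:
        parent = posixpath.dirname(path.rstrip("/"))
        missing = []
        # Walk upward until reaching the root or an already-known directory.
        while parent and parent not in seen_dirs:
            missing.append(parent + "/")
            seen_dirs.add(parent)
            parent = posixpath.dirname(parent)
        # Emit outermost directories first.
        result.extend(reversed(missing))
        result.append(path)
    return result


print(with_inferred_parents(["usr/bin/tool", "usr/bin/tool", "etc/", "etc/app.conf"]))
# ['usr/', 'usr/bin/', 'usr/bin/tool', 'usr/bin/tool', 'etc/', 'etc/app.conf']
```

Here `etc/` is declared explicitly, so nothing is inferred for `etc/app.conf`, while `usr/` and `usr/bin/` are inferred exactly once even though `usr/bin/tool` is duplicated.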
Believed fixed by #850.
When tar archives that contain duplicate file paths are assigned to deps of pkg_tar, the rule removes all but the first occurring instance and warns with "Duplicate file in archive picking first occurrence". This contrasts with the common GNU tar utility, where "tar allows you to have infinite number of files with the same name" according to the documentation for --append under The Five Advanced tar Operations, and where unpacking the archive yields the last occurring instance of the file path.
My use case involves augmenting a tar archive through a series of build steps, where one of these steps overwrites files of an existing archive. Unpacking the archives one over the other and then repacking the result is not an option, because that would create new entries for the parent directories of the files in the incoming archives, each stating its own permissions. The state of the directories on the extraction target must be preserved, rather than set to whatever was present in the environment where the augmentation took place. So if an archive with duplicate files is presented to pkg_tar, where the most recently added entry is meant to overwrite the earlier one, it is actually the original (first) file that is retained, and the overwriting behavior is not achieved when unpacking.
I feel that pkg_tar's behavior in this case should be compatible with GNU tar, to support interoperability and to leverage the common understanding that GNU tar has probably established.
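The difference between the two behaviors comes down to which entry survives when a path repeats. The sketch below models both policies over an ordered list of `(path, data)` entries; `resolve` and its `keep` parameter are hypothetical, and the code only models the outcomes described in this issue rather than reproducing pkg_tar or GNU tar internals.

```python
def resolve(entries, keep="last"):
    """Collapse duplicate paths in an ordered list of (path, data) entries.

    keep="first": later duplicates are dropped, modeling the pkg_tar behavior
                  reported in this issue.
    keep="last":  later duplicates replace earlier ones, modeling what is left
                  on disk after extracting the archive with GNU tar.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen = {}
    for path, data in entries:
        if keep == "last" or path not in chosen:
            chosen[path] = data
    return chosen


entries = [
    ("etc/app.conf", b"base"),
    ("bin/tool", b"v1"),
    ("etc/app.conf", b"overlay"),  # meant to overwrite the first entry
]
print(resolve(entries, keep="first"))  # {'etc/app.conf': b'base', 'bin/tool': b'v1'}
print(resolve(entries, keep="last"))   # {'etc/app.conf': b'overlay', 'bin/tool': b'v1'}
```

With the "first" policy the overlay's `etc/app.conf` is silently lost, which is exactly the mismatch that breaks the incremental augmentation use case.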