archive/zip: Zip reading with prefixed bytes #10464
Comments
I have a potential patch pushed here. I can try to submit it via Gerrit if it looks like a sensible approach. |
Wasn't this already fixed when we added http://tip.golang.org/pkg/archive/zip/#Writer.SetOffset ? |
@bradfitz Looking at that issue, I think it's specifically about writing files, whereas this one is about reading files. I added a test case that failed before I started on the change. It's also worth considering that in this particular case, the size of the prefixed bytes isn't known until you read the zip central directory record. This means that an equivalent |
Also, this is my first experience writing Go, so apologies if the suggested changes don't follow the proper conventions. I'm happy to rework things before submitting to Gerrit if something is off about the suggested changes. |
You didn't mention in your bug report that this is about reading. Can you write a self-contained bug report here without referencing other URLs, discussions, or patches? I see no reason why reading such a file would fail. The zip TOC is at the end, so it doesn't matter what's at the beginning of a zip file if the TOC is correct. |
Apologies, I've updated the title. I thought the references would be useful. Although the TOC is located correctly and can be read, the offsets stored in the directory structure are assumed to be relative to the front of the file. This means that when you actually try to read the data, you hit the wrong part of the underlying file, because the additional bytes at the front aren't taken into account. The suggested fix works out the size of these additional bytes and then includes them when reading directory entries and file content. |
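To make the offset arithmetic concrete, here is a minimal sketch of one way the size of the prefix could be worked out. It is not the patch referred to above. The helpers prefixLen, openPrefixed and buildPrefixed are hypothetical names, and the sketch assumes no zip64 records, no signature bytes inside the archive comment, and a central directory that ends immediately before the end-of-central-directory record.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// prefixLen is a hypothetical helper, not part of archive/zip. It finds the
// end-of-central-directory (EOCD) record and returns how many bytes precede
// the archive proper: EOCD position - directory size - directory offset.
func prefixLen(data []byte) (int64, error) {
	const eocdLen = 22                  // fixed-size part of the EOCD record
	sig := []byte{'P', 'K', 0x05, 0x06} // 0x06054b50 stored little-endian
	i := bytes.LastIndex(data, sig)
	if i < 0 || len(data)-i < eocdLen {
		return 0, errors.New("end of central directory record not found")
	}
	dirSize := int64(binary.LittleEndian.Uint32(data[i+12:]))
	dirOffset := int64(binary.LittleEndian.Uint32(data[i+16:]))
	prefix := int64(i) - dirSize - dirOffset
	if prefix < 0 {
		return 0, errors.New("inconsistent central directory")
	}
	return prefix, nil
}

// openPrefixed hides the prefix from archive/zip by handing it a section
// of the data that starts where the archive proper begins.
func openPrefixed(data []byte) (*zip.Reader, error) {
	n, err := prefixLen(data)
	if err != nil {
		return nil, err
	}
	sr := io.NewSectionReader(bytes.NewReader(data), n, int64(len(data))-n)
	return zip.NewReader(sr, sr.Size())
}

// buildPrefixed writes a one-file archive in memory and prepends prefix,
// mimicking a jar with a shell script in front of it.
func buildPrefixed(prefix string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("hello.txt")
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte("hello")); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return append([]byte(prefix), buf.Bytes()...), nil
}

func main() {
	data, err := buildPrefixed("#!/bin/sh\nexit 0\n")
	if err != nil {
		fmt.Println("build:", err)
		return
	}
	zr, err := openPrefixed(data)
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	for _, f := range zr.File {
		fmt.Println(f.Name)
	}
}
```

Because the section reader starts at the computed offset, every offset stored in the directory lines up again without any change to the reading code itself.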
So it's an invalid zip file? Is there a spec for this? Is there some bit in the TOC that says this adjustment should be done? |
I'm guessing that the spec is a little open to interpretation here. Going from https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT descriptions such as:
leave things a little ambiguous as to what the offset is "relative" to. I do know that such files can be read by |
Here is the output from
Note specifically the |
@bradfitz Is there anything else I can do for this one or do you consider prefixed bytes to be a spec violation? I'm happy to revisit the code if you want changes. |
I'm curious which recovery strategies other tools use. I don't want to implement a unique one that doesn't agree with the leniency of other tools, in either direction. |
I've tried to understand what the stock The Java implementation also delegates to |
I imagine on error (when we missed finding the directory header) we'd just scan forward looking for the directory header (0x02014b50) and for each case of 0x02014b50 we find, remember its offset from the real one, then load all the file headers with that offset, stopping once we find an offset that works. But this is too late for Go 1.5. The tree closes in 1.5 days and the few people who regularly hack on this package are on vacation or busy. For now I recommend you just do this loop yourself. You could instead scan forwards from the beginning, looking for file headers (0x04034b50) and whenever you find a 0x04034b50, try chopping off that prefix and opening the zip anew. |
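The second suggestion (scan forwards for 0x04034b50 and retry with the prefix chopped off) might look roughly like the sketch below. The helpers openByScanning, readFirst and buildPrefixed are hypothetical names and the whole file is assumed to fit in memory. File data can contain the same signature, so a candidate offset is only accepted if the archive opens from it.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// openByScanning is a hypothetical helper. It treats every occurrence of the
// local file header signature as a possible start of the archive and returns
// the first reader that archive/zip accepts.
func openByScanning(data []byte) (*zip.Reader, error) {
	sig := []byte{'P', 'K', 0x03, 0x04} // 0x04034b50 stored little-endian
	for start := 0; ; {
		i := bytes.Index(data[start:], sig)
		if i < 0 {
			return nil, errors.New("no usable local file header found")
		}
		off := int64(start + i)
		// Chop off everything before the candidate and try again.
		sr := io.NewSectionReader(bytes.NewReader(data), off, int64(len(data))-off)
		if zr, err := zip.NewReader(sr, sr.Size()); err == nil {
			return zr, nil
		}
		start += i + 1 // move past this candidate
	}
}

// readFirst returns the contents of the first file in the archive, which
// confirms that file data is read from the right place.
func readFirst(zr *zip.Reader) (string, error) {
	if len(zr.File) == 0 {
		return "", errors.New("empty archive")
	}
	rc, err := zr.File[0].Open()
	if err != nil {
		return "", err
	}
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}

// buildPrefixed writes a one-file archive in memory and prepends prefix.
func buildPrefixed(prefix string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("hello.txt")
	if err != nil {
		return nil, err
	}
	if _, err := w.Write([]byte("hello")); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return append([]byte(prefix), buf.Bytes()...), nil
}

func main() {
	data, err := buildPrefixed("#!/bin/sh\nexit 0\n")
	if err != nil {
		fmt.Println("build:", err)
		return
	}
	zr, err := openByScanning(data)
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	s, err := readFirst(zr)
	fmt.Println(s, err)
}
```

For large files the same loop could be run over an io.ReaderAt in chunks instead of a byte slice; that variation is left out to keep the sketch short.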
Same story here. I relied heavily on Go being able to do this, after a quick check that the TOC-reading code looked OK. |
I think I'm running into this problem with a program that scans files for certain data; when it encounters a JAR file, it processes it as a zip file. Checking the logs, I've noticed that when the program stops for unknown reasons, it is in most cases scanning JAR files, specifically a certain rt.jar file that is part of the JRE. Most of the time the program can read rt.jar, but not always; I'm still looking into why. The program has run on over 1000 machines, and by far the most common files it stops on are JAR files. One example of this happening only sometimes is the rt.jar that comes with the TSM BA client: two machines have the same version, yet one scans the file successfully while the other fails. |
Same issue here. As suggested, the workaround is to scan for every instance of 0x02014b50 and try to create a reader from that offset until it works. Reading from the end helps if the file is big. At least one implementation, minizip, simply computes the start of the archive within the file from the end descriptor instead of assuming it is 0, which looks like the sensible thing to do. |
Implementations differ in this respect:
I haven't seen any implementations do the scanning idea. They either look in one place or two. Scanning seems like a bad idea because file data or even unused space between files can contain a central directory header signature. |
Just FYI, zip file with prefixed bytes can be adjusted with |
Change https://go.dev/cl/387976 mentions this issue: |
As discussed here, it is possible for Java jar files (zip-formatted archives) to include a prefixed bash script that allows them to act like fully executable applications. Many existing tools (java -jar, unzip, etc.) support zip files with additional prefixed bytes. Unfortunately, go currently doesn't support reading such archives, which causes problems for the Cloud Foundry cli tool.