Invalid comment length #48
Comments
Can you give me the zipfile (or a zipfile) that reproduces the problem? |
Sure, but I'd rather not share the zip publicly. Would you mind sending me an email and I'll send it to you? (My email is on my GitHub page.) |
Just sent to you via email. It was generated with some .NET library. I can get more details if helpful. |
The file you sent me looks like it got an html document concatenated to the end of it. I'm not sure what the html page is, but it looks like it's got an option to download a zipfile in it. I isolated the html document and emailed it to you. I'm not sure if your zipfile creator did that on purpose or if it's a bug, but including an html page at the end of a zipfile certainly seems strange to me, especially when the document makes img and script references to external sources. I suspect your http server erroneously concatenated some html content at the end of a zipfile download.

So why does yauzl reject the file when others accept it? Here's a tldr: the zipfile is malformed, and yauzl is more strictly standards compliant than most/all other zipfile readers. For this particular problem, yauzl is being very picky in an attempt to avoid a specific problem that arises from a design flaw in the zipfile spec.

The following is a technical description of exactly what's wrong with the zipfile, a justification for yauzl's handling of the situation, and why another zipfile reader (Info-ZIP's

The .zip file specification is flawed

The high-level structure of a .zip file dictates that a reader must first look for the End of Central Directory Record, which is located at the very end of a .zip file. The final field of the End of Central Directory Record is a variable-length comment. The length of the comment is recorded in a field in the End of Central Directory Record before the comment itself. Here's a diagram to emphasize how flawed this design is:
We can't find the magic number from the beginning, because it could be any distance into the zipfile from the beginning. We can't find the magic number from the end, because it could be any distance into the zipfile from the end (up to 32KB). The defining characteristic of a zipfile is the magic number in the End of Central Directory Record, which is located in the middle of the file.

The only way to find the End of Central Directory Record is to do a linear search backwards from the end of the file, but even that is not guaranteed to find it. This is because the comment itself can be anything; it can be any bytes; it can even contain the magic number we're looking for. This means that literally the only way to find the End of Central Directory Record is to use heuristics to guess where the zipfile creator meant for it to be.

The .zip file specification is ambiguous; it is flawed. A simple fix to the spec would have been to forbid the magic number from appearing in the comment (or to move the comment to before the End of Central Directory Record, or to remove the comment entirely).

yauzl's behavior

yauzl searches backwards for the magic number, and once it is found, yauzl does some additional checking to make sure this magic number is actually part of an End of Central Directory Record. This check involves sanity checking all the fields in the End of Central Directory Record, including verifying that the comment length field is correct. If any of the fields look fishy, the zipfile is rejected.

Info-ZIP's
|
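For readers following along, the backwards search and comment length check described above can be sketched roughly as follows. This is an illustration only and not yauzl's actual source: the function name `locateEocd`, the constant names, and the shape of the return value are all hypothetical, and the sketch assumes the whole file (or at least its tail) is already in a Node.js `Buffer`. The field offsets follow the standard End of Central Directory Record layout (4-byte signature `0x06054b50`, 2-byte comment length at offset 20, 22 fixed bytes in total).

```javascript
// Hypothetical sketch, not yauzl's implementation.
const EOCD_SIGNATURE = 0x06054b50;  // magic number, stored little-endian
const EOCD_FIXED_SIZE = 22;         // size of the record without its comment
const MAX_COMMENT_LENGTH = 0xffff;  // comment length is an unsigned 16-bit field

// Search backwards from the end of `buffer` for the magic number, then
// check that the recorded comment length reaches exactly to the end of
// the buffer. Returns { ok: true, offset } or { ok: false, reason }.
function locateEocd(buffer) {
  const lowestOffset = Math.max(
    0,
    buffer.length - EOCD_FIXED_SIZE - MAX_COMMENT_LENGTH
  );
  for (let offset = buffer.length - EOCD_FIXED_SIZE; offset >= lowestOffset; offset--) {
    if (buffer.readUInt32LE(offset) !== EOCD_SIGNATURE) continue;

    // Candidate record found: compare the stored comment length with
    // the number of bytes that actually follow the fixed-size part.
    const commentLength = buffer.readUInt16LE(offset + 20);
    const expectedCommentLength = buffer.length - offset - EOCD_FIXED_SIZE;
    if (commentLength !== expectedCommentLength) {
      // Strict choice: reject at the first suspicious candidate.
      // A more lenient reader could keep searching or ignore the mismatch.
      return {
        ok: false,
        reason: `comment length mismatch: expected ${expectedCommentLength}, found ${commentLength}`,
      };
    }
    return { ok: true, offset };
  }
  return { ok: false, reason: "end of central directory record not found" };
}
```

With this sketch, a well-formed record with an empty comment is accepted, while the same bytes followed by extra data (such as an appended html page) are rejected, because the stored comment length no longer accounts for everything after the record.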
Wow. Thanks so much for the very thorough analysis and explanation. I'm sure the extra data at the end of the zip is due to some bug from the zip generation code the creator uses. I'll push on them to fix their bug but I'm not very confident they will fix it quickly... |
The maximum comment length should be 65535 (64K-1 or 0xffff) because the field is unsigned. |
Oh right. That's how the code is written; I forgot it was unsigned in the above comment. Thanks for the correction. EDIT: Actually, I guess my code is off by 1. Oops. So I'm reading 1 more byte than possibly necessary in the first read, and some zipfiles will be rejected with a different error message than you might expect. I'll fix that in the next release. |
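To make the arithmetic behind this correction concrete, here is a small sketch of how a reader might size the first read from the end of the file. It assumes the standard 22-byte fixed part of the End of Central Directory Record plus the unsigned 16-bit comment length discussed above; the function and constant names are hypothetical and are not taken from yauzl.

```javascript
// Hypothetical sketch of sizing the tail read; names are illustrative.
const EOCD_MIN_SIZE = 22;          // record size with an empty comment
const COMMENT_LENGTH_MAX = 0xffff; // 65535, largest unsigned 16-bit value

// Return the byte range at the end of a file of `fileSize` bytes that
// can contain the End of Central Directory Record.
function eocdSearchWindow(fileSize) {
  const length = Math.min(fileSize, EOCD_MIN_SIZE + COMMENT_LENGTH_MAX);
  return { start: fileSize - length, length };
}
```

Under these assumptions the window is never larger than 22 + 65535 = 65557 bytes, so reading one byte more than that is unnecessary.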
I confirmed that the creator of the zip does indeed have a bug in their code. It will take them time to fix it so for now I'm just ignoring this comment length error. Thanks for the help! |
I'm having trouble unzipping a zip file uploaded by a user. The zip opens fine in any other unzip software I've tried. The error I'm getting is:
If I comment out line 125 of index.js where the error is thrown, the file does seem to unzip properly. Any thoughts?