Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ace cannot unzip some EPUB #121

Closed
laudrain opened this issue Dec 7, 2017 · 8 comments · Fixed by #138
Closed

Ace cannot unzip some EPUB #121

laudrain opened this issue Dec 7, 2017 · 8 comments · Fixed by #138
Assignees
Labels

Comments

@laudrain
Copy link

laudrain commented Dec 7, 2017

Ace crashes when processing some EPUBs with following messages:

HL018125:Accessibilité laudrain$ ./ace-check.sh NouvellesOrdinaires3décembre1678
verbose: ACE cwd=/Users/laudrain/Documents/Fichiers/eBook/Accessibilité, outdir=NouvellesOrdinaires3décembre1678-ace, tmpdir=NouvellesOrdinaires3décembre1678-ace, verbose=true, silent=false, jobId=
info: Processing NouvellesOrdinaires3décembre1678.epub
verbose: Extracting EPUB
error: Failed to unzip EPUB
error: Unexpected error: invalid comment length. expected: 7. found: 0
debug: Error: invalid comment length. expected: 7. found: 0
at /usr/local/lib/node_modules/ace-core/node_modules/yauzl/index.js:116:25
at /usr/local/lib/node_modules/ace-core/node_modules/yauzl/index.js:473:5
at /usr/local/lib/node_modules/ace-core/node_modules/fd-slicer/index.js:32:7
at FSReqWrap.wrapper [as oncomplete] (fs.js:665:17)
info: Closing logs.

This epub is available here:
http://www.lectura.plus/Block/download/?id=10659

It one among several produced by a French association on old press issues:
http://www.lectura.plus/848-presse-ancienne-accessible.html

These EPUB can be unzipped and also read in Readium...

@rdeltour rdeltour added the bug label Dec 7, 2017
@rdeltour rdeltour self-assigned this Dec 7, 2017
@rdeltour rdeltour added this to the v0.8.0 “Beta 4” milestone Dec 7, 2017
@mattgarrish
Copy link

From what I see, Ace is using yauzl to unzip, and according to this bug report it is stricter in interpreting the central directory record processing: thejoshwolfe/yauzl#48 (comment)

Sounds like a flaw in whatever program was used to pack the epub. I ran winrar's repair option on the archive and it worked fine after for me, so if you have a program with similar functionality you could give that route a try.

@rdeltour
Copy link
Member

rdeltour commented Dec 7, 2017

Thanks @mattgarrish. Yes I came to the same conclusion. The annoying situation is that EpubCheck doesn't catch that, so the file is apparently valid.

I'll try to see if Ace can somehow work around that, or at least provide a more informative error message.

@laudrain
Copy link
Author

laudrain commented Dec 7, 2017

Thanks @mattgarrish and @rdeltour: I'll let this know to the association producing these EPUB.
Meanwhile, if Ace could get around that if would bring the checking on a11y itself...
Should it be reported as a bug of epubcheck?

@rdeltour
Copy link
Member

I had another look at this issue. It seems that yauzl (the unzipping 3d party lib used by Ace) is correct.

I'm tempted to leave it as-is, just adding a the text "(The ZIP file may be corrupt)" to be a bit more informative.

Fixing the ZIP is not trivial, and would first need to extract the right info from the thrown error. Even with that, it may well be that the ZIP is more corrupt than just having an undeclared comment length.

@mattgarrish WDYT?

@laudrain did you hear back from the publisher? I'd be interested to know more about their production process, to see if there's a flaw in their ZIP utility, or if the problem comes from somewhere else.

@laudrain
Copy link
Author

@rdeltour As I got no news, I just sent a reminder to the publisher.
I agree with you to consider this as a corrupted ZIP.

@rdeltour
Copy link
Member

OK, thank you Luc!

@rdeltour
Copy link
Member

I updated the error message in #135. Closing this issue, we can reopen if we get more similar reports and need a better mitigation strategy.

@danielweck
Copy link
Member

Hello! FYI:
In the Readium2 "streamer" we have an abstraction layer sitting on top of 3 separate concrete zip libraries:

  1. node-stream-zip:
    https://www.npmjs.com/package/node-stream-zip
    https://github.com/antelle/node-stream-zip
  2. yauzl:
    https://www.npmjs.com/package/yauzl
    https://github.com/thejoshwolfe/yauzl
  3. unzipper:
    https://www.npmjs.com/package/unzipper
    https://github.com/ZJONSSON/node-unzipper

Option (1) node-stream-zip is used in the general case (publications loaded from the local filesystem), option (2) yauzl is used in the specific case where we fetch files from the network via byte ranges (HTTP 206 partial requests), and option (3) unzipper is not actually used right now :)
We extracted basic performance metrics using process.hrtime() and we concluded that node-stream-zip was "better" in the general case.

Note that we also use yazl to inject files into existing archives, but that's a different use-case altogether:
https://www.npmjs.com/package/yazl
https://github.com/thejoshwolfe/yazl

Cheers, Dan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants