mimetype file content not properly extracted sometimes #2

Closed
rdeltour opened this issue Oct 15, 2013 · 2 comments
Labels: "type: bug" (the issue describes a bug), "type: not an issue" (the issue is rejected: not an actual issue or not relevant)

Comments

@rdeltour
Member

From [email protected] on December 28, 2007 13:35:44

(using epubcheck-0.9.2.jar)

epub file: http://www.hxa7241.org/articles/content/EpubGuide-hxa7241.epub (including a correct mimetype file)
was zipped with Info-ZIP (http://www.info-zip.org/Zip.html), but epubcheck reported this error:

EpubGuide-hxa7241.epub: mimetype contains wrong type (application/epub+zip expected)

The problem: in a ZIP file's local file header, an extra field sits between the file name
and the file content, and it is not required to be zero-length.

To fix:

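Treating the ZIP file as a byte array (multi-byte fields are stored low byte first), the first local file header gives:

filename starts at [30]
content starts at [30 + filenameLength + extrafieldLength]

filenameLength is ([27] << 8) | [26]
extrafieldLength is ([29] << 8) | [28]
contentLength is ([21] << 24) | ([20] << 16) | ([19] << 8) | [18]
(this is the compressed size, which equals the content length for a stored, uncompressed entry)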
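As a sketch, the offset formulas above translate to Java roughly as follows. The class and method names (`LocalHeader`, `u16`, `u32`, `contentStart`) are hypothetical, not epubcheck's API, and the code assumes the array begins at the first local file header. Because Java bytes are signed, each byte is masked with `& 0xFF` before shifting; applying `([27] << 8) | [26]` literally to signed bytes would give wrong results for byte values of 0x80 and above.

```java
/** Hypothetical helper applying the local-file-header offset formulas to a ZIP byte array. */
final class LocalHeader {
    private LocalHeader() {}

    /** Unsigned 16-bit little-endian value: ([off + 1] << 8) | [off]. */
    static int u16(byte[] zip, int off) {
        return ((zip[off + 1] & 0xFF) << 8) | (zip[off] & 0xFF);
    }

    /** Unsigned 32-bit little-endian value, returned as a long so it cannot go negative. */
    static long u32(byte[] zip, int off) {
        return ((long) (zip[off + 3] & 0xFF) << 24)
                | ((zip[off + 2] & 0xFF) << 16)
                | ((zip[off + 1] & 0xFF) << 8)
                | (zip[off] & 0xFF);
    }

    static int filenameLength(byte[] zip)   { return u16(zip, 26); } // bytes 26-27
    static int extraFieldLength(byte[] zip) { return u16(zip, 28); } // bytes 28-29

    /** Compressed size (bytes 18-21); equals the content length for a stored entry. */
    static long contentLength(byte[] zip)   { return u32(zip, 18); }

    /** Content follows the 30-byte fixed header, the file name, and the extra field. */
    static int contentStart(byte[] zip) {
        return 30 + filenameLength(zip) + extraFieldLength(zip);
    }
}
```

For a header whose bytes 26-27 give a file name length of 8 and whose bytes 28-29 give an extra field length of 17, `contentStart` returns 30 + 8 + 17 = 55; with an empty extra field it returns 38.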

An outline of possible code, written as comments (separating deserializing from
checking):

// read checkable values (filename, filecontent)
   // open file stream

   // read some header values
      // read file content length
         // seek to 18, read 4 (low bytes first)
      // read filename length
         // seek +4, read 2 (low bytes first)
      // read extrafield length
         // read 2 (low bytes first)

   // read checkable values
      // read filename
         // read filenameLength bytes
      // read filecontent (maybe not all)
         // seek +extrafieldLength, read filecontentLength bytes

// check values
   // check filename
      // make string from filename bytes
      // compare with "mimetype"
   // check filecontent
      // make string from filecontent bytes
      // compare with "application/epub+zip"
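The outline could be filled in along these lines. This is a hypothetical Java sketch (`MimetypeReader`, `Entry`, `read`, `isValid` and the 64-byte read limit are illustrative choices, not epubcheck code), assuming the stream starts at the first local file header and that the `mimetype` entry is stored uncompressed. It decodes the fixed header with a little-endian `ByteBuffer` instead of seeking field by field, but keeps the outline's split between reading and checking.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical implementation of the comment outline: deserialize first, then check. */
final class MimetypeReader {
    private MimetypeReader() {}

    /** The checkable values: the first entry's file name and (possibly truncated) content. */
    static final class Entry {
        final String filename;
        final String content;

        Entry(String filename, String content) {
            this.filename = filename;
            this.content = content;
        }
    }

    // Read at most this many content bytes ("maybe not all"); longer content
    // can never equal the 20-byte MIME type, so truncating does not change the result.
    private static final int MAX_CONTENT = 64;

    /** Reads the first local file header; assumes the entry is stored, not compressed. */
    static Entry read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);

        // Fixed 30-byte header; multi-byte fields are little-endian (low bytes first).
        byte[] fixed = new byte[30];
        data.readFully(fixed);
        ByteBuffer header = ByteBuffer.wrap(fixed).order(ByteOrder.LITTLE_ENDIAN);
        if (header.getInt(0) != 0x04034b50) { // "PK\3\4"
            throw new IOException("not a ZIP local file header");
        }
        long contentLength = header.getInt(18) & 0xFFFFFFFFL; // bytes 18-21
        int filenameLength = header.getShort(26) & 0xFFFF;    // bytes 26-27
        int extraFieldLength = header.getShort(28) & 0xFFFF;  // bytes 28-29

        // File name, then skip the extra field, then the content.
        byte[] name = new byte[filenameLength];
        data.readFully(name);
        data.readFully(new byte[extraFieldLength]);
        byte[] content = new byte[(int) Math.min(contentLength, MAX_CONTENT)];
        data.readFully(content);

        return new Entry(new String(name, StandardCharsets.US_ASCII),
                         new String(content, StandardCharsets.US_ASCII));
    }

    /** Compares the deserialized values with the expected name and MIME type. */
    static boolean isValid(Entry entry) {
        return "mimetype".equals(entry.filename)
            && "application/epub+zip".equals(entry.content);
    }
}
```

Because the extra field length is read from the header and skipped, a file laid out like the one in this report (a non-empty extra field after the name) would be read correctly.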

That said, it would probably be better to use java.util.zip.ZipFile.
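A sketch of the `java.util.zip.ZipFile` route, assuming that "first entry" means the first entry returned by `ZipFile.entries()`, which iterates in central directory order. `MimetypeZipCheck` and `hasEpubMimetype` are hypothetical names, and `InputStream.readAllBytes()` needs Java 9 or later.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical mimetype check that lets ZipFile parse the archive. */
final class MimetypeZipCheck {
    private MimetypeZipCheck() {}

    static boolean hasEpubMimetype(File epub) throws IOException {
        try (ZipFile zip = new ZipFile(epub)) {
            // Entries come from the central directory, in the order it lists them.
            Enumeration<? extends ZipEntry> entries = zip.entries();
            if (!entries.hasMoreElements()) {
                return false;
            }
            ZipEntry first = entries.nextElement();
            if (!"mimetype".equals(first.getName())) {
                return false;
            }
            // getInputStream returns the entry's data with any compression undone.
            try (InputStream in = zip.getInputStream(first)) {
                byte[] content = in.readAllBytes(); // Java 9+
                return "application/epub+zip".equals(new String(content, StandardCharsets.US_ASCII));
            }
        }
    }
}
```

Because `ZipFile` locates entries through the central directory and returns decompressed data, this check is unaffected by extra fields; for the same reason, it cannot tell at which byte offset the content is stored in the file.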

Original issue: http://code.google.com/p/epubcheck/issues/detail?id=2

@rdeltour
Member Author

From [email protected] on December 31, 2007 13:07:45

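Not a bug. Section 4, "ZIP Container", of the OCF spec states that the actual MIME type (i.e.,
the string "application/XXXX+zip") must begin at position 38, which is only possible when the
first entry is named "mimetype" (30-byte fixed header + 8-byte name) and has no extra field.
This file does not satisfy this requirement: bytes 28-29 of its header (021 octal) give a
17-byte extra field, so the MIME type begins at position 55.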

$ od -c EpubGuide-hxa7241.zip | head -5
0000000 P K 003 004 \n \0 \0 \0 \0 \0 \0 ` 234 7 o a
0000020 253 , 024 \0 \0 \0 024 \0 \0 \0 \b \0 021 \0 m i
0000040 m e t y p e U T \r \0 \a @ 345 t G 200
0000060 < t G 0 O q G a p p l i c a t i
0000100 o n / e p u b + z i p P K 003 004 024
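The requirement cited in this reply is a fixed-offset test rather than a walk through the header fields. A minimal Java sketch of it follows; the names (`OcfMimetypeCheck`, `mimetypeAtOffset38`) are hypothetical, and the specific type "application/epub+zip" stands in for the "application/XXXX+zip" pattern.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical strict check: the MIME type must start at byte offset 38. */
final class OcfMimetypeCheck {
    private OcfMimetypeCheck() {}

    private static final byte[] NAME = "mimetype".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] TYPE = "application/epub+zip".getBytes(StandardCharsets.US_ASCII);

    /** True if bytes 30-37 spell "mimetype" and bytes 38-57 spell "application/epub+zip". */
    static boolean mimetypeAtOffset38(byte[] zip) {
        if (zip.length < 38 + TYPE.length) {
            return false;
        }
        // Fixed positions only: a non-empty extra field pushes the type past offset 38.
        return Arrays.equals(Arrays.copyOfRange(zip, 30, 38), NAME)
            && Arrays.equals(Arrays.copyOfRange(zip, 38, 38 + TYPE.length), TYPE);
    }
}
```

Applied to the layout in the dump above (a 17-byte extra field after the 8-byte name), it returns false, because offset 38 holds the extra field's "UT" header rather than the MIME type; the same name and content with no extra field pass.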

Status: Invalid
