mimetype file content not properly extracted sometimes #2

rdeltour · 2013-10-15T00:09:05Z

From [email protected] on December 28, 2007 13:35:44

(using epubcheck-0.9.2.jar)

epub file: http://www.hxa7241.org/articles/content/EpubGuide-hxa7241.epub (including correct mimetype file)
was zipped with: http://www.info-zip.org/Zip.html but produced error:

EpubGuide-hxa7241.epub: mimetype contains wrong type (application/epub+zip expected)

The problem: In a zip file, the local file header has an extra field between
the file name and the file content. It may be zero-length, but it does not have to be.

To fix:

Considering the zip file as a byte array, then:

filename starts at [30]
content starts at [30 + filenameLength + extrafieldLength]

filenameLength is ([27] << 8) | [26]
extrafieldLength is ([29] << 8) | [28]
contentLength is ([21] << 24) | ([20] << 16) | ([19] << 8) | [18]

So, a comment outline for possible code would be (separating
deserializing from checking):

// read checkable values (filename, filecontent)
// open file stream

  // read some header values
     // read file content length
        // seek to 18, read 4 (low bytes first)
     // read filename length
        // seek +4, read 2 (low bytes first)
     // read extrafield length
        // read 2 (low bytes first)

  // read checkable values
     // read filename
        // read filenameLength bytes
     // read filecontent (maybe not all)
        // seek +extrafieldLength, read filecontentLength bytes

// check values
  // check filename
     // make string from filename bytes
     // compare with "mimetype"
  // check filecontent
     // make string from filecontent bytes
     // compare with "application/epub+zip"
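
The outline above could be sketched in Java roughly as follows. This is only an illustration, not epubcheck's code: the class and method names are hypothetical, and it assumes the first entry is stored uncompressed with its size recorded in the local header (so the length at [18] equals the raw content length).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of epubcheck.
final class LocalHeaderSketch {
    // Size of the fixed part of a ZIP local file header.
    static final int FIXED_HEADER_SIZE = 30;

    // Reads an unsigned 16-bit value, low byte first.
    static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Reads an unsigned 32-bit value, low byte first.
    static long u32(byte[] b, int off) {
        return (b[off] & 0xFFL)
             | ((b[off + 1] & 0xFFL) << 8)
             | ((b[off + 2] & 0xFFL) << 16)
             | ((b[off + 3] & 0xFFL) << 24);
    }

    // Rejects input that does not start with the "PK\003\004" signature.
    static void requireLocalHeader(byte[] zip) {
        if (zip.length < FIXED_HEADER_SIZE
                || zip[0] != 'P' || zip[1] != 'K' || zip[2] != 3 || zip[3] != 4) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
    }

    // File name starts at [30] and is filenameLength bytes long.
    static String firstEntryName(byte[] zip) {
        requireLocalHeader(zip);
        return new String(zip, FIXED_HEADER_SIZE, u16(zip, 26), StandardCharsets.US_ASCII);
    }

    // Content starts after the file name AND the extra field.
    static String firstEntryContent(byte[] zip) {
        requireLocalHeader(zip);
        int start = FIXED_HEADER_SIZE + u16(zip, 26) + u16(zip, 28);
        int length = (int) u32(zip, 18); // assumption: stored entry, size present
        return new String(zip, start, length, StandardCharsets.US_ASCII);
    }

    // Checking is kept separate from deserializing, as in the outline.
    static boolean looksLikeEpubMimetype(byte[] zip) {
        return "mimetype".equals(firstEntryName(zip))
            && "application/epub+zip".equals(firstEntryContent(zip));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LocalHeaderSketch <file.epub>");
            return;
        }
        byte[] zip = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(firstEntryName(zip) + " -> " + firstEntryContent(zip));
    }
}
```

Reading the whole file into memory is only for brevity; the header values needed here all sit in the first few dozen bytes.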

But, probably it would be better to use java.util.zip.ZipFile...
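
For comparison, a minimal sketch of the java.util.zip.ZipFile route, with a hypothetical class name. ZipFile looks entries up through the central directory and hides the local header layout, so this reads the mimetype content regardless of any extra field, but it says nothing about where in the file the entry sits.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of epubcheck.
final class ZipFileMimetypeSketch {
    // Returns the content of the "mimetype" entry, or null if there is none.
    static String readMimetype(File file) throws IOException {
        try (ZipFile zip = new ZipFile(file)) {
            ZipEntry entry = zip.getEntry("mimetype");
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[256];
                int n;
                // Copy the (decompressed) entry content into memory.
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.US_ASCII);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ZipFileMimetypeSketch <file.epub>");
            return;
        }
        System.out.println(readMimetype(new File(args[0])));
    }
}
```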

Original issue: http://code.google.com/p/epubcheck/issues/detail?id=2


rdeltour · 2013-10-15T00:09:06Z

From [email protected] on December 31, 2007 13:07:45

Not a bug. OCF spec section 4 "ZIP Container" states that the actual MIME type (i.e.,
the string "application/XXXX+zip") must begin at position 38. This file does not
satisfy this requirement.

$ od -c EpubGuide-hxa7241.zip | head -5
0000000 P K 003 004 \n \0 \0 \0 \0 \0 \0 ` 234 7 o a
0000020 253 , 024 \0 \0 \0 024 \0 \0 \0 \b \0 021 \0 m i
0000040 m e t y p e U T \r \0 \a @ 345 t G 200
0000060 < t G 0 O q G a p p l i c a t i
0000100 o n / e p u b + z i p P K 003 004 024
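
Reading the dump: the file name length at offset 26 is \b (8) and the extra field length at offset 28 is octal 021 (17), so the content begins at 30 + 8 + 17 = 55 rather than at 38. A minimal Java sketch of that position check follows; the class and method names are hypothetical and this is not necessarily how epubcheck implements it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of epubcheck.
final class MimetypeOffsetSketch {
    // 30-byte fixed header + 8-byte name "mimetype" + zero-length extra field.
    static final int EXPECTED_OFFSET = 38;

    // Offset of the first entry's content for given name/extra field lengths.
    static int contentOffset(int nameLength, int extraLength) {
        return 30 + nameLength + extraLength;
    }

    // True if the given MIME type string occurs at byte offset 38 of the file head.
    static boolean mimeTypeAtExpectedOffset(byte[] head, String mimeType) {
        byte[] expected = mimeType.getBytes(StandardCharsets.US_ASCII);
        if (head.length < EXPECTED_OFFSET + expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (head[EXPECTED_OFFSET + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(contentOffset(8, 0));   // prints 38 (no extra field)
        System.out.println(contentOffset(8, 17));  // prints 55 (the file in the dump)
    }
}
```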

Status: Invalid

rdeltour closed this as completed Oct 15, 2013

rdeltour mentioned this issue Oct 15, 2013

smashwords says this epub doesn't pass, but the validate page says it does. I'm flummoxed. #122

Closed

iherman mentioned this issue Oct 24, 2020

SVG DTD not accepted #1114

Closed
