option to emit raw string buffers instead of decoded strings #42
Comments
Here's a reduced testcase:
Testcase code:
'use strict'
const fromFd = require('yauzl').fromFd
const once = require('once')
const fs = require('fs')

// ± unzip -l test.zip
// Archive:  test.zip
//   Length      Date    Time    Name
// ---------  ---------- -----   ----
//         0  10-03-2016 15:16   ç/
//         6  10-03-2016 15:16   ç/hello
// ---------                     -------
//         6                     2 files
//
function issue42 (fd, cb) {
  // Guard against the callback firing more than once
  cb = once(cb)
  fromFd(fd, (err, zip) => {
    if (err) return cb(err)
    zip.on('entry', function onentry (entry) {
      if ((/ç\/hello/).test(entry.fileName)) {
        console.log(entry.fileName)
        // Node-style callback: error first, result second
        cb(null, entry.fileName)
      }
      console.log(entry.fileName)
    })
    zip.on('end', () => {
      // once() exposes `called`, so we can tell whether a match was reported
      if (!cb.called) {
        cb(new Error('not found'))
      }
    })
  })
}

const fd = fs.openSync(__dirname + '/test.zip', 'r')
issue42(fd, (err, res) => {
  console.log(err, res)
})
test zip:
Interesting bug report. The behavior you're seeing from Info-Zip is actually non-standard. yauzl is behaving "correctly" with respect to the zipfile specification. There are multiple ways for a zipfile to indicate that the filenames are encoded in UTF-8, and your zipfile does none of them. According to the spec, if no charset is specified, then cp437 is to be used, which is what yauzl is doing. I'm not sure why Info-Zip's
So the question remains: what should yauzl do in this situation? Should the spec be considered correct, or should the "in practice" behavior of popular tools be considered correct? It's a tough call, but I'm leaning toward the spec.
If you'd like to fix your zipfile, try setting general purpose bit 11 in all the entries. That is what yazl does to indicate the filename is to be decoded using UTF-8. If you're creating the zipfile at a higher level than that, then I suggest using a different library/utility for creating zipfiles, because the one you're using is non-conformant. If you didn't make the zipfile at all, but you got it from a user, then I suggest you forward this paragraph to your user.
I haven't seen general purpose bit 11 mishandled like this in any existing zipfile utility I've tested this with. I can't say for sure, but I believe I've tested this issue with Info-Zip's
So I don't know how this zipfile came to exist with the filename encoding messed up, but I really don't think I should follow in Info-Zip's nonstandard footsteps on this matter. Following the spec is one of yauzl's design principles, and cp437 support is a feature.
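To make the spec rule above concrete, here is a minimal sketch of how a reader could pick the decoder from general purpose bit 11. The names `CP437_HIGH`, `decodeCp437` and `decodeFileName` are hypothetical and not yauzl's internals, and the sketch simplifies cp437 by treating every byte below 0x80 as plain ASCII.

```javascript
'use strict'

// High half of code page 437 (bytes 0x80-0xFF), 16 characters per row.
// Assumption for this sketch: bytes below 0x80 are treated as plain ASCII.
const CP437_HIGH =
  'ÇüéâäàåçêëèïîìÄÅ' +
  'ÉæÆôöòûùÿÖÜ¢£¥₧ƒ' +
  'áíóúñÑªº¿⌐¬½¼¡«»' +
  '░▒▓│┤╡╢╖╕╣║╗╝╜╛┐' +
  '└┴┬├─┼╞╟╚╔╩╦╠═╬╧' +
  '╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀' +
  'αßΓπΣσµτΦΘΩδ∞φε∩' +
  '≡±≥≤⌠⌡÷≈°∙·√ⁿ²■\u00a0'

// General purpose bit 11: when set, the file name is UTF-8.
const UTF8_FLAG = 0x800

// Map each byte to its cp437 character.
function decodeCp437 (buf) {
  let out = ''
  for (const byte of buf) {
    out += byte < 0x80 ? String.fromCharCode(byte) : CP437_HIGH[byte - 0x80]
  }
  return out
}

// Choose the decoder from the entry's general purpose bit flag.
function decodeFileName (buf, generalPurposeBitFlag) {
  return (generalPurposeBitFlag & UTF8_FLAG)
    ? buf.toString('utf8')
    : decodeCp437(buf)
}

// The UTF-8 bytes of the name from the testcase: c3 a7 2f 68 65 6c 6c 6f
const raw = Buffer.from([0xc3, 0xa7, 0x2f, 0x68, 0x65, 0x6c, 0x6c, 0x6f])
console.log(decodeFileName(raw, 0))         // bit 11 clear: decoded as cp437
console.log(decodeFileName(raw, UTF8_FLAG)) // bit 11 set: decoded as UTF-8
```

With bit 11 clear, the two UTF-8 bytes of `ç` come out as two separate cp437 characters, so the regex in the testcase cannot match. With the bit set, the same bytes decode to `ç/hello`.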
FWIW the zip was created with
@thejoshwolfe would you consider using something like https://gist.github.com/dweinstein/3125bed0a478e2b0acfccfae91c90fd5#file-guess-encoding-js which is a port to JavaScript of libzip's
I have a branch you can try out here: https://github.com/dweinstein/yauzl/tree/guess-encoding. All tests are passing for me at the moment.
I would consider adding an option to the
Realistically, I'd bet that it's safe to always pass in that flag, if it existed. The only time it would cause a problem is if a zipfile was created with cp437 and actually used the non-ASCII part of cp437. It could happen, but it'd probably be a very old zipfile if it ever did. Does that sound like a viable solution to your problem?
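For illustration, an encoding guess of the kind discussed here could look like the sketch below. It is not the code from the gist or from libzip, and `looksLikeUtf8` and `guessEncoding` are hypothetical names. It relies on Node replacing malformed UTF-8 sequences with U+FFFD during decoding, so malformed input does not survive a decode/encode round trip unchanged.

```javascript
'use strict'

// True when buf is well-formed UTF-8: malformed sequences decode to U+FFFD,
// which re-encodes to different bytes than the original input.
function looksLikeUtf8 (buf) {
  return Buffer.from(buf.toString('utf8'), 'utf8').equals(buf)
}

// Guess how a file name without an explicit charset marker was encoded.
function guessEncoding (buf) {
  const hasHighByte = buf.some((byte) => byte >= 0x80)
  if (!hasHighByte) return 'ascii' // identical in cp437 and UTF-8
  return looksLikeUtf8(buf) ? 'utf8' : 'cp437'
}

console.log(guessEncoding(Buffer.from('hello')))          // only ASCII bytes
console.log(guessEncoding(Buffer.from([0xc3, 0xa7])))     // UTF-8 bytes of 'ç'
console.log(guessEncoding(Buffer.from([0x87, 0x2f])))     // cp437 byte of 'ç', then '/'
```

The false positive is exactly the case described above: a genuine cp437 name whose high bytes happen to form valid UTF-8 (for example `c3 a7`, which is `├º` in cp437) would be guessed as UTF-8.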
Better proposal: add an option to
That sounds pretty reasonable. Having the buffer along with the flags surfaced will definitely allow another library to do the guessing...
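As a sketch of what "the buffer along with the flags" could look like to a caller: yauzl's README documents a `decodeStrings` option, and the code below assumes that is the option this thread led to. With `decodeStrings: false`, `entry.fileName` is a Buffer and `entry.generalPurposeBitFlag` carries bit 11. The helper name `collectRawNames` is hypothetical, and `yauzl` is passed in as a parameter only so the sketch can be exercised without the package installed.

```javascript
'use strict'

const UTF8_FLAG = 0x800 // general purpose bit 11

// Collect undecoded file names plus the UTF-8 flag for every entry.
// Assumption: `yauzl` is the yauzl module (or a compatible stand-in).
function collectRawNames (yauzl, fd, cb) {
  yauzl.fromFd(fd, { decodeStrings: false }, (err, zip) => {
    if (err) return cb(err)
    const names = []
    zip.on('entry', (entry) => {
      names.push({
        raw: entry.fileName, // a Buffer when decodeStrings is false
        utf8Flag: (entry.generalPurposeBitFlag & UTF8_FLAG) !== 0
      })
    })
    zip.on('error', cb)
    zip.on('end', () => cb(null, names))
  })
}
```

A caller would invoke it as `collectRawNames(require('yauzl'), fs.openSync('test.zip', 'r'), callback)` and then decode each `raw` buffer with whatever policy it prefers, leaving yauzl itself spec-conformant.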
Published in version 2.7.0.
I'm using version 2.6.0 FWIW, node 6.
As you can see, the execname is right, but entry.fileName is not right UTF-8, AFAICT.