Double encoding/decoding UTF8 #315
Comments
Some confusing use of incorrect terminology here. I’ll try to clear that up:
Thanks for clarifying that.
…s fixed properly
I stumbled upon this issue in the reverse direction on my Python engine.io server (miguelgrinberg/Flask-SocketIO#246). The official JS engine.io client also sends strings with a double UTF-8 encode to the server. My solution is to check whether a second UTF-8 decode can be applied. If it succeeds, I assume the packet was double encoded. If the second conversion fails due to invalid characters, I assume the packet was single encoded.
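The heuristic described above could be sketched roughly as follows in Node.js. This is only an illustration: `decodeMaybeDoubleEncoded` is a hypothetical name, not a function from engine.io or Flask-SocketIO, and it assumes the input string has already been UTF-8 decoded once by the transport.

```javascript
// Hypothetical helper (not part of engine.io or Flask-SocketIO).
// Input: a string that has already been UTF-8 decoded once.
function decodeMaybeDoubleEncoded(text) {
  // A double-encoded string contains only code units 0x00-0xFF, because
  // each one stands for a single byte of the inner UTF-8 encoding.
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0xff) {
      return text; // cannot be a byte string, so assume single encoding
    }
  }
  // Turn each code unit back into one byte.
  const bytes = Buffer.from(text, 'latin1');
  try {
    // fatal: true makes the decoder throw on invalid UTF-8 instead of
    // silently inserting replacement characters.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    return text; // second decode failed, so assume single encoding
  }
}

// '\u00e9' is "é"; encoding it once and reading the bytes as Latin-1
// gives the two-character string a double-encoding client would send.
const doubleEncoded = Buffer.from('\u00e9', 'utf8').toString('latin1');
console.log(decodeMaybeDoubleEncoded(doubleEncoded) === '\u00e9'); // true
console.log(decodeMaybeDoubleEncoded('\u00e9') === '\u00e9'); // true
```

The check is inherently ambiguous: a single-encoded message whose text happens to form a valid UTF-8 byte sequence would be decoded a second time by mistake, so this is a workaround rather than a fix.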
@miguelgrinberg @calzoneman what do you suggest to fix that issue? Should we remove utf8.encode/decode calls in …
@darrachequesne it's been a while since I looked at this. I don't remember the full details, but keep in mind that the websocket transport does not have this problem. I think the best place to address this is in the long-polling code, not in the parser.
What's weird is this code in engine.io-parser:
exports.encodePacket = function (packet, supportsBinary, utf8encode, callback) {
  ...
  if (undefined !== packet.data) {
    encoded += utf8encode ? utf8.encode(String(packet.data)) : String(packet.data);
  }
  ...
};
exports.encodePayload = function (packets, supportsBinary, callback) {
  ...
  exports.encodePacket(packet, supportsBinary, true, function(message) {
    doneCallback(null, setLengthHeader(message));
  });
  ...
};
It's been a while since I dug into this issue, but I'll try to take some time this week to look at it again. I agree with Miguel; I think the problem is likely in the polling transport since the websocket transport does not appear to have this issue.
I think the difference between the two transports is in how they call the parser:
WebSocket.prototype.send = function (packets) {
  packets.forEach(function (packet) {
    parser.encodePacket(packet, self.supportsBinary, function (data) {
      ... // utf8encode = undefined
    });
  });
};
// whereas
Polling.prototype.send = function (packets) {
  ...
  parser.encodePayload(packets, this.supportsBinary, function (data) {
    ... // encodePayload then calls encodePacket with utf8encode = true
  });
};
And here:
exports.encodePacket = function (packet, supportsBinary, utf8encode, callback) {
  ...
  // data fragment is optional
  if (undefined !== packet.data) {
    encoded += utf8encode ? utf8.encode(String(packet.data)) : String(packet.data);
  }
  ...
};
The question being, is …
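To make the effect of the `utf8encode` flag concrete, here is a small self-contained model of the two paths. It is a sketch, not the real parser: `utf8EncodeToByteString`, `encodePacketData`, and `bytesOnTheWire` are hypothetical stand-ins, and it assumes the transport later serializes the resulting JavaScript string as UTF-8 when writing it out.

```javascript
// Stand-in for utf8.encode(): returns a string with one character per
// UTF-8 byte (a "byte string"), not a Buffer.
function utf8EncodeToByteString(str) {
  return Buffer.from(str, 'utf8').toString('latin1');
}

// Simplified model of the data branch quoted above (packet type prefix
// and callback plumbing omitted).
function encodePacketData(data, utf8encode) {
  return utf8encode ? utf8EncodeToByteString(String(data)) : String(data);
}

// Assumption: the string is written to the socket as UTF-8.
function bytesOnTheWire(encoded) {
  return Buffer.from(encoded, 'utf8').toString('hex');
}

// '\u00e9' is "é".
const websocketHex = bytesOnTheWire(encodePacketData('\u00e9', undefined));
const pollingHex = bytesOnTheWire(encodePacketData('\u00e9', true));

console.log(websocketHex); // c3a9 (encoded once)
console.log(pollingHex); // c383c2a9 (encoded twice)
```

Under that assumption, the websocket path produces the ordinary two-byte encoding of "é", while the polling path produces four bytes, which matches the double encoding reported in this thread.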
It appears that engine.io is double-decoding (and double-encoding) UTF8 strings for polling clients. In particular, if polling clients specify any content-type besides 'application/octet-stream', engine.io calls req.setEncoding('utf8'), the request data is decoded as UTF8, and the resulting string is passed on to engine.io-parser, which then attempts to call utf8.decode(). Since the string has already been decoded from UTF8, this fails*. This is what I've observed from tinkering with the server side, and from talking to @nuclearace about his issues I suspect messages being sent to polling clients are double-encoded too.
The following demonstrates the issue (note that I added a console.log(e) in the exception handler where utf8.decode() is called):
Interestingly, this alternative version works:
So the problem to me seems to be that the parser is expecting a raw string, but since the string is coming from node's HTTP server, it has already been decoded from UTF8, and this should only happen once.
*I assume the client is double encoding, otherwise polling would be completely broken. This issue affects 3rd-party clients that are single-encoding UTF8 strings.
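The double-decode behaviour described in this report can be modelled without a running server. The sketch below is not the original reproduction from the issue: `httpLayerDecode` stands in for what `req.setEncoding('utf8')` does to the request body, and `strictUtf8DecodeByteString` is a hypothetical stand-in for the parser's `utf8.decode()` step, assumed to throw on invalid input.

```javascript
// Stand-in for the parser's decode step: expects a string with one
// character per byte and throws if it is not valid UTF-8.
function strictUtf8DecodeByteString(byteString) {
  for (let i = 0; i < byteString.length; i++) {
    if (byteString.charCodeAt(i) > 0xff) {
      throw new Error('input is not a byte string');
    }
  }
  const bytes = Buffer.from(byteString, 'latin1');
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

// Models the HTTP layer: raw body bytes are decoded as UTF-8 once.
function httpLayerDecode(bodyBytes) {
  return bodyBytes.toString('utf8');
}

// True if the second decode (the parser's) succeeds.
function parserAccepts(bodyBytes) {
  try {
    strictUtf8DecodeByteString(httpLayerDecode(bodyBytes));
    return true;
  } catch (err) {
    return false;
  }
}

// '\u00e9' is "é". A third-party client that encodes once sends c3 a9.
const singleEncodedBody = Buffer.from('\u00e9', 'utf8');
// A client that encodes twice sends c3 83 c2 a9.
const doubleEncodedBody = Buffer.from(
  Buffer.from('\u00e9', 'utf8').toString('latin1'),
  'utf8'
);

console.log(parserAccepts(singleEncodedBody)); // false
console.log(parserAccepts(doubleEncodedBody)); // true
```

In this model only the double-encoded body survives both decode steps, which is consistent with the observation that single-encoding clients are the ones affected.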
EDIT: Cleaned up some terminology.