Unicode decode error for Non-English characters #430
Comments
A sample example of the conversion that is being done:
I think it is something related to socketio/engine.io-parser#81 and socketio/engine.io#315.
I wonder if you were using socket.io 2.0.x. In that case, upgrade socket.io-client-java to v1.0.0.
It seems that's the case.
Thanks for the fix. Now the response is decoded as UTF-16 Unicode.
Hi, I am using 1.7.4 on the server side and version 0.8.3 on the client side on Android, but I am unable to connect to the socket. I don't know why.
I am emitting some non-English characters (in Hindi) in JSON format from my Node.js socket.io server to an Android client. It all works fine over a `websocket` connection, but over `polling` the non-English characters are changed to garbage values.
Diving into the code, I found that this happens because both transports call the decodePacket() function. For `websocket`, the value of `boolean utf8decode` is passed as false, whereas for `polling` it is passed as true, which ultimately calls UTF8.decode(data).
In UTF8.java, it first makes a new array of charPoints for all the characters in the message string.
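To make the idea of "UTF-8 decoding a string" concrete, here is a minimal sketch that uses only the Java standard library, not the library's own UTF8 class. It assumes a "byte string" convention in which every char of the string carries one byte value (0 to 255); whether the polling payload actually arrives in that form depends on the server version, as the linked issues suggest. The class and helper names (`ByteStringDemo`, `toByteString`, `fromByteString`) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ByteStringDemo {
    // Hypothetical helper: UTF-8 encode the text, then store each byte as one char (0..255).
    static String toByteString(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: treat each char as one byte and UTF-8 decode the result.
    // Chars above 255 cannot be represented as a single byte and are replaced with '?'.
    static String fromByteString(String byteString) {
        return new String(byteString.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hindi = "\u0928\u092E\u0938\u094D\u0924\u0947"; // "नमस्ते"

        // Six code points, three UTF-8 bytes each.
        String wire = toByteString(hindi);
        System.out.println(wire.length()); // prints 18

        // Decoding a properly encoded byte string restores the text.
        System.out.println(fromByteString(wire).equals(hindi)); // prints true

        // Decoding text that was never encoded does not restore it.
        System.out.println(fromByteString(hindi).equals(hindi)); // prints false
    }
}
```

The last line is the situation described in this issue: a decode step is applied to a string whose characters are already full code points rather than byte values.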
For each charPoint in the array it evaluates decodeSymbol(), which returns the converted codePoint value. In my case the code point values of the non-English characters are greater than 255 (above 2000, actually), so when one of them is passed through the function, it should be processed up to the third byte (I am not sure what exactly this algorithm is doing). Instead, the function ends at #L100, returning only the value of byte1.
This changes my characters to garbage values.
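The behaviour described above can be reproduced with a small standalone decoder. This is a sketch, not the code from UTF8.java: it assumes that each input unit is masked to its low 8 bits before the lead-byte checks, which is what the description of returning "only the value of byte1" suggests. Four-byte sequences and bounds checks are left out for brevity, and the class name `Utf8SymbolDemo` is hypothetical.

```java
class Utf8SymbolDemo {
    // Decode one UTF-8 sequence that starts at units[0].
    // Each unit is expected to hold a byte value; the & 0xFF mask is an assumption.
    static int decodeSymbol(int[] units) {
        int byte1 = units[0] & 0xFF;
        if ((byte1 & 0x80) == 0) {
            return byte1; // 0xxxxxxx: single-byte (ASCII) symbol
        }
        if ((byte1 & 0xE0) == 0xC0) {
            // 110xxxxx 10xxxxxx
            return ((byte1 & 0x1F) << 6) | continuation(units[1]);
        }
        if ((byte1 & 0xF0) == 0xE0) {
            // 1110xxxx 10xxxxxx 10xxxxxx
            return ((byte1 & 0x0F) << 12) | (continuation(units[1]) << 6) | continuation(units[2]);
        }
        throw new IllegalArgumentException("Four-byte sequences are omitted in this sketch");
    }

    // Return the six payload bits of a continuation byte (10xxxxxx).
    static int continuation(int unit) {
        int b = unit & 0xFF;
        if ((b & 0xC0) != 0x80) {
            throw new IllegalArgumentException("Invalid continuation byte");
        }
        return b & 0x3F;
    }

    public static void main(String[] args) {
        // U+0905 (Devanagari letter A) as its three UTF-8 bytes: E0 A4 85.
        System.out.println(decodeSymbol(new int[] {0xE0, 0xA4, 0x85})); // prints 2309

        // The same character passed in as an already decoded code point.
        System.out.println(decodeSymbol(new int[] {0x0905})); // prints 5
    }
}
```

Devanagari occupies U+0900 to U+097F, so the low byte of every such code point is below 0x80. Under the masking assumption, each Hindi character therefore looks like a single-byte symbol and the decoder returns early with only the low byte.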
Can someone help me understand why this UTF-8 decode followed by encode is required, or at least what it is doing?