Unicode decode error for Non-English characters #430

Closed
csoni111 opened this issue Jun 21, 2017 · 6 comments

@csoni111

csoni111 commented Jun 21, 2017

I am emitting non-English characters (Hindi) as JSON from my Node.js socket.io server to an Android client. Everything works over a websocket connection, but over polling the non-English characters are turned into garbage.
Digging into the code, I found that both transports call decodePacket(). The websocket transport passes the boolean utf8decode as false, whereas the polling transport passes it as true, which ultimately calls UTF8.decode(data).

In UTF8.java, the decoder first builds an array of code points for all the characters in the message string.
For each code point in the array it then calls decodeSymbol(), which returns the converted code point. In my case the code points of the non-English characters are greater than 255 (above 2000, actually), so I would expect the function to process up to the third byte (I am not sure exactly what this algorithm is doing). Instead it ends at #L100, returning only the value of byte1.
This turns my characters into garbage.
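
One way to read the decode step is as the inverse of a "byte string" encoding: the text is turned into UTF-8 bytes and each byte is stored as one char in the range 0 to 255. The sketch below is not the library's code. The class and method names are hypothetical, and it swaps in the JDK's UTF-8 decoder for the hand-written decodeSymbol() loop. It only illustrates, under that assumption, why a string that was never byte-string-encoded gets mangled when it is decoded anyway.

```java
import java.nio.charset.StandardCharsets;

public class ByteStringSketch {
    // Hypothetical encoder: UTF-8 bytes, one char (0..255) per byte.
    static String toByteString(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical decoder: keep only the low 8 bits of each char,
    // then interpret the resulting bytes as UTF-8.
    static String fromByteString(String byteString) {
        byte[] bytes = new byte[byteString.length()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) (byteString.charAt(i) & 0xFF);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0905"; // Devanagari letter A, code point 2309

        // Encoded first, then decoded: the character survives.
        String roundTrip = fromByteString(toByteString(original));
        System.out.println(roundTrip.equals(original)); // true

        // Decoded without having been encoded: only the low byte (0x05) remains.
        String corrupted = fromByteString(original);
        System.out.println((int) corrupted.charAt(0)); // 5
    }
}
```

If the server sends plain strings while the client still applies this kind of decode, every character above 255 is reduced to its low byte, which matches the garbage seen over polling.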

Can someone help me understand why this UTF-8 decode (and the matching encode) is required, or at least what it is doing?

@csoni111
Author

A sample of the conversion being done:
Let's say my character is अ, whose code point is 2309 (U+0905).
Processing it through decodeSymbol(), on line #96:
2309 & 0xFF returns 5
then on line #99:
5 & 0x80 returns 0
Hence the if condition becomes true and the function ends there, returning 5 as the value and corrupting the original character.
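
The two bit operations quoted above can be checked in isolation. This is a minimal sketch with hypothetical helper names, mirroring only the two steps described in this comment and not the full decodeSymbol() implementation. For comparison it also tests 0xE0, which is the first byte of the UTF-8 encoding of code point 2309.

```java
public class DecodeSymbolArithmetic {
    // Hypothetical helper: keep only the low 8 bits, as on line #96.
    static int firstByte(int codePoint) {
        return codePoint & 0xFF;
    }

    // Hypothetical helper: the high-bit test from line #99.
    // A clear high bit means the value is treated as a one-byte symbol.
    static boolean isSingleByte(int byte1) {
        return (byte1 & 0x80) == 0;
    }

    public static void main(String[] args) {
        int byte1 = firstByte(2309);              // 0x905 & 0xFF
        System.out.println(byte1);                // 5
        System.out.println(isSingleByte(byte1));  // true: decoding stops here

        // A real UTF-8 lead byte for a three-byte sequence has its high bit set.
        System.out.println(isSingleByte(0xE0));   // false
    }
}
```

So the early return is expected for any input whose low byte is below 0x80. The decoder would only continue to the second and third bytes if it were fed the UTF-8 bytes of the character rather than the character itself.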

@csoni111
Author

csoni111 commented Jun 22, 2017

I think it is something related to socketio/engine.io-parser#81 and socketio/engine.io#315.

@nkzawa
Contributor

nkzawa commented Jul 14, 2017

I wonder if you are using socket.io 2.0.x. In that case, upgrade socket.io-client-java to v1.0.0.

@nkzawa
Contributor

nkzawa commented Jul 14, 2017

It seems that's the case.

@nkzawa nkzawa closed this as completed Jul 14, 2017
@chanchurbansal

Thanks for the fix. Now the response decodes to UTF-16 Unicode.

@PriyaSingh2311

Hi. I am using version 1.7.4 on the server side and version 0.8.3 on the client side (Android), but I am unable to connect to the socket. I don't know why.
