Unicode decode error for Non-English characters #430

csoni111 · 2017-06-21T19:33:16Z

I am emitting some non-English characters (in Hindi) in JSON format from my Node.js socket.io server to an Android client. Everything works fine over a WebSocket connection, but with polling the non-English characters are turned into garbage values.
Digging into the code, I found that both transports call the decodePacket() function. With WebSocket it passes the boolean utf8decode as false, whereas with polling it passes true, which ultimately calls UTF8.decode(data).

In UTF8.java, decoding first builds a new array of charPoints for all the characters in the message string.
For each charPoint in the array it then evaluates decodeSymbol(), which returns the converted code point value. In my case the code point values of the non-English characters are greater than 255 (above 2000, actually), so I expected the function to process them up to the third byte (I am not sure exactly what this algorithm is doing). Instead it ends at #L100, returning only the value of byte1.
This turns my characters into garbage values.
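To make the failure mode concrete, here is a minimal sketch of a UTF-8 decoder that works on a "byte string" (one char per byte, values 0 to 255), in the same spirit as the decodeSymbol() behaviour described above. It is not the actual UTF8.java code: the class and method names are hypothetical, and it only handles 1- to 3-byte sequences. The key step is the `& 0xFF` mask: a char that is already a decoded code point above 255 loses its high bits and is then treated as a plain single-byte (ASCII) value.

```java
public class ByteStringUtf8Decoder {

    // Hypothetical simplified decoder, NOT the library's UTF8.java.
    // Assumes the input is a "byte string": each char should hold one UTF-8 byte (0-255).
    public static String decode(String byteString) {
        int[] bytes = new int[byteString.length()];
        for (int i = 0; i < bytes.length; i++) {
            // Keep only the low 8 bits. A char above 255 (e.g. U+0905)
            // silently loses its high bits here.
            bytes[i] = byteString.charAt(i) & 0xFF;
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < bytes.length) {
            int b1 = bytes[i++];
            int cp;
            if ((b1 & 0x80) == 0) {
                // High bit clear: 1-byte sequence, returned as-is.
                cp = b1;
            } else if ((b1 & 0xE0) == 0xC0) {
                // 110xxxxx 10xxxxxx
                int b2 = continuation(bytes, i++);
                cp = ((b1 & 0x1F) << 6) | b2;
            } else if ((b1 & 0xF0) == 0xE0) {
                // 1110xxxx 10xxxxxx 10xxxxxx
                int b2 = continuation(bytes, i++);
                int b3 = continuation(bytes, i++);
                cp = ((b1 & 0x0F) << 12) | (b2 << 6) | b3;
            } else {
                throw new IllegalArgumentException("unsupported or invalid lead byte: " + b1);
            }
            out.appendCodePoint(cp);
        }
        return out.toString();
    }

    // Returns the 6 payload bits of a continuation byte (10xxxxxx).
    private static int continuation(int[] bytes, int index) {
        if (index >= bytes.length || (bytes[index] & 0xC0) != 0x80) {
            throw new IllegalArgumentException("invalid continuation byte at " + index);
        }
        return bytes[index] & 0x3F;
    }

    public static void main(String[] args) {
        // U+0905 encoded as a UTF-8 byte string: E0 A4 85
        System.out.println(decode("\u00E0\u00A4\u0085").equals("\u0905")); // true
        // The already-decoded character is masked to 0x05 and returned as-is.
        System.out.println((int) decode("\u0905").charAt(0));              // 5
    }
}
```

Fed the byte-string form, the decoder reassembles U+0905; fed the character itself, it returns U+0005, which matches the garbage described above.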

Can someone help me understand why this UTF-8 decode (and the matching encode) is required, or at least what it is doing?
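As a partial answer, here is a minimal sketch of what a UTF-8 encode followed by a decode does to a string. It assumes the polling transport expected each string payload as a "byte string" holding one UTF-8 byte per char. That is my reading of the engine.io-parser issue referenced later in this thread, not something confirmed here. The sketch uses only the JDK's `StandardCharsets`; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Utf8ByteStringRoundTrip {

    // Hypothetical helper: text -> "byte string" with one char (0-255) per UTF-8 byte.
    static String toByteString(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 maps each byte 0x00-0xFF to the char with the same value.
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: read each char back as one byte, then decode those bytes as UTF-8.
    static String fromByteString(String byteString) {
        byte[] utf8 = byteString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u0905";                                 // U+0905, DEVANAGARI LETTER A
        String wire = toByteString(text);
        System.out.println(wire.length());                      // 3 (chars E0 A4 85)
        System.out.println(fromByteString(wire).equals(text));  // true
    }
}
```

Decoding only makes sense on the encoded form: `fromByteString(toByteString(s))` returns `s`, but a string that arrives already decoded no longer fits in one byte per char, so running the decode step on it corrupts it.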


csoni111 · 2017-06-21T19:49:38Z

A sample of the conversion that is being done:
Let's say my character is अ; its code point is 2309.
Processing it through decodeSymbol(), on line #96:
2309 & 0xFF returns 5
then on line #99:
5 & 0x80 returns 0
Hence the if condition is true and the function ends there, returning 5 as the value and corrupting the original character.
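The two steps above, reproduced in isolation as a small Java check. Only the arithmetic is taken from the trace; the helper names are hypothetical and this is not the library code.

```java
public class MaskTrace {

    // Mirrors the step at line #96 of the trace: keep only the low 8 bits.
    static int firstByte(int codePoint) {
        return codePoint & 0xFF;
    }

    // Mirrors the step at line #99: a clear high bit means a 1-byte sequence.
    static boolean isSingleByte(int byte1) {
        return (byte1 & 0x80) == 0;
    }

    public static void main(String[] args) {
        int codePoint = 2309;                    // U+0905 (0x905)
        int byte1 = firstByte(codePoint);
        System.out.println(byte1);               // 5
        System.out.println(isSingleByte(byte1)); // true, so 5 is returned
    }
}
```

Since 2309 is 0x905, masking with 0xFF keeps only 0x05; the upper bits that identify the Devanagari letter are gone before the multi-byte branches are ever considered.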

csoni111 · 2017-06-22T18:48:21Z

I think it is related to socketio/engine.io-parser#81 and socketio/engine.io#315.

nkzawa · 2017-07-14T06:26:13Z

I wonder if you are using socket.io 2.0.x. If so, upgrade socket.io-client-java to v1.0.0.

nkzawa · 2017-07-14T06:36:07Z

It seems that's the case.

chanchurbansal · 2017-07-14T07:20:27Z

Thanks for the fix. Now the response is decoded correctly as UTF-16 Unicode.

PriyaSingh2311 · 2017-07-27T08:07:32Z

Hi, I am using 1.7.4 on the server side and 0.8.3 on the client side (Android), but I am unable to connect with the socket. I don't know why.

csoni111 mentioned this issue Jun 22, 2017

Is socket.io 2.0 compatible with java client? (Android) socketio/socket.io#2955

Closed

nkzawa closed this as completed Jul 14, 2017
