I'm using Jackson's non-blocking parser to implement a BodySubscriber for use with Java's non-blocking HTTP client. The parser is created by JsonFactory#createNonBlockingByteArrayParser() using the factory instance associated with the ObjectMapper. It's working like a charm, but it seems to use UTF-8 by default, and there is no way to tell it to use another encoding (such as a non-UTF-8 encoding specified by the response headers).
I figured it might auto-detect the response body's encoding, as is the case with other parsers, but it turned out that it assumes all input is UTF-8. For example, this snippet would crash:
It works fine if the JSON string is encoded with UTF-8.
Yes, you cannot specify other encodings, so it only works for UTF-8 and 7-bit ASCII (since that is a subset).
This is a fundamental limitation and it is unlikely implementations for other encodings would be added.
If support were to be added, it would likely require a version that handles byte-to-character decoding separately from tokenization, and that would be a full rewrite.
So: the non-blocking parser will only work on UTF-8 input. I should probably mention this better in the Javadocs.
I think in my case, then, I should use the non-blocking parser only if the response charset is UTF-8 or a subset of it, and otherwise fall back to loading the response as a string and deserializing from there. I agree that the Javadocs should mention this to clear up confusion.
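The fallback decision described above could be sketched roughly as follows. This is only an illustration using the JDK's own classes: the `ParserChoice` class and the `canUseNonBlockingParser` method are hypothetical names, not part of Jackson or the HTTP client, and treating a missing charset parameter as UTF-8 is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a Jackson API): decides from a Content-Type header
// value whether the UTF-8-only non-blocking parser is safe to use.
final class ParserChoice {
    static boolean canUseNonBlockingParser(String contentType) {
        if (contentType == null) {
            return true; // assumption: no header means the JSON default, UTF-8
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" parameter name.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    Charset cs = Charset.forName(name);
                    return cs.equals(StandardCharsets.UTF_8)
                            || cs.equals(StandardCharsets.US_ASCII);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: take the fallback path.
                    return false;
                }
            }
        }
        return true; // assumption: no charset parameter means UTF-8
    }
}
```

When this returns `false`, the subscriber would buffer the body, decode it with the declared charset, and deserialize from the resulting string instead.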
Right. The vast majority of JSON really should be UTF-8, especially considering that the only officially legal encodings are UTF-8, UTF-16 and UTF-32 (as per the original JSON specification). But there are so many broken systems that emit other encodings (ISO-8859-x) that... it is frustrating. A JSON document itself has no mechanism for declaring its encoding, unlike XML, which has this capability, so such documents are not stand-alone any more.
But if sticking to the standard supported encodings, auto-detection does work (UTF-16 and UTF-32 can be auto-detected as distinct from UTF-8; Latin-1 and others cannot).
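To illustrate why the standard encodings can be told apart while Latin-1 cannot, here is a minimal sketch of the zero-byte heuristic described in the original JSON specification (RFC 4627, section 3). It is not Jackson's actual detection code: the class name `JsonEncodingSniffer` is hypothetical, byte-order marks are ignored, and it assumes the document starts with two ASCII characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the RFC 4627 heuristic; not Jackson's implementation.
final class JsonEncodingSniffer {
    static Charset detect(byte[] b) {
        if (b.length >= 4) {
            boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
            // The position of zero bytes in the first four octets reveals the encoding.
            if (z0 && z1 && z2 && !z3) return Charset.forName("UTF-32BE");  // 00 00 00 xx
            if (!z0 && z1 && z2 && z3) return Charset.forName("UTF-32LE");  // xx 00 00 00
            if (z0 && !z1 && z2 && !z3) return StandardCharsets.UTF_16BE;   // 00 xx 00 xx
            if (!z0 && z1 && !z2 && z3) return StandardCharsets.UTF_16LE;   // xx 00 xx 00
        }
        // No zero bytes: indistinguishable from Latin-1 and similar single-byte
        // encodings, so the only reasonable guess is UTF-8.
        return StandardCharsets.UTF_8;
    }
}
```

Latin-1 input has no zero bytes in its first four octets either, so it lands in the UTF-8 branch and only fails later, when a non-ASCII byte turns out not to be valid UTF-8.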