Use platform native encoding by default, not UTF-8 #633
I'll let @twall comment/merge; this seems like a significant change. It also needs an entry in CHANGES.md (https://github.com/java-native-access/jna/blob/master/CHANGES.md), please.
Why the To that effect, the operation might warrant its own method describing itself. Not critical, though. Add the bugfix description to CHANGES.md and this looks good.
I updated
Yes, exactly. The test string contains Another option would be to parameterize these tests on the encoding, i.e. specifically test Cp1252 in addition to UTF-8 and maybe others.
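Parameterizing on the encoding could look roughly like the sketch below. It is plain Java rather than the project's actual test harness, and the class name `EncodingCases`, its methods, and the sample strings are all made up for illustration. Each sample contains only non-ASCII characters that the paired encoding is expected to represent.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JNA: pairs each encoding with a
// non-ASCII sample string chosen to be representable in that encoding.
class EncodingCases {
    static Map<String, String> cases() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("UTF-8", "\u00e9\u20ac\u4e2d");   // e-acute, euro sign, a CJK character
        m.put("windows-1252", "\u00e9\u20ac");  // e-acute and euro sign
        m.put("ISO-8859-1", "\u00e9");          // e-acute only (no euro sign in Latin-1)
        return m;
    }

    // True if the sample is unchanged after encoding to bytes and decoding again.
    static boolean sampleSurvives(String charsetName, String sample) {
        Charset cs = Charset.forName(charsetName);
        return sample.equals(new String(sample.getBytes(cs), cs));
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : cases().entrySet()) {
            if (!Charset.isSupported(e.getKey())) {
                // Only a handful of charsets are guaranteed on every JVM.
                System.out.println(e.getKey() + ": not available, skipped");
                continue;
            }
            System.out.println(e.getKey() + ": " + sampleSurvives(e.getKey(), e.getValue()));
        }
    }
}
```

A real test would pass each sample through the native call under test instead of `sampleSurvives`; the map only shows how the per-encoding inputs might be organized.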
I agree. We do it in a couple different classes; would you prefer a separate method per class, or a "Util" class to keep it DRY?
3a427fe to e98d995
Rebased on master to resolve conflicts.
Fixing up the tests to accommodate gaps in Cp1252 makes sense. I just pulled a Unicode string I had handy that worked with UTF-8. The intent was to use something outside of the ASCII range and yet supported by the encoding. It may make sense to refactor slightly; we don't actually need to test the encoding/decoding itself, but rather that the platform default is chosen in the appropriate places as a fallback.
The problem is that Windows uses a different native encoding depending on the OS's language, so we can't just choose something in Cp1252; we'd have to choose something within the intersection of all Windows default encodings that is also not in ASCII. There may not be any such characters (I didn't think it was worth researching), which is why I thought it made the most sense to simply add a round-trip on the test string. The alternative to this is to test only specific encodings, as I mentioned earlier. But as you point out that's not really what we're interested in testing.
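For illustration, the round-trip idea can be sketched as follows. This is not the code from the patch; `RoundTripCheck` and its method names are hypothetical. The point is that the expected value is computed by pushing the test string through the same charset first, so the comparison holds whatever the platform default happens to be.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JNA.
class RoundTripCheck {
    // What a string looks like after being encoded to bytes in cs and decoded
    // again. String.getBytes(Charset) substitutes the charset's replacement
    // bytes for characters it cannot represent.
    static String expectedAfterRoundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    // True if every character of s is representable in cs.
    static boolean roundTrips(String s, Charset cs) {
        return s.equals(expectedAfterRoundTrip(s, cs));
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9 \u4e2d\u6587"; // Latin-1 and CJK characters
        Charset platform = Charset.defaultCharset();
        // A test would compare the string returned from native code against
        // this value rather than against the original sample.
        String expected = expectedAfterRoundTrip(sample, platform);
        System.out.println(platform + " preserves sample: " + sample.equals(expected));
        System.out.println("UTF-8 preserves sample: " + roundTrips(sample, StandardCharsets.UTF_8));
    }
}
```

The trade-off is the one raised above: on a platform whose default charset cannot represent the sample, the test still passes but exercises only the replacement characters.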
I'll call this good enough for now. If we think of some better/more appropriate tests, we can put them in a future PR.
Seems to be related to this: java-native-access/jna#633
Per the discussion here, the default encoding used for Java strings being passed to native interfaces should not be hard-coded as UTF-8, but instead should be the platform's native encoding.
This patch takes the native encoding from `Charset.defaultCharset()` and updates the relevant tests. In addition, I have ensured that the tests pass even when the default charset is Cp1252 (the default on English Windows). Special care was required in this case because the tests use characters that cannot round-trip between Unicode and Cp1252, and it is not clear that choosing round-trippable characters would still result in a meaningful test when Unicode is supported.

Note that I was not able to actually run the tests on Windows, as I didn't have time to set up a dev environment; instead I ran them on OS X with `Native.DEFAULT_CHARSET` temporarily set to `Cp1252`.
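As a rough sketch of the fallback behaviour described above (not the patch's actual code), an explicitly requested encoding could be preferred, with `Charset.defaultCharset()` used when none is given or the name is unusable. The class `EncodingFallback` and both of its methods are hypothetical names introduced only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of JNA.
class EncodingFallback {
    // Use the requested charset if it is named and supported; otherwise fall
    // back to the platform default instead of a hard-coded UTF-8.
    static Charset resolve(String requested) {
        if (requested != null) {
            try {
                if (Charset.isSupported(requested)) {
                    return Charset.forName(requested);
                }
            } catch (IllegalArgumentException e) {
                // Syntactically illegal charset name: fall through to the default.
            }
        }
        return Charset.defaultCharset();
    }

    // Encode s and append one zero byte, as for a C string. A single zero byte
    // is only a valid terminator for byte-oriented encodings, not e.g. UTF-16.
    static byte[] toNativeBytes(String s, Charset cs) {
        byte[] encoded = s.getBytes(cs);
        return Arrays.copyOf(encoded, encoded.length + 1); // copyOf pads with 0
    }

    public static void main(String[] args) {
        Charset cs = resolve(null);
        System.out.println("resolved charset: " + cs);
        System.out.println("bytes: " + Arrays.toString(toNativeBytes("caf\u00e9", cs)));
        System.out.println("explicit: " + resolve(StandardCharsets.UTF_8.name()));
    }
}
```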