Use platform native encoding by default, not UTF-8 #633

amake · 2016-04-10T08:13:29Z

Per the discussion here, the default encoding used for Java strings being passed to native interfaces should not be hard-coded as UTF-8, but instead should be the platform's native encoding.

This patch takes the native encoding from Charset.defaultCharset() and updates the relevant tests. In addition, I have ensured that the tests pass even when the default charset is Cp1252 (the default on English Windows); special care was required in this case because the tests uses characters that cannot round-trip between Unicode and Cp1252, and it is not clear that choosing round-trippable characters would still result in a meaningful test when Unicode is supported.

Note that I was not able to actually run the tests on Windows as I didn't have time to set up a dev environment; instead I ran them on OS X with Native.DEFAULT_CHARSET temporarily set to Cp1252.

dblock · 2016-04-10T13:33:00Z

I'll let @twall comment/merge, this seems like a significant change. It definitely needs a https://github.com/java-native-access/jna/blob/master/CHANGES.md entry, too, please.

twall · 2016-04-10T14:00:10Z

Why the charset.decode(charset.encode(UNICODE)) instead of simple String concatenation? Is that to ensure that we're accounting for encodings that can't actually handle the string in question?

To that effect, the operation might warrant its own method describing itself. Not critical, though.

Add the bugfix description to CHANGES.md and this looks good.

amake · 2016-04-11T00:55:02Z

I updated CHANGES.md.

Is that to ensure that we're accounting for encodings that can't actually handle the string in question?

Yes, exactly. The test string contains U+0444, which is not contained within Cp1252, so we have to make sure we're comparing post-roundtripped strings. The alternative is to only test lowest-common-denominator characters, which would end up being ASCII so that's pointless.

Another option would be to parameterize these tests on the encoding, i.e. specifically test Cp1252 in addition to UTF-8 and maybe others.

To that effect, the operation might warrant its own method describing itself.

I agree. We do it in a couple different classes; would you prefer a separate method per class, or a "Util" class to keep it DRY?

amake · 2016-04-12T05:34:27Z

Rebased on master to resolve conflicts.

@twall

twall · 2016-04-12T10:43:27Z

Fixing up the tests to accommodate gaps in cp1252 makes sense. I just pulled a unicode string I had handy that worked with utf8. The intent was to use something outside of the ASCII range and yet supported by the encoding.

It may make sense to refactor slightly; we don't actually need to test the encoding/decoding itself, but rather that the platform default is chosen in the appropriate places as a fallback.

amake · 2016-04-12T11:07:38Z

The problem is that Windows uses a different native encoding depending on the OS's language, so we can't just choose something in Cp1252; we'd have to choose something within the intersection of all Windows default encodings that is also not in ASCII. There may not be any such characters (I didn't think it was worth researching), which is why I thought it made the most sense to simply add a round-trip on the test string.

The alternative to this is to test only specific encodings, as I mentioned earlier. But as you point out that's not really what we're interested in testing.

twall · 2016-04-12T11:47:03Z

I'll call this good enough for now. If we think of some better/more appropriate tests, we can put into a future PR.

Seems to be related to this: java-native-access/jna#633

amake added a commit to amake/jna that referenced this pull request Apr 11, 2016

Note fix for java-native-access#633 in CHANGES.md

3a427fe

amake added 4 commits April 12, 2016 14:31

Use platform native encoding, not UTF-8, by default

0399562

Fix testCustomizeDefaultStringEncoding for when default == file.encoding

a846d5c

Ensure all tests pass when Native.DEFAULT_ENCODING is Cp1252

8a9f5b0

Note fix for java-native-access#633 in CHANGES.md

e98d995

amake force-pushed the native-encoding-fix branch from 3a427fe to e98d995 Compare April 12, 2016 05:33

twall merged commit 2201191 into java-native-access:master Apr 12, 2016

amake deleted the native-encoding-fix branch April 12, 2016 11:47

gwenn added a commit to gwenn/sqlite-jna that referenced this pull request Jul 1, 2017

Try to fix Appveyor build

ee36b61

Seems to be related to this: java-native-access/jna#633

amake mentioned this pull request Oct 3, 2017

exception: invalid memory access (hunspell) languagetool-org/languagetool#580

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use platform native encoding by default, not UTF-8 #633

Use platform native encoding by default, not UTF-8 #633

amake commented Apr 10, 2016

dblock commented Apr 10, 2016

twall commented Apr 10, 2016

amake commented Apr 11, 2016

amake commented Apr 12, 2016

twall commented Apr 12, 2016

amake commented Apr 12, 2016

twall commented Apr 12, 2016

Use platform native encoding by default, not UTF-8 #633

Use platform native encoding by default, not UTF-8 #633

Conversation

amake commented Apr 10, 2016

dblock commented Apr 10, 2016

twall commented Apr 10, 2016

amake commented Apr 11, 2016

amake commented Apr 12, 2016

twall commented Apr 12, 2016

amake commented Apr 12, 2016

twall commented Apr 12, 2016