Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use platform native encoding by default, not UTF-8 #633

Merged
merged 4 commits into from
Apr 12, 2016

Conversation

amake
Copy link
Contributor

@amake amake commented Apr 10, 2016

Per the discussion here, the default encoding used for Java strings being passed to native interfaces should not be hard-coded as UTF-8, but instead should be the platform's native encoding.

This patch takes the native encoding from Charset.defaultCharset() and updates the relevant tests. In addition, I have ensured that the tests pass even when the default charset is Cp1252 (the default on English Windows); special care was required in this case because the tests uses characters that cannot round-trip between Unicode and Cp1252, and it is not clear that choosing round-trippable characters would still result in a meaningful test when Unicode is supported.

Note that I was not able to actually run the tests on Windows as I didn't have time to set up a dev environment; instead I ran them on OS X with Native.DEFAULT_CHARSET temporarily set to Cp1252.

@dblock
Copy link
Member

dblock commented Apr 10, 2016

I'll let @twall comment/merge, this seems like a significant change. It definitely needs a https://github.com/java-native-access/jna/blob/master/CHANGES.md entry, too, please.

@twall
Copy link
Contributor

twall commented Apr 10, 2016

Why the charset.decode(charset.encode(UNICODE)) instead of simple String concatenation? Is that to ensure that we're accounting for encodings that can't actually handle the string in question?

To that effect, the operation might warrant its own method describing itself. Not critical, though.

Add the bugfix description to CHANGES.md and this looks good.

amake added a commit to amake/jna that referenced this pull request Apr 11, 2016
@amake
Copy link
Contributor Author

amake commented Apr 11, 2016

I updated CHANGES.md.

Is that to ensure that we're accounting for encodings that can't actually handle the string in question?

Yes, exactly. The test string contains U+0444, which is not contained within Cp1252, so we have to make sure we're comparing post-roundtripped strings. The alternative is to only test lowest-common-denominator characters, which would end up being ASCII so that's pointless.

Another option would be to parameterize these tests on the encoding, i.e. specifically test Cp1252 in addition to UTF-8 and maybe others.

To that effect, the operation might warrant its own method describing itself.

I agree. We do it in a couple different classes; would you prefer a separate method per class, or a "Util" class to keep it DRY?

@amake amake force-pushed the native-encoding-fix branch from 3a427fe to e98d995 Compare April 12, 2016 05:33
@amake
Copy link
Contributor Author

amake commented Apr 12, 2016

Rebased on master to resolve conflicts.

@twall

@twall
Copy link
Contributor

twall commented Apr 12, 2016

Fixing up the tests to accommodate gaps in cp1252 makes sense. I just pulled a unicode string I had handy that worked with utf8. The intent was to use something outside of the ASCII range and yet supported by the encoding.

It may make sense to refactor slightly; we don't actually need to test the encoding/decoding itself, but rather that the platform default is chosen in the appropriate places as a fallback.

@amake
Copy link
Contributor Author

amake commented Apr 12, 2016

The problem is that Windows uses a different native encoding depending on the OS's language, so we can't just choose something in Cp1252; we'd have to choose something within the intersection of all Windows default encodings that is also not in ASCII. There may not be any such characters (I didn't think it was worth researching), which is why I thought it made the most sense to simply add a round-trip on the test string.

The alternative to this is to test only specific encodings, as I mentioned earlier. But as you point out that's not really what we're interested in testing.

@twall twall merged commit 2201191 into java-native-access:master Apr 12, 2016
@twall
Copy link
Contributor

twall commented Apr 12, 2016

I'll call this good enough for now. If we think of some better/more appropriate tests, we can put into a future PR.

@amake amake deleted the native-encoding-fix branch April 12, 2016 11:47
gwenn added a commit to gwenn/sqlite-jna that referenced this pull request Jul 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants