CFStringRef#stringValue buffer needs space for 4 UTF8 bytes #1345
So Apple manages to get the byte-width calculation for the character string wrong, and every library out there has to reinvent the wheel? Great ... But yes, the change looks fine.
Apple is not alone in this.
OK, so maybe it WAS okay and my test case when I first tried this may have omitted the null byte. I reported this as a bug to Apple and received this response:
I tried to reproduce my earlier error and failed. I may have forgotten to add the null byte in my original failing test case. It seems that some 3-byte UTF-8 characters are only 2 bytes (one code unit) in UTF-16, such as the alaf character in the earlier test case, but 4-byte UTF-8 characters always consume 4 bytes in UTF-16 as two code units (a surrogate pair). Because the "length" counts UTF-16 code units, the size computed from it will be sufficient even for a single such "character". Oddly, with a single 4-byte character (as displayed) the Java So I will likely be rolling back this PR to the previous version, but I'm going to take some time to try to fully understand Unicode and Java strings before I do anything else. Sorry for the churn.
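The arithmetic in the comment above can be checked with a minimal Java sketch. It is not JNA or CoreFoundation code: the class `Utf8WidthDemo` and its method names are hypothetical, and the sample characters are chosen here for illustration. It only compares `String.length()` (UTF-16 code units) with the UTF-8 byte count for well-formed strings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JNA.
final class Utf8WidthDemo {
    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Number of bytes in the UTF-8 encoding, without a terminating NUL. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if 3 bytes per UTF-16 code unit is enough to hold the UTF-8 form. */
    static boolean threePerUnitSuffices(String s) {
        return utf8Bytes(s) <= 3 * utf16Units(s);
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // 1 UTF-8 byte, 1 UTF-16 unit
            "\u00E9",       // 2 UTF-8 bytes, 1 UTF-16 unit
            "\u0800",       // 3 UTF-8 bytes, 1 UTF-16 unit
            "\uD83D\uDE00"  // 4 UTF-8 bytes, 2 UTF-16 units (surrogate pair)
        };
        for (String s : samples) {
            System.out.printf("U+%04X: %d UTF-16 unit(s), %d UTF-8 byte(s), 3x bound ok: %b%n",
                    s.codePointAt(0), utf16Units(s), utf8Bytes(s), threePerUnitSuffices(s));
        }
    }
}
```

The worst case per UTF-16 code unit is a 3-byte character in the Basic Multilingual Plane; a 4-byte character costs two code units, so it averages 2 bytes per unit.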
Ok - my take on this:
I don't know what you reported, but considering the original code: jna/contrib/platform/src/com/sun/jna/platform/mac/CoreFoundation.java Lines 491 to 494 in 5ac6161
jna/contrib/platform/src/com/sun/jna/platform/mac/CoreFoundation.java Lines 495 to 499 in 5ac6161
There are two interpretations possible here:
So my take on this: the estimate this method produces for a string should err on the side of overestimating the required memory.
In addition to noting UTF-8 can be 4 bytes, I did provide the sequence of function calls used, referencing the sequence
The way I interpret the reply,
Right. This is a distraction from the root question here, but it is relied upon in the JNA implementation and tests so is relevant. The native method Based on my testing, it always appears to create a
It does not hurt to overestimate, which multiplying by 4 does. My only concern is whether the tests for string length and/or character array length should be changed.
Rereading this, I think I muddled the issue. Here are my thoughts, more succinctly:
OK I've convinced myself that Java always adds a
Fixes #1342 (again)
macOS incorrectly uses a 3-byte-per-character conversion when determining the maximum byte size of a UTF-8 encoded string. Changing the buffer size calculation to use 4 bytes per character (plus a null byte) produces an appropriately sized buffer.
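As a rough illustration of the sizing rule described here, the following sketch allocates 4 bytes per UTF-16 code unit plus one byte for the terminator. It is a plain-Java stand-in under stated assumptions: a `byte[]` takes the place of the native buffer, and `CStringBufferSketch` and its methods are hypothetical names, not the `CoreFoundation.java` code changed by this PR.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; a byte[] stands in for the native memory buffer.
final class CStringBufferSketch {
    /** Conservative size: 4 bytes per UTF-16 code unit plus one NUL byte. */
    static int maxUtf8BufferSize(int utf16Length) {
        // The *Exact methods throw ArithmeticException on int overflow.
        return Math.addExact(Math.multiplyExact(utf16Length, 4), 1);
    }

    /** Encodes s as UTF-8 into an over-allocated, zero-filled buffer. */
    static byte[] toNulTerminatedUtf8(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        byte[] buffer = new byte[maxUtf8BufferSize(s.length())];
        System.arraycopy(encoded, 0, buffer, 0, encoded.length);
        // Java zero-initializes arrays, so the byte after the text is the NUL.
        return buffer;
    }

    /** Reads back up to the first NUL byte; assumes s had no embedded U+0000. */
    static String fromNulTerminatedUtf8(byte[] buffer) {
        int end = 0;
        while (end < buffer.length && buffer[end] != 0) {
            end++;
        }
        return new String(buffer, 0, end, StandardCharsets.UTF_8);
    }
}
```

Over-allocating costs some memory per call but cannot truncate the string; whether the factor is 3 or 4, the terminator byte has to be counted separately.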