can we coalesce quotation mark CE lists into single CEs? #927
Looks reasonable. From what you wrote here, it looks like there aren't any characters between U+FF07 and U+2018 in the second case. Is that still true with your change?
This is the case according to the allkeys_CLDR.txt file, which is in sorted order.
I thought it was sorted by shifted values, not a real sort, although in this instance maybe that doesn't matter.
On Mon, Aug 26, 2024, 16:39 Markus Scherer wrote:
I have to remind myself what the code that I thought would do this looks like, and see what's different from a case like sharp s.
Anyway, this is just a drive-by thought that I wanted to jot down. The real work for today is #926 :-)
The real UCA allkeys.txt is sorted with something like alternate=shifted. (I'm not sure that's entirely accurate; I think it might sort with strength=tertiary, which drops the shifted primaries and makes ignorable characters come out in a somewhat random order.) The allkeys_CLDR.txt and allkeys_DUCET.txt files that the Unicode Tools generate are sorted with alternate=non-ignorable. FYI: I found the coalescing code, and I amended the issue description above a few minutes ago.
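To make the alternate=shifted vs. alternate=non-ignorable difference concrete, here is a simplified sketch of how CE lists turn into per-level weights, following the UTS #10 variable-weighting rules as I understand them (variable CE moves its primary to level 4; ignorables right after a variable are dropped; everything else gets FFFF on level 4). The `AlternateSketch` class, its `CE` record, and all weights are hypothetical; this is not the Unicode Tools or ICU code, and it skips details such as backwards secondaries and level separators.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical sketch of UCA alternate handling.
 * Not the Unicode Tools or ICU implementation.
 */
final class AlternateSketch {
    /** One collation element; variable == true corresponds to '*' in allkeys.txt. */
    record CE(int p, int s, int t, boolean variable) {}

    /** Nonzero weights for levels 1..4; level 4 stays empty for non-ignorable. */
    static List<List<Integer>> levels(List<CE> ces, boolean shifted) {
        List<List<Integer>> lv = new ArrayList<>();
        for (int i = 0; i < 4; i++) lv.add(new ArrayList<>());
        boolean afterVariable = false;
        for (CE ce : ces) {
            int p = ce.p(), s = ce.s(), t = ce.t(), q = 0;
            boolean completelyIgnorable = p == 0 && s == 0 && t == 0;
            if (shifted && !completelyIgnorable) {
                if (ce.variable()) {
                    q = p; p = 0; s = 0; t = 0;  // variable: primary moves to level 4
                    afterVariable = true;
                } else if (p == 0 && afterVariable) {
                    s = 0; t = 0;                // ignorable after a variable: dropped
                } else {
                    q = 0xFFFF;                  // all other CEs get FFFF on level 4
                    if (p != 0) afterVariable = false;
                }
            }
            int[] w = {p, s, t, q};
            for (int i = 0; i < 4; i++) if (w[i] != 0) lv.get(i).add(w[i]);
        }
        return lv;
    }

    /** Compares two CE lists level by level, up to the given strength (1..4). */
    static int compare(List<CE> a, List<CE> b, boolean shifted, int strength) {
        List<List<Integer>> ka = levels(a, shifted), kb = levels(b, shifted);
        for (int i = 0; i < strength; i++) {
            int c = compareSeq(ka.get(i), kb.get(i));
            if (c != 0) return c;
        }
        return 0;
    }

    private static int compareSeq(List<Integer> x, List<Integer> y) {
        for (int i = 0; i < Math.min(x.size(), y.size()); i++) {
            int c = Integer.compare(x.get(i), y.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(x.size(), y.size());
    }
}
```

With made-up weights, two quotation-mark-like entries that differ only in their variable primary compare as different under non-ignorable, as equal under shifted at strength 3 (their level 1-3 weights are all gone), and as different again once level 4 is included. That tie at strength=tertiary is one way ignorable characters could end up in an arbitrary-looking order.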
I remember that I added some logic for the CLDR version of the default sort order to coalesce some adjacent CEs, absorbing ignorable CEs into their main CEs. Look for how that works for things like sharp s, or generally look for existing differences between allkeys_CLDR.txt and allkeys_DUCET.txt, and see whether we can turn this
into something like this
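The coalescing idea (absorbing trailing ignorable CEs into their main CE) can be sketched as follows. This is a hypothetical illustration, not the actual `modifyMappings()` logic: it merges a primary CE with immediately following secondary CEs that have the same tertiary weight, and stands in a newly allocated secondary weight for the sequence of secondaries. The `CoalesceSketch` class, its `CE` record, and the starting weight 0x0400 are assumptions; a real generator would have to choose merged secondary weights that sort correctly among all existing secondaries, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: coalesce "primary CE + trailing secondary CEs" into one CE. */
final class CoalesceSketch {
    record CE(int p, int s, int t) {
        boolean primaryIgnorable() { return p == 0; }
    }

    // Assumption: merged secondaries come from a made-up range, in first-use order.
    private int nextSecondary = 0x0400;
    private final Map<List<Integer>, Integer> allocated = new HashMap<>();

    /** Weight standing in for a sequence of secondary weights, e.g. [0020, 0111]. */
    int secondaryFor(List<Integer> seq) {
        return allocated.computeIfAbsent(seq, k -> nextSecondary++);
    }

    List<CE> coalesce(List<CE> ces) {
        List<CE> out = new ArrayList<>();
        int i = 0;
        while (i < ces.size()) {
            CE head = ces.get(i);
            int j = i + 1;
            // Collect following secondary CEs that share the head's tertiary weight.
            while (j < ces.size() && ces.get(j).primaryIgnorable()
                    && ces.get(j).s() != 0 && ces.get(j).t() == head.t()) {
                j++;
            }
            if (head.primaryIgnorable() || j == i + 1) {
                out.add(head);  // nothing to absorb, or head is itself ignorable
                i++;
                continue;
            }
            List<Integer> secs = new ArrayList<>();
            for (int k = i; k < j; k++) secs.add(ces.get(k).s());
            out.add(new CE(head.p(), secondaryFor(secs), head.t()));
            i = j;
        }
        return out;
    }
}
```

The map ensures that every mapping with the same secondary sequence gets the same merged weight, so equal expansions stay equal after coalescing.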
I found the coalescing code; I had misremembered where I put it. It's in `MappingsForFractionalUCA.java`, in `modifyMappings()`:
```java
// Check and merge secondary CEs.
```
It does not modify the "UCA" mappings. It only modifies intermediate mappings that turn into FractionalUCA.txt mappings. I verified that allkeys_CLDR.txt and allkeys_DUCET.txt have the same number of non-initial ignorable CEs. And FractionalUCA.txt shows the merged byte-based CEs:
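A count like the one above could be reproduced with a small scanner over allkeys-format lines. This is a sketch under assumptions: it expects the `[.XXXX.XXXX.XXXX]` / `[*XXXX.XXXX.XXXX]` bracket syntax of allkeys.txt (assumed to hold for the generated allkeys_CLDR.txt and allkeys_DUCET.txt too), and the `IgnorableCounter` class is hypothetical, not the check that was actually run.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: count CEs with primary 0000 that are not the first CE of a mapping. */
final class IgnorableCounter {
    // '.' or '*' marker, then three dot-separated hex weights.
    private static final Pattern CE =
        Pattern.compile("\\[[.*]([0-9A-F]{4,})\\.([0-9A-F]{4,})\\.([0-9A-F]{4,})\\]");

    static long countNonInitialIgnorables(List<String> lines) {
        long count = 0;
        for (String line : lines) {
            int hash = line.indexOf('#');
            String data = hash >= 0 ? line.substring(0, hash) : line;  // drop comments
            if (!data.contains(";")) continue;  // skip @version lines and blanks
            Matcher m = CE.matcher(data);
            boolean first = true;
            while (m.find()) {
                if (!first && Integer.parseInt(m.group(1), 16) == 0) count++;
                first = false;
            }
        }
        return count;
    }
}
```

Usage would be along the lines of `countNonInitialIgnorables(Files.readAllLines(Path.of("allkeys_CLDR.txt")))`, compared against the same call for allkeys_DUCET.txt.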
The code includes comments about the modified mappings not being well-formed. It should be possible to make them well-formed, since the resulting FractionalUCA mappings are well-formed.
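The comment does not say which well-formedness condition the modified mappings break. One plausible candidate (an assumption on my part) is UTS #10 condition WF2: secondary weights of secondary CEs must be strictly greater than the secondary weights of all primary CEs. A merged CE that carries a secondary weight borrowed from a secondary CE would violate that. A minimal, hypothetical checker:

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch of a UCA WF2 check over a set of CE mappings. */
final class Wf2Check {
    record CE(int p, int s, int t) {}

    static boolean satisfiesWf2(Collection<List<CE>> mappings) {
        int maxPrimarySecondary = 0;                    // max secondary on primary CEs
        int minSecondarySecondary = Integer.MAX_VALUE;  // min secondary on secondary CEs
        for (List<CE> ces : mappings) {
            for (CE ce : ces) {
                if (ce.p() != 0) {
                    maxPrimarySecondary = Math.max(maxPrimarySecondary, ce.s());
                } else if (ce.s() != 0) {
                    minSecondarySecondary = Math.min(minSecondarySecondary, ce.s());
                }
            }
        }
        return maxPrimarySecondary < minSecondarySecondary;
    }
}
```

Making the merged mappings well-formed would then mean giving merged CEs secondary weights below every secondary-CE weight while still sorting them after the common secondary, which is roughly what the byte-based FractionalUCA weights already manage.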
If we wanted to, we could then try to move this logic up one or two levels:
Either way, the FractionalUCA generator would need to be adjusted for working with non-ignorable CEs having non-default secondary weights.