Unicode 16 first cut security data #777
What I could understand looks fine. I'm not familiar with the invariants code or some of the other internal data files here.
I assume this will be modified further as a result of the conversation about the default IDNA type/status.
@dwanders-A & SAH/SEW: should any of the following not be Uncommon_Use? I am sending more details via email.
Debbie replied: The case pair for Cyrillic TJE is in modern use, as are the three Latin lambda characters. LATIN CAPITAL LETTER RAMS HORN is apparently in modern use, and the case pair for LATIN LETTER S WITH DIAGONAL STROKE is also in modern use. The Myanmar digits are also in modern use, based on the proposal. Are the characters above in customary widespread use? Hmmm, well, once they are in fonts, they will (or may) be. The following are apparently found in manuscripts, so they are not modern:
The question should have been: are they in common/customary modern use? Whenever we are in doubt, the default should be Uncommon_Use. If a character is in modern use but only very infrequently, such as in technical documents, then we need to know that too. For example, just from https://en.wikipedia.org/wiki/Latin_gamma we have that
What that indicates is that it should be marked as either Uncommon_Use or Technical. It is important to note that whenever there is some doubt, we should default to Uncommon_Use unless we have reasonable evidence that the character is in common use. This data does not affect the use of the character for normal purposes (writing books, articles, text messages, and so on); it is specifically designed for identifiers and similar constructs. We can always "upgrade" it later on, whenever someone presents a reasonable case for it being in common/customary use. For example, if someone finds that the uppercase form is used in the normal orthography for a modern language X, which has a significant, active population using it, and makes a proposal to that effect, that would be justification for dropping the Identifier_Type of Uncommon_Use or Technical.
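To make the defaulting rule above concrete, here is a minimal sketch in Python. It is not code from this PR: the function name `provisional_identifier_type` and the `usage_evidence` labels (`"common"`, `"technical"`) are hypothetical, and mapping "dropping Uncommon_Use or Technical" to `Recommended` is an assumption about the outcome of an accepted proposal.

```python
from typing import Optional


def provisional_identifier_type(usage_evidence: Optional[str] = None) -> str:
    """Pick a provisional Identifier_Type for a newly encoded character.

    usage_evidence is a hypothetical label summarizing what is known:
      "common"    - reasonable evidence of common/customary modern use
      "technical" - modern use, but infrequent (e.g. technical documents)
      None        - no evidence, or any doubt
    """
    if usage_evidence == "common":
        # Assumption: an accepted proposal would lift the restriction.
        return "Recommended"
    if usage_evidence == "technical":
        return "Technical"
    # Whenever in doubt, default to Uncommon_Use; it can be upgraded later.
    return "Uncommon_Use"
```

Under this sketch, a character with no documented usage stays Uncommon_Use until someone presents a reasonable case for upgrading it.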
@macchiati @asmusf et al. -- no new Recommended characters; could you (or someone) please approve this PR?
Looks good as a first cut
Seems fine
Best reviewed one commit at a time.