Add an invariant relating Vowel_Dependent to Alphabetic #570

Merged
merged 1 commit into from
Oct 11, 2023

eggrobin (Member) commented Oct 10, 2023

There are a handful of exceptions*, but most characters with Indic_Syllabic_Category=Vowel_Dependent are Alphabetic. See TUS on Alphabetic:

Alphabetic. The Alphabetic property is a derived informative property of the primary units of alphabets and/or syllabaries, whether combining or noncombining. Included in this group would be composite characters that are canonical equivalents to a combining character sequence of an alphabetic base character plus one or more combining characters; letter digraphs; contextual variants of alphabetic characters; ligatures of alphabetic characters; contextual variants of ligatures; modifier letters; letterlike symbols that are compatibility equivalents of single alphabetic letters; and miscellaneous letter elements. Notably, U+00AA feminine ordinal indicator and U+00BA masculine ordinal indicator are simply abbreviatory forms involving a Latin letter and should be considered alphabetic rather than nonalphabetic symbols.

(We flag Mc that are not Alphabetic in the invariants, but obviously we cannot flag Mn. This should help with some of those.)

* The exceptions are [\N{ORIYA SIGN OVERLINE}\N{THAI CHARACTER MAITAIKHU}\N{LIMBU SIGN KEMPHRENG}\N{SHARADA VOWEL MODIFIER MARK}\N{SHARADA EXTRA SHORT VOWEL MARK}] which are all diacritics.
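
For a quick look at the footnote outside the invariants test, here is a minimal Python sketch. It is not part of this PR. The standard `unicodedata` module exposes neither Alphabetic, Diacritic, nor Indic_Syllabic_Category, so the sketch only reports General_Category, a rough proxy for "is a combining mark". The helper name `general_categories` is made up for illustration, and the interpreter is assumed to ship Unicode 13 or later (Python 3.9+), since ORIYA SIGN OVERLINE was added in that version.

```python
import unicodedata

# Names copied from the footnote above; everything else is illustrative.
EXCEPTION_NAMES = [
    "ORIYA SIGN OVERLINE",
    "THAI CHARACTER MAITAIKHU",
    "LIMBU SIGN KEMPHRENG",
    "SHARADA VOWEL MODIFIER MARK",
    "SHARADA EXTRA SHORT VOWEL MARK",
]


def general_categories(names):
    """Map each character name to (code point string, General_Category)."""
    result = {}
    for name in names:
        # lookup() raises KeyError if this Python's Unicode data lacks the name.
        char = unicodedata.lookup(name)
        result[name] = (f"U+{ord(char):04X}", unicodedata.category(char))
    return result


if __name__ == "__main__":
    for name, (code_point, gc) in general_categories(EXCEPTION_NAMES).items():
        print(f"{code_point} {name}: gc={gc}")
```

A mark category (`M*`) for each entry fits the description of these characters as diacritics, but it does not establish the Diacritic property itself; that would need the property data files or ICU.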

[\p{InSc=Bindu} - $nonAlphabeticBindus - \p{Alphabetic}] = []
[\p{InSc=Bindu} - \p{Alphabetic}] = $nonAlphabeticBindus

Let $nonAlphabeticDependentVowels = [\N{ORIYA SIGN OVERLINE}\N{THAI CHARACTER MAITAIKHU}\N{LIMBU SIGN KEMPHRENG}\N{SHARADA VOWEL MODIFIER MARK}\N{SHARADA EXTRA SHORT VOWEL MARK}]
Member

Where does this set come from? Did you just try to assert that dependent vowels should be Alphabetic, and you found that these ones are not?
Please say something about this set in the PR description.

FYI @Ken-Whistler

Member Author

> Where does this set come from? Did you just try to assert that dependent vowels should be Alphabetic, and you found that these ones are not?

Yes.

Member Author

Added a footnote in the PR description.

I hesitated to add an invariant that they are all diacritics. Maybe I should.

@eggrobin eggrobin merged commit db60be5 into unicode-org:main Oct 11, 2023
10 checks passed
2 participants