Letter-spacing splits conjuncts #117

Open
r12a opened this issue Mar 31, 2021 · 2 comments
Labels
doc:beng doc:deva doc:gujr doc:taml gap i:spacing Text spacing l:bn Bengali language & script l:gu Gujurati language & script l:hi Hindi, Devanagari script l:ta Tamil language & script p:advanced s:beng Bengali script s:deva Devanagari script s:gujr Gurajati script s:taml Tamil script x:beng x:blink x:deva x:gecko x:gujr x:taml x:webkit

Comments

@r12a
Contributor

r12a commented Mar 31, 2021

This issue is applicable to most languages that form conjuncts from consonant clusters using an invisible virama.

A consonant cluster that is rendered as a conjunct (rather than with a visible virama) should not be split when letter-spacing is applied.

The GAP

Relying on grapheme clusters as the main segmentation approach fails for many Indic scripts because conjuncts are composed of multiple grapheme clusters, and should be kept together as a unit.

For these situations it is necessary to tailor the segmentation algorithm, so that it recognises the whole consonant cluster plus any attached vowel-signs or combining characters as a single unit.
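As a rough illustration of the tailoring described above, here is a minimal Python sketch. The function names (`simple_clusters`, `spacing_units`) and the virama table are hypothetical and are not taken from any spec or browser implementation. It approximates grapheme clusters as a base character plus following marks, then joins a cluster to the previous one when that one ends in a virama and the new one starts with a letter. It assumes that every virama + consonant sequence forms a conjunct, which depends on the font and is not true in general. Tamil is left out of the table for that reason, since its virama is usually visible.

```python
import unicodedata

# Viramas for Devanagari, Bengali and Gujarati (illustrative subset only).
VIRAMAS = ("\u094D", "\u09CD", "\u0ACD")
ZWNJ = "\u200C"  # after a virama, requests a visible virama (no conjunct)
ZWJ = "\u200D"   # after a virama, requests a half-form

def is_mark(ch):
    """True for combining marks and the joiner controls."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me") or ch in (ZWJ, ZWNJ)

def simple_clusters(text):
    """Rough stand-in for grapheme clusters: a base plus following marks."""
    clusters = []
    for ch in text:
        if clusters and is_mark(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def spacing_units(text):
    """Merge clusters across a virama so a conjunct stays in one unit."""
    units = []
    for cluster in simple_clusters(text):
        # Ignore a trailing ZWJ; a trailing ZWNJ blocks the merge.
        tail = units[-1].rstrip(ZWJ) if units else ""
        if tail.endswith(VIRAMAS) and unicodedata.category(cluster[0]) == "Lo":
            units[-1] += cluster
        else:
            units.append(cluster)
    return units

word = "\u0915\u094D\u0937\u0924\u094D\u0930\u093F\u092F"  # क्षत्रिय
print(simple_clusters(word))  # five clusters: conjuncts are split
print(spacing_units(word))    # three units: क्ष, त्रि, य
```

With this sketch, letter-spacing would be inserted only between the three units, not between the five clusters. A real implementation would also need to know whether the font actually produces a conjunct for a given sequence.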

For examples see Typographic character units in complex scripts.

See also notes on segmentation for the following orthographies relevant to this project: Hindi, Bangla, Punjabi, Tamil.

css-text-3: CSS specs use the concept of a 'typographic character unit' rather than a grapheme cluster, explaining that the cases just described go beyond the scope of the grapheme cluster concept and that implementations should provide appropriate support. The spec doesn't provide details about the support needed for each language.

The Unicode Consortium has made some attempts to address this issue, but they have so far not yielded results. CLDR now flags a few scripts for which conjuncts are common.

Priority

Keeping conjuncts together is a pretty basic requirement. It is not possible to work around this problem.

That said, letter-spacing is not relied on for essential content authoring, so the priority was set to advanced.

Tests & results

Interactive test: when letter-spacing is applied to Devanagari, the browser will not split conjuncts.

Interactive test: when letter-spacing is applied to Bengali, the browser will not split conjuncts.

  • Gecko: ❌ Most half-form conjuncts (which make up the large majority of all conjuncts) have space inserted between the glyphs that make up the conjunct (i.e. the conjunct is not split into consonants with visible viramas). Vertically combined glyphs tend not to be split.
  • Blink: ❌ Same as Gecko.
  • Webkit: ❌ Same as Gecko.

Action taken

tbd

Outcomes

tbd

@r12a
Contributor Author

r12a commented Mar 31, 2021

The first comment in this issue contains text that will automatically appear in one or more gap-analysis documents as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

Relevant gap analysis documents include:
Bengali, Devanagari, Gujarati, Tamil

@srl295

srl295 commented Sep 26, 2023

Perhaps CLDR-2142 is relevant here?

@r12a r12a added l:hi Hindi, Devanagari script l:bn Bengali language & script l:ta Tamil language & script l:gu Gujurati language & script l:pa Punjabi, Gurmukhi script and removed l:pa Punjabi, Gurmukhi script labels May 1, 2024
@r12a r12a moved this to Spec ready, pending bug report in Gap-analysis pipeline Jun 20, 2024
@r12a r12a added s:gujr Gurajati script s:beng Bengali script labels Jul 2, 2024
@r12a r12a added s:deva Devanagari script s:taml Tamil script labels Jul 2, 2024
Projects
Status: Spec ready, pending bug report