Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combining characters rendering in wrong place #2384

Open
claysmalley opened this issue Apr 12, 2023 · 7 comments
Open

Combining characters rendering in wrong place #2384

claysmalley opened this issue Apr 12, 2023 · 7 comments
Labels
bug Something isn't working PR is more than welcomed Extra attention is needed

Comments

@claysmalley
Copy link

claysmalley commented Apr 12, 2023

maplibre-gl-js version: 2.4.0

In some scripts, such as Latin and Thai, combining characters never change the width of the preceding character to which they attach. Such combining characters are capable of being rendered properly without a full implementation of glyph substitution, as is needed for Indic scripts (see #50). Diacritics in these scripts look best when centered above or below characters.

In Maplibre GL JS, many combining characters are being rendered off to the side from where they should be. Combining characters are being rendered typewriter-style (the naïve "one codepoint = one glyph" implementation), as zero-width characters that place a diacritic in a fixed position about the previous letter. However, Latin letters (for example) have variable widths, so the same combining diacritic might not be centered over i and w in this rendering method.

This is especially noticeable when text-letter-spacing is set to a value greater than 0. In this case, extra spacing is erroneously inserted between combining characters and their preceding characters. Diacritics are then placed over this extra spacing instead, causing them to appear off-center.

Example 1

https://americanamap.org/#map=14.81/49.02432/-123.10257&language=mul

A park in Tsawwassen, British Columbia, Canada, has the name sməqʷəʔelə həw̓aləm̓ew̓txʷ, posted alongside its English name "Blue Heron Nest Park". The Halkomelem language has a plethora of consonants, many of which are represented by a Latin letter with a comma above. Since these consonants are not available in Unicode as precomposed characters, this language makes frequent use of the codepoint U+0313 COMBINING COMMA ABOVE.

In and , MapLibre places the comma above the right edge of the character instead of the center:

image

Example 2

https://americanamap.org/#map=2.66/-12.43/71.09&language=th

The Thai name for the Indian Ocean is มหาสมุทรอินเดีย. The below example is in Noto Sans Thai, which may have a dramatically different style than what displays inline here, but it looks like correct Thai:

Screenshot from 2023-04-12 09-58-17

However, when text-letter-spacing > 0, combining characters introduce double spacing between neighboring characters, and render awkwardly to the side of the preceding character. Note that there are no spaces in this name—Thai uses spaces to separate sentences, not words:

Screenshot from 2023-04-11 21-16-39

@1ec5
Copy link
Contributor

1ec5 commented Apr 12, 2023

Combining characters are being rendered typewriter-style (the naïve "one codepoint = one glyph" implementation), as zero-width characters that place a diacritic in a fixed position about the previous letter. However, Latin letters (for example) have variable widths, so the same combining diacritic might not be centered over i and w in this rendering method.

In Americana’s font stack, is the advance measure for the glyph at index 787 a negative number equivalent to −1 ch? If so, I would expect the code below to at least center the diacritic over a medium-width preceding letter such as “c”. This would still be inaccurate for wider letters like “w” and narrower letters like “i”, but better than what these screenshots show.

const positions = glyphMap[section.fontStack];
const glyph = positions && positions[codePoint];
if (!glyph) return 0;
return glyph.metrics.advance * section.scale + spacing;

@ZeLonewolf
Copy link
Contributor

In Americana’s font stack, is the advance measure for the glyph at index 787 a negative number equivalent to −1 ch?

Sounds like we'll need to investigate this. The fonts are created using Font.create(...) in fonteditor-core.

Specifically, this is done here:
https://github.com/ZeLonewolf/openstreetmap-americana/blob/a924c8c62f0adf21dd2d2ead63a9c58935b266b5/scripts/fonts.js#L109-L116

We'll need to investigate those font creation options. I don't fully understand what it's doing so there may very well be a bug in how we're creating the fontstacks.

@ZeLonewolf
Copy link
Contributor

Below is the font object for glyph 787 in the bold fontstack. This gets written to a TTF which is then consumed by build_pbf_glyphs:

[
  {
    "xMin": -30,
    "yMin": 480,
    "xMax": 174,
    "yMax": 731,
    "unicode": [
      787
    ],
    "advanceWidth": 0,
    "leftSideBearing": -30,
    "name": "uni0313",
    "contours": [
      [
        {
          "x": -30,
          "y": 480,
          "onCurve": true
        },
        {
          "x": -3,
          "y": 541
        },
        {
          "x": 54,
          "y": 678
        },
        {
          "x": 72,
          "y": 731,
          "onCurve": true
        },
        {
          "x": 170,
          "y": 731,
          "onCurve": true
        },
        {
          "x": 174,
          "y": 720,
          "onCurve": true
        },
        {
          "x": 150,
          "y": 666
        },
        {
          "x": 76,
          "y": 537
        },
        {
          "x": 39,
          "y": 480,
          "onCurve": true
        }
      ]
    ]
  }
]

@HarelM HarelM added bug Something isn't working PR is more than welcomed Extra attention is needed labels Apr 13, 2023
@HarelM
Copy link
Collaborator

HarelM commented Apr 13, 2023

Can you elaborate on the difference between this bug and #50? Will solving #50 solve this issue as well?

@claysmalley
Copy link
Author

This is specifically about scripts where combining characters don't affect the base character's width, which excludes most Indic scripts. #50 is the general case, and solving that will indeed solve this.

This issue potentially has an easier solution than #50, but if that is fixed first, this one is moot.

@1ec5
Copy link
Contributor

1ec5 commented Apr 13, 2023

This is actually two, maybe three distinct issues:

In m̓ and w̓, MapLibre places the comma above the right edge of the character instead of the center:

Looking at #2384 (comment), I think I was a bit off: advanceWidth is 0, which is unsurprising, because a combining character is non-spacing. Instead, the glyph specifies a negative xMin and leftSideBearing, but apparently the magnitude is only enough to fit the glyph contour; unlike in some fonts, the glyph doesn’t hard-code the width of the preceding character.

This is correct, but it means the glyph PBF doesn’t contain enough information to infer that the character should be centered over the previous character. Either the glyph metrics fields would need to be extended, or we’d need some sort of heuristic that a negative left, positive width, and zero advance indicates a combining character that should be centered. This heuristic would work in the examples above, but some combining characters are supposed to be offset or centered over more than one character.

Alternatively, GL JS could hard-code the set of codepoints that are known to be combining characters, based on the Unicode standard. Maybe there are granular properties about how characters combine. GL JS already hard-codes some other properties about codepoints, such as whether they allow breaking or can be rotated in vertical text, so there is certainly precedent. Maybe this is what text layout engines are expected to do for combining characters too?

During shaping, the following code assumes the glyph begins at zero, with no affordance for combining characters:

positionedGlyphs.push({glyph: codePoint, imageName, x, y: y + baselineOffset, vertical, scale: section.scale, fontStack: section.fontStack, sectionIndex, metrics, rect});
x += metrics.advance * section.scale + spacing;

The glyph PBF does contain the left metric, which is being ignored here. This code might as well retreat the line by left. As a side benefit, GL JS would get optical margin alignment for free, which would be pretty nice for multiline text labels.

However, when text-letter-spacing > 0, combining characters introduce double spacing between neighboring characters, and render awkwardly to the side of the preceding character. Note that there are no spaces in this name—Thai uses spaces to separate sentences, not words:

This code I linked in #2384 (comment) is the culprit: it assumes it should advance by the spacing even if the character is zero-width. This would also produce unexpected spacing for an invisible character such as a ZWNJ:

const positions = glyphMap[section.fontStack];
const glyph = positions && positions[codePoint];
if (!glyph) return 0;
return glyph.metrics.advance * section.scale + spacing;

This part is more straightforward to fix: if the glyph’s advance is 0, the line needs to retreat by spacing, since it already advanced by that distance for the previous character.

@1ec5
Copy link
Contributor

1ec5 commented Apr 29, 2023

#2388 attempts to work around the problem with hard-coded metrics for combining characters, though it looks like it might be possible to obtain more robust metrics from the font.

maplibre/maplibre-style-spec#145 (reply in thread) has a proof of concept of first-class support for combining text by properly segmenting the text by grapheme cluster and sending all text, including Latin, through the alternative TinySDF codepath, avoiding the manual shaping code above altogether. This approach would correctly handle any Latin combining diacritics, though ligatures would remain unsupported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working PR is more than welcomed Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants