This issue was moved to a discussion.

Support more writing systems using HarfBuzz #706

Closed
wipfli opened this issue Jan 18, 2023 · 17 comments
Labels
enhancement New feature or request

Comments

@wipfli
Contributor

wipfli commented Jan 18, 2023

MapLibre GL Native currently does not support writing systems like for example Indic scripts or Khmer.

I think we should support more writing systems such that people from all parts of the world can use MapLibre...

Let's use this as a tracking issue to collect ideas and material how we can extend writing system support. Maybe Harfbuzz will be the best tool.

@wipfli
Contributor Author

wipfli commented Jan 18, 2023

If only I knew which part of the code base was responsible for text rendering...

@alanchenboy
Collaborator

GrabMap Khmer Render solution.pdf
These are the slides on how GrabMap implements HarfBuzz + FreeType.

@brawer

brawer commented Jan 21, 2023

Here’s my personal advice on this. (Disclaimer: While I’ve spent quite a few years working on maps, internationalization, fonts, and text rendering, I’m a complete newbie to MapLibre. Please excuse anything that doesn’t make sense in the MapLibre context.)

Generally, GrabMap’s approach for Khmer goes in the right direction, but it’s incomplete, and you don’t need a separate hack for each language. Instead, once text rendering is implemented correctly, a single code path can display text in any language. This will also improve the display of Latin/English text in terms of kerning, ligatures, and typographic features, and it covers a big chunk of what’s needed for Emoji. Unfortunately, getting this right is non-trivial and not well documented. However, there are several open-source implementations that might serve as inspiration. Personally, I’d mainly recommend looking at the source code of Chromium and Firefox; they both have state-of-the-art text rendering stacks. Another source of inspiration might be the Pango library used by GNOME/GTK and Qt, although with some reservations.

Here’s how modern text rendering works from a high level:

  1. Start by running the Unicode Bidirectional Algorithm. This results in a sequence of “bidi runs”. A good implementation is GNU FriBiDi, licensed under LGPL-2.1.
  2. Split the bidi runs into “script runs”, which are contiguous character sequences in the same writing system (and writing direction, as per step 1). This step is called “script itemization”, and there are some subtleties. Some are documented in Unicode Standard Annex #24 (UAX #24), but that is not the whole story. There’s an implementation in the Win32 API, and one in Qt, although one should double-check how good that is. It would be best to look at the Chromium and Firefox sources.
  3. Send each script run (with correct bidi and language tags) through HarfBuzz, using the default font for the chosen style. This step is called “shaping”, and results in “glyph vectors”, a sequence of (glyph ID, x, y).
  4. If the font hasn’t been able to display some pieces of text, the glyph vector will contain glyphs with glyph ID zero. In this case, find the failing substring, and recursively call HarfBuzz to shape the failing substring with a fallback font. Finding fallback fonts is non-trivial, performance-critical, and platform-specific. Again, have a look at the Chromium and Firefox sources.
  5. Break the labels into lines. Line-breaking (and hyphenation, which might be quite useful for maps) is non-trivial, language-specific, and performance-critical. Unicode defines an algorithm in Unicode Standard Annex #14 (UAX #14), but again, this isn’t the full story, so check out what web browsers do. It might be worth supporting soft hyphens, so a server-side tile renderer could use large, language-specific dictionaries (like those used by TeX) to find potential hyphenation points, and pass them through vector tiles to MapLibre.
  6. Render the glyphs. Given that MapLibre already has GPU rendering with Signed Distance Fields, GrabMap’s use of FreeType seemed a little surprising to me; there’s no obvious reason why SDF wouldn’t work for Khmer, Devanagari, or any other script in Unicode. However, if MapLibre wants to eventually support color fonts or color Emoji, there’ll be some complications here.
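
The fallback detection in step 4 can be sketched as follows. This is a minimal illustration, not MapLibre or HarfBuzz code: it assumes the shaper output is already available as (glyph ID, cluster) pairs, where the cluster is the index of the first character a glyph covers. HarfBuzz reports cluster values of this kind, but the tuple layout and the function name `missing_ranges` are hypothetical.

```python
from typing import List, Tuple

# (glyph_id, cluster): cluster is the index of the first character the glyph covers.
Glyph = Tuple[int, int]


def missing_ranges(glyphs: List[Glyph], text_len: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) character ranges that were shaped to glyph ID 0."""
    # Sort the cluster starts so the function also works for right-to-left runs,
    # where a shaper emits glyphs in visual rather than logical order.
    starts = sorted({cluster for _, cluster in glyphs})
    bad = {cluster for glyph_id, cluster in glyphs if glyph_id == 0}
    ranges: List[Tuple[int, int]] = []
    for i, start in enumerate(starts):
        # A cluster ends where the next one begins, or at the end of the text.
        end = starts[i + 1] if i + 1 < len(starts) else text_len
        if start not in bad:
            continue
        if ranges and ranges[-1][1] == start:
            # Merge with the previous failing range so the fallback font
            # receives one contiguous substring instead of many fragments.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


# Five characters; the primary font has no glyphs for characters 2 and 3.
shaped = [(5, 0), (6, 1), (0, 2), (0, 3), (7, 4)]
failing = missing_ranges(shaped, text_len=5)
```

Each returned range would then be reshaped with the next fallback font, and the process repeats until every range is covered or the fallback list is exhausted.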

Language tags: For good text rendering, you need to know the language of each label being rendered and pass it down the rendering stack (into HarfBuzz) as an IETF BCP 47 language tag. This is the same language code that’s used for language tagging in HTML, XML, and other data formats; modern browsers use it to tweak text rendering. Knowing the language mainly matters for East Asia, where certain glyphs should look slightly different depending on the language (and region/country, which is part of IETF language tags). For example, this picture illustrates how the same Unicode codepoint U+8FD4 should look for various languages/regions. Knowing the language can also make a visible difference when rendering certain minority languages in South-East Asia such as Shan or Mon, and even for European languages like Polish, but these cases are admittedly rather high-end typography.

Unfortunately, the Mapbox Vector Tile format does not encode the language of labels. In the long term, I’d recommend extending the MVT format so that a tile renderer can encode the language. (In case extending MVT is too complicated: Unicode once defined an escape mechanism that encodes language tags as special codepoints within a character stream. While Unicode has deprecated and strongly discouraged this escape mechanism, it might still be appropriate for MapLibre if the MVT format can’t be extended.) For the medium term, I’d recommend implementing a client-side heuristic in MapLibre, at least for East Asia. That heuristic might also be useful when a tile doesn’t come with language tags, or when rendering other formats such as GeoJSON. In the short term, my recommendation would be to do nothing: users will still be able to read the text, but they may complain about the “wrong font” being used. Knowing (or guessing) the language of each rendered label will also be needed if MapLibre ever wants to compute hyphenation points on the client side.
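
To make the deprecated Unicode escape mechanism concrete: a language tag is written as U+E0001 (LANGUAGE TAG) followed by tag characters in the range U+E0020 to U+E007E, which mirror printable ASCII at an offset of 0xE0000. The sketch below shows how a label could carry its language inside the string itself. The function names are hypothetical, and nothing here is part of MVT or MapLibre; it only illustrates the encoding.

```python
from typing import Optional, Tuple

LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG (deprecated by Unicode)
TAG_BASE = 0xE0000      # tag characters mirror ASCII at this offset
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E


def encode_language_tag(tag: str) -> str:
    """Encode a BCP 47 tag such as 'zh-Hant' as a run of Unicode tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag):
        raise ValueError("language tags must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in tag)


def split_language_tag(label: str) -> Tuple[Optional[str], str]:
    """Split a leading tag sequence off a label; returns (tag or None, visible text)."""
    if not label or ord(label[0]) != LANGUAGE_TAG:
        return None, label
    i = 1
    chars = []
    while i < len(label) and TAG_FIRST <= ord(label[i]) <= TAG_LAST:
        chars.append(chr(ord(label[i]) - TAG_BASE))
        i += 1
    return "".join(chars), label[i:]


# A tile producer would prepend the tag; the client strips it before shaping
# and hands the recovered tag to the shaper.
tagged = encode_language_tag("zh-Hant") + "\u8fd4"
language, text = split_language_tag(tagged)
```

Because the tag characters are default-ignorable, a client that does not understand them would typically render nothing for them, but that behavior depends on the text stack and should be verified rather than assumed.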

Color and variable fonts: Supporting color (and variable) fonts is a little complicated but certainly doable, and I think both color and variations could have very nice applications for cartography. But it’s clearly less important than making text readable in the first place.

Web fonts: According to their slides, GrabMap seems to load Noto Sans Khmer over the web. However, on most modern devices, this wouldn’t actually be necessary; both Apple and Google bundle most of Noto with their operating systems. Although Apple hides the presence of Noto from its user interfaces, apps can still access the glyphs. Likewise, Microsoft Windows bundles a lot of international fonts. That said, it certainly would make sense for MapLibre to support web fonts, both for custom styling and as a fallback when device fonts don’t cover the Unicode range needed for display. My recommendation would be to re-implement MapLibre’s text stack so it supports web fonts in the same way as Chromium and Firefox. However, this would likely be a sizeable chunk of work.

Font formats: Maybe I’m missing something here, but to me personally, the Mapbox font API seems a little weird. Again, my recommendation would be to make MapLibre behave like a modern web browser, support the same (standard) web font formats, and perform the conversion from Bézier curves to GPU-renderable Signed Distance Fields on the client device.
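
As a rough illustration of what a client-side Signed Distance Field computation involves, here is a brute-force sketch. It makes two simplifying assumptions that differ from the suggestion above: it starts from an already rasterized binary bitmap rather than from Bézier outlines, and it uses an O(n²) search instead of the faster distance transforms a real renderer would need. The sign convention (positive inside the glyph, negative outside) and the function name are also assumptions.

```python
import math
from typing import List


def signed_distance_field(bitmap: List[List[int]]) -> List[List[float]]:
    """Brute-force SDF: distance from each pixel to the nearest pixel of the opposite state.

    bitmap[y][x] is 1 inside the glyph and 0 outside. Distances are positive
    inside and negative outside (a convention chosen for this sketch).
    """
    height, width = len(bitmap), len(bitmap[0])

    def nearest_opposite(x: int, y: int) -> float:
        inside = bitmap[y][x]
        best = math.inf  # stays infinite if the bitmap is uniformly filled or empty
        for yy in range(height):
            for xx in range(width):
                if bitmap[yy][xx] != inside:
                    best = min(best, math.hypot(xx - x, yy - y))
        return best

    return [
        [(1.0 if bitmap[y][x] else -1.0) * nearest_opposite(x, y) for x in range(width)]
        for y in range(height)
    ]


# A single filled pixel in the middle of a 3x3 bitmap.
dot = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
sdf = signed_distance_field(dot)
```

A GPU renderer would normally clamp and quantize these distances into a texture and threshold them in the fragment shader, which is what lets one texture serve a range of display sizes.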

Styles: To define the style of map labels, my suggestion would be to implement Text and Fonts of CSS3, just like a modern web browser. This would be quite a bit of work, though.

@brawer

brawer commented Jan 23, 2023

Worth reading: Text layout is a loose hierarchy of segmentation. There have been several attempts to bundle text layout into a single library, such as Raqm, Minikin, ICU Paragraph Layout, or Cobbletext. Personally, I’d recommend looking at them for inspiration, although I’m not sure if they really fit MapLibre. Another source to consider might be lib/ui/text in Flutter: this is a fork of Chrome’s text handling, and less entangled than Chrome, but it’s deeply integrated with Flutter, so it is not directly usable for MapLibre. On the other hand, if you just want to fix rendering quickly with little work, and if you don’t care about line breaking, hyphenation, and rendering text on GPUs, Raqm might be a good solution. Minikin implements line breaking, but being part of Android, it would need to be ported to other platforms.

@maxammann
Collaborator

Thanks for the great summary, I'll check it out later in more detail!

One thing I was wondering about regarding glyph rendering: does MapLibre really need resolution-independent glyph rendering?

In MapLibre, the glyphs "exist" within the 3D world. That means glyphs need to be resized depending on the zoom.

Wouldn't it be enough for the major use cases to render glyphs at a static resolution as an overlay?

@brawer

brawer commented Jan 23, 2023

Does MapLibre really need resolution-independent glyph rendering? In MapLibre, the glyphs "exist" within the 3D world. That means glyphs need to be resized depending on the zoom. Wouldn't it be enough for the major use cases to render glyphs at a static resolution as an overlay?

Personally I’d find it a nice feature if text were able to grow and shrink with the rest of the map. Zooming would feel smoother that way, especially on deep zoom levels. But admittedly that’s pretty far off; the current user experience doesn’t seem to need this. Also, you can always rasterize glyphs on the CPU (by calling FreeType) to any desired resolution; it just won’t feel as smooth as when doing it on GPU.

@maxammann
Collaborator

Personally I’d find it a nice feature if text were able to grow and shrink with the rest of the map. Zooming would feel smoother that way, especially on deep zoom levels. But admittedly that’s pretty far off; the current user experience doesn’t seem to need this. Also, you can always rasterize glyphs on the CPU (by calling FreeType) to any desired resolution; it just won’t feel as smooth as when doing it on GPU.

That is also my feeling. Yeah, it would be cool to have resolution-independent glyph rendering. But at the same time, I'm wondering if we really need it. I honestly don't know right now why Mapbox went that way originally. There must be some reason why it's necessary.

louwers added the enhancement (New feature or request) label on Jan 23, 2023
@ramSeraph

ramSeraph commented Jan 23, 2023

I just want to add one more library for consideration w.r.t client side text breaking - https://github.com/unicode-org/icu4x

This one is written especially for cases like maplibre.

Edit:
I thought this was an issue in maplibre-gl-js. This might not be required for maplibre-gl-native.

@wipfli
Contributor Author

wipfli commented Jan 28, 2023

Unfortunately, the Mapbox Vector Tile format does not encode the language of labels. In the long term, I’d recommend extending the MVT format so that a tile renderer can encode the language.

@brawer feel free to add this point to the discussion at nyurik/future-mvt#1

@wipfli
Contributor Author

wipfli commented Jan 29, 2023

We are not the first ones to think about supporting more writing systems. Here are some Mapbox issues:

@wipfli
Contributor Author

wipfli commented Feb 4, 2023

Almost 10 years ago, in the third issue created in the Mapbox GL JS repo, people talked about HarfBuzz mapbox/mapbox-gl-js#3.

@maxammann
Collaborator

Almost 10 years ago, in the third issue created in the Mapbox GL JS repo, people talked about HarfBuzz mapbox/mapbox-gl-js#3.

Uuuh doesn't that issue suggest that they already used freetype/ICU lib?

@1ec5
Contributor

1ec5 commented Feb 5, 2023

Also lots of discussion in mapbox/DEPRECATED-mapbox-gl#4.

@ramSeraph

ramSeraph commented Feb 5, 2023

My set of reading material/potential repos.. I hope it helps and is not an overload of data :)

EDIT: Cleaned up and recategorized

@maxammann
Collaborator

@ramSeraph Thanks for that collection. Do you mind if I include that in https://maplibre.org/maplibre-rs/book/development-documents/font-rendering.html?

@ramSeraph

ramSeraph commented Feb 5, 2023

@maxammann you can definitely include them.

I wasn't sure if I should put effort into maplibre-rs or here. I wasn't sure how far from production maplibre-rs was, and I don't know Rust (that can be remedied, though :) ).

I have to say, it was heartwarming to see the top priority of this issue at maplibre-rs - maplibre/maplibre-rs#36 (comment)

Is there a place where I can add more of the Rust text utility research? (I can see that you have already looked at a few of the things I have.)

@ramSeraph

Unfortunately, the Mapbox Vector Tile format does not encode the language of labels. In the long term, I’d recommend extending the MVT format so that a tile renderer can encode the language.

@brawer feel free to add this point to the discussion at nyurik/future-mvt#1

I wonder if this should be part of the maplibre style spec or MVT specification.

Also, please consider dropping the glyph api from the maplibre style spec if possible. The alternative mentioned in this issue tracker seems like a good idea.

maptiler/tileserver-gl#641 (comment)

maplibre locked and limited conversation to collaborators on Feb 8, 2023
wipfli converted this issue into discussion #778 on Feb 8, 2023
