RustyBuzz-WASM

Introductory Notes

this code is under development
this README is still fragmentary
initial goal was to make text shaping as implemented by rustybuzz accessible from NodeJS
turns out rustybuzz depends on ttf-parser, which offers access to glyf outlines; that is when text rendering (in SVG format) was added
then I found textwrap which implements optimized distribution of sized 'black boxes' (i.e. rectangular areas of arbitrary content) over stretches of equal length (i.e. lines of text); this solves the problem of wrapping text provided one knows where line break opportunities are (e.g. around whitespace, after a hard or soft hyphen, between CJK ideographs)
add text preparation and you have almost full typesetting (sans combination of styles so far, but we're getting there):
- first do hyphenation for each paragraph of text, which takes some text and some language settings and returns the same text with soft (a.k.a. discretionary, optional) hyphens (U+00AD Soft Hyphen) inserted at appropriate positions;
- next, apply Unicode UAX#14: Unicode Line Breaking Algorithm to the text; this will identify all the stretches of text that must be kept together in typesetting (here dubbed 'slabs', short for 'syllables')
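The two preparation steps above can be pictured with a deliberately simplified sketch. It assumes the text has already been hyphenated (soft hyphens are present) and treats only spaces, hard hyphens and soft hyphens as break opportunities, which is a tiny subset of what UAX#14 prescribes. The name `split_into_slabs` is hypothetical and not part of this module's API.

```rust
/// Soft hyphen (U+00AD), as inserted by the hyphenation step.
const SOFT_HYPHEN: char = '\u{00AD}';

/// Hypothetical helper: split `text` into 'slabs', i.e. stretches that must
/// stay together on one line. A break opportunity is assumed after each
/// space, hard hyphen or soft hyphen; real UAX#14 has many more rules.
pub fn split_into_slabs(text: &str) -> Vec<String> {
    let mut slabs = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if ch == ' ' || ch == '-' || ch == SOFT_HYPHEN {
            // The breaking character stays attached to the slab it ends.
            slabs.push(std::mem::take(&mut current));
        }
    }
    // Whatever is left after the last break opportunity is the final slab.
    if !current.is_empty() {
        slabs.push(current);
    }
    slabs
}
```

Keeping the breaking character at the end of its slab means a later line-filling step can decide whether to drop a trailing space or turn a soft hyphen into a visible one.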

⚠️ As it stands, this work will probably be incorporated into InterText; at any rate, seeing as the scope of the present module has been growing, rustybuzz-wasm is no longer a fully appropriate moniker.

⚠️ 🚧 TO BE DONE—For some details around code compilation and installation of this software see the installation guide.

What it Does

This module allows users to take a Unicode text and a path to a font file as inputs and obtain a list of GlyfIDs and 2D positions back. This process is known as text shaping. It is an indispensable ingredient for compositing text in so-called 'complex' writing systems like Arabic and Indic alphabets, but even when applied to text written in the Latin alphabet, there are finer points of typesetting like kerning and the choice of ligatures which make this process too difficult to be reasonably implemented on-the-fly for each piece of software that uses text. Instead, what one wants is a specialized library that knows lots of details about font file formats, OpenType font features, type metrics and so on and applies that knowledge to a given text string to derive positioning data for the individual graphical pieces ('glyfs'). Drawn onto a canvas (such as an HTML <canvas> or an <svg> element), these pieces then yield an aesthetically pleasing and orthographically correct (image of a) text. You can see all this in action in the live HarfBuzz demo page. If you want to know more about text shaping, be sure to read Ramsey Nasser's Unplain text: A primer on text shaping and rendering non-Latin text in the shadow of an ASCII-dominated world; also, you might want to take a look at the HarfBuzz terminology glossary.
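To make "a list of GlyfIDs and 2D positions" more concrete, here is a minimal sketch of how shaper output is typically consumed. The struct and its field names are illustrative assumptions modeled on the advance/offset pairs that HarfBuzz-style shapers report; they are not the data format of this module.

```rust
/// One shaped glyf as a shaper might report it (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ShapedGlyf {
    pub glyf_id: u16,
    /// How far the pen moves horizontally after this glyf (font units).
    pub x_advance: i32,
    /// Displacement of the outline relative to the pen position.
    pub x_offset: i32,
    pub y_offset: i32,
}

/// Turn per-glyf advances into absolute drawing positions on one baseline.
/// Returns `(glyf_id, x, y)` triples.
pub fn place_glyfs(glyfs: &[ShapedGlyf]) -> Vec<(u16, i32, i32)> {
    let mut pen_x = 0;
    let mut placed = Vec::with_capacity(glyfs.len());
    for g in glyfs {
        // Offsets shift the outline only; they do not move the pen.
        placed.push((g.glyf_id, pen_x + g.x_offset, g.y_offset));
        pen_x += g.x_advance;
    }
    placed
}
```

Kerning shows up as a shortened advance, while mark attachment shows up as a zero advance combined with non-zero offsets; in both cases the renderer only has to follow the numbers.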

The leading free software to provide text shaping is HarfBuzz (repo here), which is written in C++. rustybuzz is "a complete harfbuzz's shaping algorithm port to Rust", and since it's written in Rust, we can compile it to WASM and write a nice API surface for it, which is what I did.

Samples

Sample in Arabic, using the Amiri Typeface to typeset "الخط الأمیری". Notice visible overlaps and tasteful placement of complex ligatures (which will for the most part not be present in the browser rendering of the same text unless you happen to have configured a suitable font). Both texts are generated from the exact same sequence of Unicode codepoints, ا, ل, خ, ط, ␣, ا, ل, أ, م, ی, ر, ی (which starts with ا and ends with ی; notice RTL re-ordering by the browser). Also note that while the bounding boxes of the glyfs differ in their vertical placements, in this case that only reflects the different areas covered by the outlines; in the underlying SVG, the y attributes of all paths are set to 0 (i.e. all glyfs are still nominally sitting on the baseline).

Sample in Tibetan, using the Tibetan Machine Uni Typeface, to typeset ཨོཾ་མ་ཎི་པདྨེ་ཧཱུྃ (there is a certain chance even in 2021 that this piece of text will not be rendered correctly across systems and browsers). Again, a complex composition is made from a linear string of codepoints ཨ, ོ, ཾ, ་, མ, ་, ཎ, ི, ་, པ, ད, ྨ, ེ, ་, ཧ, ཱ, ུ. Notice that in this font, a choice has been made to precompose the stacked clusters ད, ྨ and ཧ, ཱ, ུ; this is a design choice which, were it not for a text shaper like rustybuzz, would cause a considerable amount of work for anyone striving to display Tibetan script correctly with this font and others whose choice of ligatures may be completely different.

What it Is

To implement rustybuzz-wasm I started with the example shipped with rustybuzz, which compiles to an executable that accepts a path to a font file and a text and then echoes a string containing glyf IDs and positioning data. This I turned into a minimalist version with WASM entry points. There's still a lot missing, especially font feature selection, but since everything has gone so well so far, I guess I'll get to that later.

How Does it Compare

rustybuzz-wasm is not feature-complete with rustybuzz, yet.
rustybuzz-wasm would appear to be 1.5 times faster than harfbuzzjs (which is what drives the HarfBuzz demo page). harfbuzzjs does not allow arbitrarily long lines and does not support font features (which rustybuzz-wasm will probably soon have).
rustybuzz-wasm is over 3 times faster than opentype.js.
HarfBuzz does have command line utilities, too (referred to as harfbuzz_shaping in the benchmark results below), but the fact that one has to open a sub-process for each piece of text and re-read font files damages performance a great deal. This means that rustybuzz-wasm (running as WASM attached to a NodeJS process) is over 12 times as performant as harfbuzz (using child processes over the command line). Note that this does not tell you how fast HarfBuzz itself is because secondary effects (overhead of one process per line of text, re-reading fonts) can be reasonably expected to dominate performance.

The benchmarks (source here) were done with 100 lines of text with 100 words on each line; counts represent Unicode code units (thus, approximately characters). "1,000 nspc" means "one thousand nanoseconds per cycle", a cycle being the unit of counting (roughly, one Unicode codepoint); here, lower figures are better. The reciprocal value expressed in Hertz (cycles per second) tells you how many items you can expect to get through your chosen process per second, so higher numbers are better. The bar charts express relative performance with the top performer being pegged to 100%. Several runs were performed with randomized order of execution to minimize noise. The hardware is a 2015 consumer grade, not fast, not new, not fancy laptop, so many machines will be considerably faster for all contestants.
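The relationship between the columns of the result tables can be written down as three small conversions. This is only an illustrative sketch (the function names are made up here and are not taken from the benchmark code); it uses figures from the tables as inputs.

```rust
/// Throughput in Hertz: items processed per second.
pub fn hertz(items: u64, seconds: f64) -> f64 {
    items as f64 / seconds
}

/// Nanoseconds per cycle: the reciprocal of the throughput, scaled to ns.
pub fn nspc(hz: f64) -> f64 {
    1e9 / hz
}

/// Relative performance in percent, with the top performer pegged to 100 %.
pub fn relative_percent(hz: f64, best_hz: f64) -> f64 {
    100.0 * hz / best_hz
}
```

For instance, 218,840 Hz corresponds to roughly 4,570 nspc, and 194,886 Hz is about 88.4 % of 220,399 Hz. Dividing 65,732 items by the displayed 0.300 s gives about 219,107 Hz rather than the listed 218,840 Hz, presumably because the displayed time is rounded.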

rustybuzz_wasm_rusty_shaping   0.300 s   65,732 items   218,840⏶Hz     4,570⏷nspc
rustybuzz_wasm_json_shaping    0.368 s   65,732 items   178,605⏶Hz     5,599⏷nspc
rustybuzz_wasm_short_shaping   0.331 s   65,732 items   198,465⏶Hz     5,039⏷nspc
harfbuzzjs_shaping             0.373 s   65,732 items   176,392⏶Hz     5,669⏷nspc
opentypejs_shaping             0.928 s   65,732 items    70,815⏶Hz    14,121⏷nspc
fontkit_shaping                2.203 s   65,732 items    29,840⏶Hz    33,512⏷nspc
harfbuzz_shaping               3.745 s   65,732 items    17,553⏶Hz    56,971⏷nspc

rustybuzz_wasm_rusty_shaping     220,399 Hz   100.0 % │████████████▌│
rustybuzz_wasm_short_shaping     194,886 Hz    88.4 % │███████████  │
rustybuzz_wasm_json_shaping      180,277 Hz    81.8 % │██████████▎  │
harfbuzzjs_shaping               143,434 Hz    65.1 % │████████▏    │
opentypejs_shaping                65,468 Hz    29.7 % │███▊         │
fontkit_shaping                   29,605 Hz    13.4 % │█▋           │
harfbuzz_shaping                  17,153 Hz     7.8 % │█            │

⚠️ Caveats ⚠️

⚠️ Rust Newbie here so probably the code is not ideal in some respects.
⚠️ FTTB I have committed the WASM artefacts to the repo; since I'm still working on this you may happen to have ⛔️ downloaded some unoptimized code which is orders of magnitude slower than WASM resulting from optimized compilation ⛔️; therefore:
- Always re-build before trying out:
  - for faster compilation, do wasm-pack build --debug --target nodejs && trash pkg/.gitignore && node demo-nodejs-using-wasm/lib/main.js > /tmp/foo.svg
  - for faster execution, do wasm-pack build --target nodejs && trash pkg/.gitignore && node demo-nodejs-using-wasm/lib/main.js > /tmp/foo.svg
⚠️ Values are currently communicated as JSON and hex-encoded binary strings; this is probably not terribly efficient and may change in the future; see https://hacks.mozilla.org/2019/11/multi-value-all-the-wasm/ and https://docs.rs/serde-wasm-bindgen/0.1.3/serde_wasm_bindgen/.
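For readers unfamiliar with the hex-encoded string approach, the following sketch shows the general idea of passing binary data (such as font bytes) across a string-only boundary. It is a generic illustration using only the standard library; the helper names are hypothetical and the module's actual encoding code may differ.

```rust
/// Encode bytes as a lowercase hex string, two digits per byte.
pub fn bytes_to_hex(bytes: &[u8]) -> String {
    let mut hex = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        hex.push_str(&format!("{:02x}", b));
    }
    hex
}

/// Decode a hex string back into bytes; rejects odd lengths and non-hex input.
pub fn hex_to_bytes(hex: &str) -> Result<Vec<u8>, String> {
    if hex.len() % 2 != 0 {
        return Err("hex string must have an even number of digits".to_string());
    }
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("hex string contains a non-hex character".to_string());
    }
    // All characters are ASCII here, so slicing by byte index is safe.
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}
```

The cost of this scheme is easy to see: every byte becomes two characters, and both sides spend time converting, which is why a binary interface might be preferable later on.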

Monospaced Typesetting

provided out-of-the-box by textwrap,
includes hyphenation, character width calculation
problem lies with Unicode UAX#11: East Asian Width (or its implementation in packages like string-width (JS) and unicode-width (Rust)) which report partially faulty lengths:
- abc: 3 units 💚
- 御門: 4 units 💚
- اَلْعَرَبِيَّةُ: 15 units ❌
- العربية: 7 units 💚
- ﷺ‎: 2 units ❌
- ﷻ‎: 2 units ❌
- ﷼‎: 2 units ❓
- ﷽: 1 units ❌❌❌
the better approach would seem to be to either monkey-fix widths known to be wrong or to do text shaping using carefully selected fonts (and quantize widths where they are not already quantized); in either case, one cannot simply use the solution provided by textwrap without landing a pull request first.
using this proposed method, monospaced typesetting does become more complicated, but on the other hand:
- where better speed is needed, one can still check texts for problematic characters, and, where needed, cache results
- monospaced typesetting becomes less of a special case and can be seamlessly integrated into the workflow of proportional typesetting, which is a huge advantage.
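The "quantize widths" idea can be sketched as follows, under the assumption that a text shaper has already delivered horizontal advances in font units and that one terminal cell corresponds to a fixed number of font units. The function name and the choice to round up per run (rather than per glyf) are illustrative assumptions, not something this module implements.

```rust
/// Hypothetical helper: number of monospace cells needed for one run of
/// shaped glyfs, given their advances and the width of one cell, both in
/// font units. The summed advance is rounded up to whole cells.
pub fn cells_for_run(advances: &[u32], cell_width: u32) -> u32 {
    assert!(cell_width > 0, "cell width must be positive");
    let total: u32 = advances.iter().sum();
    // Ceiling division: a partially filled cell still occupies a full cell.
    (total + cell_width - 1) / cell_width
}
```

Because combining marks usually come back from the shaper with an advance of zero, a vocalized Arabic word would no longer be counted one cell per codepoint, which is the kind of error listed above.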

Command Lines

To build and test in dev (much faster to compile, but also much slower to run):

wasm-pack build --debug --target nodejs && trash pkg/.gitignore && node demo-nodejs-using-wasm/lib/main.js

To build and test production:

wasm-pack build --target nodejs && trash pkg/.gitignore && node demo-nodejs-using-wasm/lib/main.js

API

1.) Persistent State

pub fn set_font_bytes( font_bytes_hex: String ) {—
pub fn has_font_bytes() -> bool { unsafe { !FONT_BYTES.is_empty() } }—

2.) Text Preparation

3.) Text Shaping

pub fn shape_text( user_cfg: &JsValue ) -> String {—

4.) Text Rendering

pub fn glyph_to_svg_pathdata( js_glyph_id: &JsValue ) -> String {—

5.) Line Breaking

pub fn wrap_text( text: String, width: usize ) -> String {—

From Typing Text to Typeset Text

'raw' text is encoded as a series of bytes (UTF-8)
the bytes decode to a sequence of codepoints (non-negative integer numbers) intended to represent graphemes
hyphenate text (in languages that use hyphenation); this inserts soft hyphens (U+00AD)
find line break opportunities (LBOs); these occur, for example, after each space, each hyphen (hard or soft), each full stop and so on
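Once text has been cut at its line break opportunities, the pieces can be distributed over lines. The sketch below uses a plain greedy first-fit strategy as a simpler stand-in for the optimized distribution mentioned earlier in this README, and it assumes a monospace setting where every character is one cell wide and soft hyphens are invisible unless a line ends on them. All names are hypothetical.

```rust
const SOFT_HYPHEN: char = '\u{00AD}';

/// Visible width in cells: one per character, none for soft hyphens.
fn visible_width(s: &str) -> usize {
    s.chars().filter(|&c| c != SOFT_HYPHEN).count()
}

/// Width of a slab if it ends a line: the trailing space is dropped and a
/// trailing soft hyphen becomes a visible hyphen.
fn closing_width(slab: &str) -> usize {
    let trimmed = slab.trim_end();
    visible_width(trimmed) + usize::from(trimmed.ends_with(SOFT_HYPHEN))
}

/// Strip soft hyphens and trailing space; show a hyphen if the line was
/// broken at a soft hyphen.
fn finish_line(line: &str) -> String {
    let trimmed = line.trim_end();
    let mut out: String = trimmed.chars().filter(|&c| c != SOFT_HYPHEN).collect();
    if trimmed.ends_with(SOFT_HYPHEN) {
        out.push('-');
    }
    out
}

/// Greedy first-fit: keep adding slabs while the line would still fit.
pub fn fill_lines(slabs: &[&str], max_width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut line = String::new();
    let mut width = 0;
    for slab in slabs {
        if !line.is_empty() && width + closing_width(slab) > max_width {
            lines.push(finish_line(&line));
            line.clear();
            width = 0;
        }
        line.push_str(slab);
        width += visible_width(slab);
    }
    if !line.is_empty() {
        lines.push(finish_line(&line));
    }
    lines
}
```

A slab wider than `max_width` is placed on a line of its own and allowed to overflow; an optimizing algorithm would additionally look ahead to balance line lengths instead of committing to the first fit.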

Benchmarks

On my smallish, not new laptop, RustyBuzz-WASM's shape_text() method achieves speeds exceeding 280,000 glyfs (outlines) per second, around 4.6 times the speed attained by OpenTypeJS and nearly 18 times that of HarfBuzz called over the command line:

rustybuzz_wasm_rusty_shaping   286,779 Hz ≙ 1 ÷ 1.0    100.0 % │████████████▌│
rustybuzz_wasm_short_shaping   254,097 Hz ≙ 1 ÷ 1.1     88.6 % │███████████▏ │
rustybuzz_wasm_json_shaping    216,043 Hz ≙ 1 ÷ 1.3     75.3 % │█████████▍   │
opentypejs_shaping              61,997 Hz ≙ 1 ÷ 4.6     21.6 % │██▊          │
fontkit_shaping                 27,953 Hz ≙ 1 ÷ 10.3     9.7 % │█▎           │
harfbuzz_shaping                16,221 Hz ≙ 1 ÷ 17.7     5.7 % │▊            │

Note that in order to obtain this kind of performance, you absolutely must build for production as development builds will be much, much, much slower. Benchmarks created with textshaping.benchmarks.

🚧 To Do 🚧

find out what makes format rusty (which has quite a few options) so much faster than the minimalistic short format (which has no options); to do so, modify the (constant) format flags
implement OpenType font features
implement face selection
implement language selection?
implement script selection?
implement clustering selection?
write INSTALL.md

Also See

Rendering

ab-glyph—"When laying out glyphs into paragraph, ab_glyph is faster than rusttype using .ttf fonts & much faster for .otf fonts."
rusttype—A pure Rust alternative to libraries like FreeType
Fontdue—See below under Line Breaking / Text Wrapping.

Text Shaping

Allsorts—Allsorts is a font parser, shaping engine, and subsetter for OpenType, WOFF, and WOFF2 written entirely in Rust. It was extracted from Prince, a tool that typesets and lays out HTML and CSS documents into PDF.

The Allsorts shaping engine was developed in conjunction with a specification for OpenType shaping, which aims to specify OpenType font shaping behaviour.
OpenType shaping documents

Line Breaking / Text Wrapping

newbreak—written in JS/TS, has tentative Rust implementation; 🛑 TS fails to compile to JS; last commits in Summer 2020 so maybe abandoned.
fontdue—"Fontdue is a simple, no_std (does not use the standard library for portability), pure Rust, TrueType (.ttf/.ttc) & OpenType (.otf) font rasterizer and layout tool. It strives to make interacting with fonts as fast as possible, and currently has the lowest end to end latency for a font rasterizer".—Written in Rust, aims to be font rasterizer including text wrapping, but sadly 🛑 fails to compile although I could hotfix that.
kas-text looks enticing but is a huge thing geared towards building GUI apps. 🛑 It uses the original HarfBuzz C libraries so I'd rather not touch this thing as C dependencies will always be cans of worms.

To Do

[+] update dependencies (3.3.0 -> 4.3.0):
- Updating libc v0.2.104 -> v0.2.107
- Updating proc-macro2 v1.0.30 -> v1.0.32
- Updating rustybuzz v0.3.0 -> v0.4.0
- Updating rustybuzz-wasm v3.3.0 (/home/flow/jzr/rustybuzz-wasm) -> v4.3.0
- Updating serde_json v1.0.68 -> v1.0.69
- Updating syn v1.0.80 -> v1.0.81
- Updating ttf-parser v0.9.0 -> v0.12.3
- Updating unicode-general-category v0.2.0 -> v0.4.0
[+] implement ad.nobr attribute to signal where breaking glyfs is unsafe
[+] set ads.br: 'end' to avoid spurious line break, loss of rest-of-line
[+] fix endless loop, spurious repeated hyphens in distribution
[+] recover mysteriously missing first glyf on line after break in sample missing-t-b42
