Ligation without ZWJ and VS16 #11

Open
Crissov opened this issue Oct 17, 2016 · 5 comments

Comments

@Crissov
Contributor

Crissov commented Oct 17, 2016

With standardized emoji sequences, the author is responsible for the correct order of the emoji characters, as well as for any mandatory variation selectors and zero-width joiners. In most cases, and with emoji input GUIs, this just works, though. If these sequences are not handled at the system level, OpenType fonts probably employ the rlig ‘required ligatures’ feature, which is enabled (or enforced) by default.
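To make the author's responsibility concrete, here is a minimal Python sketch that spells out two standardized sequences code point by code point: the family Man + Woman + Boy joined by ZWJ, and Police Officer + ZWJ + Female Sign + VS16 (what the 👮‍♀️ example in this comment boils down to). The constant names and the `codepoints` helper are purely illustrative.

```python
# Illustrative constants; the code points are those of the standardized (RGI) sequences.
ZWJ = "\u200d"   # U+200D ZERO WIDTH JOINER
VS16 = "\ufe0f"  # U+FE0F VARIATION SELECTOR-16 (requests emoji presentation)

MAN, WOMAN, BOY = "\U0001F468", "\U0001F469", "\U0001F466"
POLICE_OFFICER, FEMALE_SIGN = "\U0001F46E", "\u2640"

# Order matters: Woman + ZWJ + Man + ZWJ + Boy is not in the recommended (RGI) set.
family = ZWJ.join([MAN, WOMAN, BOY])
police_woman = POLICE_OFFICER + ZWJ + FEMALE_SIGN + VS16


def codepoints(text: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(codepoints(family))        # U+1F468 U+200D U+1F469 U+200D U+1F466
print(codepoints(police_woman))  # U+1F46E U+200D U+2640 U+FE0F
```

Every one of these code points has to arrive in the right order before an rlig lookup in the font can turn the run into a single glyph.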

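It may sometimes be useful for a reader or designer to control (additional) ligation. This could be handled by OTF features such as liga ‘standard ligatures’ or clig ‘contextual ligatures’, which are on by default but can be switched off, or dlig ‘discretionary ligatures’, which is off by default and can be enabled on demand.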
Possible use cases: ligature aliases, e.g. Woman+Man+Child = Man+Woman+Child = Child+Man+Woman = Woman+Child+Man …, and non-standard ligatures, e.g. Police Car 🚓 + Woman 👩 = Police Woman = Police Officer 👮 + Female Sign ♀️.
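The alias case amounts to accepting every ordering of the same components as one ligature. A minimal sketch, assuming hypothetical glyph names (`man`, `woman`, `child`, `family_man_woman_child`), that enumerates the orderings with `itertools.permutations`:

```python
from itertools import permutations


def alias_ligatures(components, ligature):
    """Map every ordering of `components` to one ligature glyph.

    The glyph names passed in are hypothetical placeholders, not names
    taken from any real emoji font.
    """
    return {order: ligature for order in permutations(components)}


aliases = alias_ligatures(("man", "woman", "child"), "family_man_woman_child")
for order, ligature in sorted(aliases.items()):
    print(" + ".join(order), "->", ligature)
```

Three components already give 3! = 6 orderings and four give 24, so aliases are cheap to generate but the number of substitution rules grows factorially.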
#5 should be fixed first, of course.

@13rac1
Owner

13rac1 commented Oct 17, 2016

Interesting! How do you imagine it working? Options in the YAML?

@Crissov
Contributor Author

Crissov commented Oct 18, 2016

Maybe .fea files, which could be reused if the glyph names are the same.

@13rac1
Owner

13rac1 commented Oct 18, 2016

Hmm... .fea files are the sort of thing I'd rather leave to other tools such as afdko. My goal is a tool a graphic artist can understand without learning all about fonts. I know that's blasphemy as far as many people are concerned... Haha!

There's gotta be a way to support more advanced features without re-implementing afdko?

@Crissov
Contributor Author

Crissov commented Oct 19, 2016

Since this would need only a very limited subset of Adobe’s .fea syntax (which FontForge also supports), it could probably be expressed in the YAML as well, so the tool could build a (virtual) features file and then feed it to the font libraries it uses.

sub left-glyph right-glyph by ligature-glyph;
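Generating that limited subset is mostly string formatting. A minimal sketch, assuming the YAML has already been reduced to (components, ligature) pairs; `build_feature`, the choice of the `dlig` tag and the glyph names are illustrative assumptions, not part of scfbuild:

```python
def build_feature(tag, rules):
    """Render (components, ligature) pairs as one OpenType feature block in .fea syntax."""
    lines = [f"feature {tag} {{"]
    for components, ligature in rules:
        # One ligature substitution per rule: sub <glyph> <glyph> ... by <ligature>;
        lines.append(f"    sub {' '.join(components)} by {ligature};")
    lines.append(f"}} {tag};")
    return "\n".join(lines)


fea = build_feature("dlig", [
    (["u1F693", "u1F469"], "police_woman"),  # hypothetical: Police Car + Woman
])
print(fea)
```

If fontTools is involved, a string like this can be compiled into a font's GSUB table with `fontTools.feaLib.builder.addOpenTypeFeaturesFromString(font, fea)`, provided every glyph it names exists in the font; whether scfbuild's build step would hand it over that way is an open question.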

OTF substitution is not based on code points but on glyph names. I haven’t checked yet whether fonts generated by scfbuild and system emoji fonts use (perhaps even the same) systematic glyph names based upon code points (as suggested by Adobe: uni…). The source image files are usually named by code point(s) (at least in Emojione, Twemoji and – differently – Noto Emoji, but not in Emojidex), and the directory name determines their kind (monochrome or colorful, SVG or PNG, size). This means that scfbuild is able to hide the distinction between files, glyphs and code points from its users (graphic artists or others) in many cases, but it needs the mapping information somewhere. Keeping it in the file name probably led to #5, and the solution to it should be very similar to the solution for this issue, #11. If possible within YAML, I suggest a simple syntax employing -> or =:

alias -> original
alias -> original -> #code-point
left right -> ligature
custom-file-name -> glyph-name
custom-file-name -> #code-point
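A minimal sketch of how lines in the proposed `->` syntax might be parsed, assuming one rule per line, whitespace-separated source names, and `#` followed by hexadecimal digits for a code point. The `Rule` type and `parse_rule` are hypothetical names, not scfbuild APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rule:
    sources: list[str]     # one name: alias or renamed file; several: ligature components
    targets: list[str]     # glyph names reached via '->', in order
    codepoint: int | None  # trailing '#hex' target, if present


def parse_rule(line: str) -> Rule:
    """Parse 'a b -> c -> #1F46E' style lines (hypothetical syntax from this thread)."""
    parts = [part.strip() for part in line.split("->")]
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected 'source -> target', got {line!r}")
    sources, targets, codepoint = parts[0].split(), parts[1:], None
    if targets[-1].startswith("#"):
        codepoint = int(targets.pop()[1:], 16)  # '#1F46E' -> 0x1F46E
    return Rule(sources, targets, codepoint)


for text in ["alias -> original -> #1F46E", "left right -> ligature", "custom-file-name -> #1F46E"]:
    print(parse_rule(text))
```

Keeping all rule kinds in one grammar means a single parser could serve both the file-name mapping of #5 and the ligatures of this issue.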

@Crissov
Contributor Author

Crissov commented Dec 4, 2016

The place to fix this seems to be codepoint_from_filepath() in util.py. It currently only supports file names that are either a single hexadecimal code point or multiple code points concatenated by hyphens. This format is used by Emojione and Twemoji.

m = re.fullmatch(r"[\da-f]{4,5}(?:-[\da-f]{4,5})*", name)  # Emojione, Twemoji; name = file name without extension
codepoints = [int(cp, 16) for cp in name.split("-")] if m else None

Google Noto uses a more verbose naming convention, with an emoji_u prefix and underscores as separators.

m = re.fullmatch(r"emoji_u([\da-f]{4,5}(?:_[\da-f]{4,5})*)", name)  # Noto
codepoints = [int(cp, 16) for cp in m.group(1).split("_")] if m else None
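Put together, a self-contained version of both patterns could look like the sketch below. `codepoints_from_filename` is a hypothetical stand-in, not the actual signature of `codepoint_from_filepath()` in util.py, and it assumes hexadecimal file names like those of Emojione, Twemoji and Noto.

```python
import re
from pathlib import Path

# Emojione / Twemoji: "1f468-200d-1f469"
HYPHENATED = re.compile(r"[\da-f]{4,5}(?:-[\da-f]{4,5})*")
# Noto: "emoji_u1f468_200d_1f469"
NOTO = re.compile(r"emoji_u([\da-f]{4,5}(?:_[\da-f]{4,5})*)")


def codepoints_from_filename(path):
    """Return the code points encoded in an image file name, or None if it has none."""
    stem = Path(path).stem.lower()
    # A repeated capture group only keeps its last repetition in Python,
    # so validate the whole stem first and then split it on the separator.
    if HYPHENATED.fullmatch(stem):
        return [int(cp, 16) for cp in stem.split("-")]
    match = NOTO.fullmatch(stem)
    if match:
        return [int(cp, 16) for cp in match.group(1).split("_")]
    return None


print(codepoints_from_filename("svg/1f468-200d-1f469.svg"))
print(codepoints_from_filename("png/emoji_u1f46e_200d_2640.png"))
```

One caveat: a descriptive name made only of hex letters, such as `face` or `beef`, also matches the first pattern and would be read as a code point, so descriptive sets probably need to be flagged explicitly rather than guessed.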

These code points can then be converted to de facto standard Adobe glyph names, and a cmap table can be built accordingly.
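For the conversion step, the Adobe Glyph List specification's conventions give `uniXXXX` for BMP code points, `uXXXXX` for supplementary ones, and underscores between ligature components. A minimal sketch that applies them and separates single code points (cmap entries) from sequences (which need GSUB ligatures); the function names are hypothetical:

```python
def glyph_name(codepoints):
    """AGL-style name: uniXXXX for BMP, uXXXXX above it, '_' between ligature parts."""
    return "_".join(f"uni{cp:04X}" if cp <= 0xFFFF else f"u{cp:X}" for cp in codepoints)


def split_cmap_and_ligatures(sequences):
    """Single code points become cmap entries; longer sequences become ligature rules."""
    cmap, ligatures = {}, []
    for seq in sequences:
        if len(seq) == 1:
            cmap[seq[0]] = glyph_name(seq)
        else:
            components = [glyph_name([cp]) for cp in seq]
            ligatures.append((components, glyph_name(seq)))
    return cmap, ligatures


cmap, ligatures = split_cmap_and_ligatures([[0x1F46E], [0x2640], [0x1F46E, 0x200D, 0x2640]])
print(cmap)
print(ligatures)
```

For a ligature rule to fire, every component, including the ZWJ glyph `uni200D`, must itself be reachable through the cmap; the sketch does not check that.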

Emojidex, however, uses descriptive file names which can serve as glyph names, but they require some kind of lookup or heuristics to be matched correctly with Unicode code points. The mapping could be hard-coded, e.g. based upon short names, character names or annotations, but a user-defined map, as suggested above with custom-file-name -> #code-point, is probably the better approach. Emojidex also contains animated glyphs whose frames reside in a sub-folder – following the same naming scheme as the .svg files – together with an animation.json. It also features non-standard emojis that would require a custom mapping to PUA codes or ligatures anyway.

m = re.fullmatch(r"([A-Za-z0-9_]+)(?:\(([a-z]+)\))?", name)  # Emojidex; variant is the text inside "(...)"
glyphname, variant = m.groups() if m else (None, None)
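For descriptive names, the user-defined map could be consulted first, with unmapped (non-standard) glyphs falling back to Private Use Area codes. A minimal sketch, assuming the map is already a dict from glyph name to code points and that a variant becomes a `name.variant` glyph name; `map_descriptive_names` and both conventions are hypothetical:

```python
import re
from pathlib import Path

# Descriptive name, optionally followed by "(variant)".
EMOJIDEX = re.compile(r"([A-Za-z0-9_]+)(?:\(([a-z]+)\))?")
PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area


def map_descriptive_names(paths, user_map):
    """Map descriptive file names to code points: user map first, else the next free PUA code.

    Does not check for collisions with PUA codes already used in user_map.
    """
    result, next_pua = {}, PUA_START
    for path in paths:
        match = EMOJIDEX.fullmatch(Path(path).stem)
        if not match:
            raise ValueError(f"unrecognised file name: {path!r}")
        glyphname, variant = match.groups()
        key = glyphname if variant is None else f"{glyphname}.{variant}"
        if key in user_map:
            result[key] = user_map[key]
        else:
            if next_pua > PUA_END:
                raise ValueError("Private Use Area exhausted")
            result[key] = [next_pua]
            next_pua += 1
    return result


print(map_descriptive_names(["images/custom_thing.svg"], {}))
```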

Pseudo-ligatures with variation selectors VS-15/TVS U+FE0E and VS-16/EVS U+FE0F could be added automatically based upon current conventions documented in UTR#51 and custom extensions.
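A minimal sketch of that automation, assuming the list of base glyphs is derived elsewhere (for example from the emoji-variation-sequences.txt data file published alongside UTR #51), that the font contains glyphs named `uniFE0E` and `uniFE0F`, and that an optional text-style alternate is named `<base>.text`. All of these names are hypothetical.

```python
def variation_selector_rules(base_glyphs, text_glyphs=frozenset()):
    """Emit .fea substitutions that absorb VS16, and map VS15 to a text glyph when one exists."""
    rules = []
    for base in sorted(base_glyphs):
        # base + VS16 -> the (colour) base glyph itself, swallowing the selector
        rules.append(f"sub {base} uniFE0F by {base};")
        text = f"{base}.text"
        if text in text_glyphs:
            # base + VS15 -> a monochrome text-style alternate, if the font has one
            rules.append(f"sub {base} uniFE0E by {text};")
    return rules


for rule in variation_selector_rules({"uni2640", "uni2642"}, {"uni2640.text"}):
    print(rule)
```

When VS15 follows a base that has no text alternate, no rule is emitted and the selector is left for the renderer to deal with.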

I don’t know anything about Python, so treat the above as pseudo-code.

2 participants