Enable embedding of OpenType fonts #11

Open: wants to merge 35 commits into master


jbowtie commented Feb 13, 2017

This is a fairly large pull request that enables the embedding of OpenType fonts, along with basic support for kerning and ligatures. Future pull requests will refine this functionality.

There are also a small number of correctness fixes.

This low-level functionality is needed for more advanced typography and layout.

The API now allows you to register a font with an alias. The font is embedded when the PDF is written out.
The tests directory now has a few OFL-licensed fonts for testing purposes.
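As a rough illustration of "register with an alias, embed on write", here is a minimal Python sketch. It assumes a registry that only records the font file at registration time and reads its bytes when the PDF is written; `FontRegistry`, `register` and `embed_all` are hypothetical names, not the API in this pull request.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FontRegistry:
    """Hypothetical alias -> font-file registry; fonts are embedded lazily."""
    fonts: dict[str, Path] = field(default_factory=dict)

    def register(self, alias: str, path: str) -> None:
        # Registration only records the file; nothing is read or embedded yet.
        if alias in self.fonts:
            raise ValueError(f"font alias {alias!r} is already registered")
        self.fonts[alias] = Path(path)

    def embed_all(self, used_aliases: set[str]) -> dict[str, bytes]:
        # At PDF write time, read the bytes of every font that was actually used.
        return {alias: self.fonts[alias].read_bytes() for alias in sorted(used_aliases)}
```

Deferring the read keeps registration cheap and means only fonts that are actually used end up in the output.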

Enabled OpenType features get applied. At the moment, only standard ligatures and pair positioning for the default Latin script have been fully implemented and tested.

The immediate future work involves:

  • Enhancing the API so OpenType features can be turned off and on. (DONE)
  • Implementing the parsing and application of the full GPOS and GSUB tables.
  • Falling back to the "kern" table for basic kerning.
  • Enabling a wider set of features by default. (DONE)
  • Handling fonts that use AAT for layout.

Known issues:

  • Times out when embedding large fonts (most CJK fonts, for instance). This will be fixed when subsetting is properly supported. (FIXED performance-wise, though subsetting is not supported properly yet).
  • Some readers corrupt the copy/paste information. This is probably an issue with serializing the ToUnicode map.
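For reference when debugging the copy/paste issue, this is a minimal Python sketch of a well-formed ToUnicode CMap. It assumes 2-byte glyph-ID codes (as with Identity-H encoding) and a hypothetical `glyph_to_text` mapping; it does not reproduce or diagnose the serializer in this PR. Two details readers tend to be strict about: a `bfchar` section holds at most 100 entries, and destinations are UTF-16BE hex, so a ligature glyph maps to several characters.

```python
def to_unicode_cmap(glyph_to_text: dict[int, str]) -> str:
    """Serialize a glyph-ID -> Unicode-text map as a PDF ToUnicode CMap (2-byte codes)."""
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    items = sorted(glyph_to_text.items())
    # Each bfchar section may contain at most 100 mappings, so emit in chunks.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for gid, text in chunk:
            # Destination is UTF-16BE hex: ligatures become several code units,
            # and characters outside the BMP become surrogate pairs.
            lines.append(f"<{gid:04X}> <{text.encode('utf-16-be').hex().upper()}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines)
```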

kpandya3 commented

Hey @jbowtie, thanks for submitting the PR! We will review it by this weekend and get back to you.

tyre (Owner) commented Feb 18, 2017

I've started reviewing this and will be doing some refactoring as I go. @jbowtie how happy are you with the test coverage for this feature?

tyre (Owner) commented Feb 18, 2017

@jbowtie could you please enable collaboration on the PR? Then we can all work together on this :)

https://github.com/blog/2247-improving-collaboration-with-forks

jbowtie (Author) commented Feb 18, 2017

@tyre collaboration is enabled. I don't know that there's much value in increasing test coverage until more of the GSUB/GPOS features are implemented, but I'll happily maintain any tests that get added.

jbowtie (Author) commented Mar 5, 2017

@tyre Is there any progress on reviewing/refactoring this? I appreciate that it's a LOT to take in -- I'm happy to provide orientation and/or discuss refactoring in the comments here.

tyre (Owner) commented Mar 18, 2017

@jbowtie Working on it this weekend! I've gotten the basics worked out and refactored; now doing some of the auxiliary headers. I want to make sure we have something that can reasonably scale to other font types.

jbowtie (Author) commented Apr 10, 2017

I've now implemented a large enough chunk of font processing to better express a reasonable workflow and division of responsibilities.

The script can be autodetected via Unicode properties, or it can be specified explicitly along with an optional language.
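Python's standard `unicodedata` module does not expose the Unicode Script property, so this minimal sketch substitutes a heuristic: it votes on the first word of each letter's character name ("LATIN", "ARABIC", ...). The prefix table and the `detect_script` name are assumptions; a real implementation would read the Script property (Scripts.txt) instead.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from the first word of a Unicode character name to an
# OpenType script tag. Only a handful of scripts are covered here.
NAME_PREFIX_TO_TAG = {
    "LATIN": "latn",
    "GREEK": "grek",
    "CYRILLIC": "cyrl",
    "ARABIC": "arab",
    "HEBREW": "hebr",
    "CJK": "hani",
}


def detect_script(text: str, default: str = "DFLT") -> str:
    """Guess the dominant script of `text` from character names (heuristic)."""
    votes = Counter()
    for ch in text:
        # Only letters vote; digits, spaces and punctuation are script-neutral.
        if not unicodedata.category(ch).startswith("L"):
            continue
        prefix = unicodedata.name(ch, "").split(" ", 1)[0]
        tag = NAME_PREFIX_TO_TAG.get(prefix)
        if tag:
            votes[tag] += 1
    # Fall back to the default script tag when no letter was recognized.
    return votes.most_common(1)[0][0] if votes else default
```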

A script-specific shaper can tag individual characters with OpenType properties -- by default this would be detecting fraction numerator/denominator if the 'frac' feature is active, positional shaping (initial/medial/final/isolated) for scripts such as Arabic, marking leftmost/rightmost characters if the optical bounds feature is active, etc.
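The positional-shaping part of that tagging can be sketched as follows. This is a simplified take on Arabic joining, assuming a tiny hypothetical excerpt of the joining-type data and ignoring transparent marks, join-causing characters and ZWJ/ZWNJ; `tag_positional_forms` is an illustrative name.

```python
from typing import List, Optional, Tuple

# Tiny, hypothetical excerpt of Unicode's ArabicShaping.txt joining types:
# "D" = dual-joining, "R" = right-joining. Anything missing is treated as
# non-joining ("U"); transparent marks ("T") are not handled in this sketch.
JOINING_TYPE = {
    "\u0627": "R",  # ALEF
    "\u0628": "D",  # BEH
    "\u062D": "D",  # HAH
    "\u0631": "R",  # REH
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
}


def tag_positional_forms(text: str) -> List[Tuple[str, Optional[str]]]:
    """Tag each character with 'init', 'medi', 'fina' or 'isol' (None if non-joining)."""
    types = [JOINING_TYPE.get(ch, "U") for ch in text]
    tags: List[Tuple[str, Optional[str]]] = []
    for i, ch in enumerate(text):
        t = types[i]
        if t == "U":
            tags.append((ch, None))
            continue
        # Joins backward if the previous character is dual-joining (links forward).
        joins_prev = i > 0 and types[i - 1] == "D"
        # Joins forward if this character is dual-joining and the next can link back.
        joins_next = t == "D" and i + 1 < len(text) and types[i + 1] in ("D", "R")
        if joins_prev and joins_next:
            form = "medi"
        elif joins_prev:
            form = "fina"
        elif joins_next:
            form = "init"
        else:
            form = "isol"
        tags.append((ch, form))
    return tags
```

For مرحبا the tags come out as init, fina, init, medi, fina: reh is right-joining, so the word visually breaks after it.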

Now we do font-specific things:

Characters get converted to initial glyphs one-for-one via the font's cmap, and tags are transferred one-for-one along with them.
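A minimal sketch of this step, assuming the font's cmap has already been decoded into a plain dict of code point to glyph ID and that characters arrive as `(char, tag)` pairs from the shaper; `chars_to_glyphs` is a hypothetical name.

```python
def chars_to_glyphs(tagged_chars, cmap, notdef=0):
    """Map (char, tag) pairs to (glyph_id, tag) pairs one-for-one.

    `cmap` is assumed to be a dict of Unicode code point -> glyph ID.
    """
    # Characters the font does not cover fall back to glyph 0 (.notdef);
    # each character's shaping tag travels with its glyph unchanged.
    return [(cmap.get(ord(ch), notdef), tag) for ch, tag in tagged_chars]
```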

Glyph substitution takes place according to OpenType rules -- this takes care of things like positional shaping, ligatures, mirroring, contextual replacements, etc. If there's no GSUB table this does nothing.
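As one example of a GSUB rule, standard ligatures (lookup type 4) can be approximated like this. The sketch assumes the lookup has been flattened into a dict from glyph sequences to ligature glyphs and prefers the longest match; real ligature sets are keyed by first glyph and ordered by preference, so this is a simplification, and `apply_ligatures` is a hypothetical name.

```python
def apply_ligatures(glyphs: list[int], ligatures: dict[tuple[int, ...], int]) -> list[int]:
    """Greedy left-to-right ligature substitution, longest match first."""
    max_len = max((len(k) for k in ligatures), default=1)
    out: list[int] = []
    i = 0
    while i < len(glyphs):
        # Try the longest possible component sequence first, down to pairs.
        for n in range(min(max_len, len(glyphs) - i), 1, -1):
            lig = ligatures.get(tuple(glyphs[i:i + n]))
            if lig is not None:
                out.append(lig)
                i += n
                break
        else:
            # No ligature starts here; copy the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out
```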

Positions are initialized to the advance width declared in the font metrics.
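Initializing positions is straightforward; this sketch assumes advance widths from the font's `hmtx` table are available as a dict of glyph ID to width in font units. `GlyphPosition` and `initial_positions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GlyphPosition:
    x_advance: int      # horizontal advance in font units
    x_offset: int = 0   # offsets start at zero; GPOS may adjust them later
    y_offset: int = 0


def initial_positions(glyphs: list[int], advance_widths: dict[int, int]) -> list[GlyphPosition]:
    """Start every glyph at its declared advance width, before any GPOS adjustments."""
    return [GlyphPosition(x_advance=advance_widths[g]) for g in glyphs]
```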

Glyph positioning takes place according to OpenType rules -- this takes care of mark placement, kerning, cursive alignment, etc. If there's no GPOS table, fall back to kerning via the 'kern' table, or do nothing if there isn't one either.
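The GPOS-then-'kern' fallback for pair kerning might look like the following. It is a minimal sketch that assumes both sources have been flattened into `(left, right) -> adjustment` dicts (or `None` when the table is missing) and only models an x-advance change on the first glyph of each pair; real GPOS pair adjustments can also move placement or affect the second glyph. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

PairTable = Dict[Tuple[int, int], int]


def apply_pair_kerning(
    glyphs: List[int],
    advances: List[int],
    gpos_pairs: Optional[PairTable],
    kern_pairs: Optional[PairTable],
) -> List[int]:
    """Adjust advances for kerning, preferring GPOS pair data over the 'kern' table."""
    # Use GPOS if present, otherwise fall back to 'kern'; with neither, do nothing.
    pairs = gpos_pairs if gpos_pairs is not None else kern_pairs
    adjusted = list(advances)
    if pairs is None:
        return adjusted
    for i in range(len(glyphs) - 1):
        # The kerning value is added to the advance of the first glyph in the pair.
        adjusted[i] += pairs.get((glyphs[i], glyphs[i + 1]), 0)
    return adjusted
```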

At this point we have the output glyphs and their final positioning data. This can be tested for correctness! For instance, HarfBuzz tests many complex-script layout scenarios against a variety of fonts.

When writing out to PDF, we need to scale the font metrics to the 1000 units per em used by the PDF standard. The same applies to positioning.
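Concretely, a value in font units converts to PDF glyph space as value × 1000 / unitsPerEm, where unitsPerEm comes from the font's `head` table. The sketch applies that to a CIDFont `/W` widths array; the function names are hypothetical, and no range compression is attempted.

```python
def to_pdf_glyph_units(value: int, units_per_em: int) -> float:
    """Convert a font-unit metric into PDF glyph space (1000 units per em)."""
    return value * 1000 / units_per_em


def widths_array(advances: dict[int, int], units_per_em: int) -> str:
    """Build a CIDFont /W array with one `gid [width]` entry per glyph."""
    entries = " ".join(
        f"{gid} [{round(to_pdf_glyph_units(w, units_per_em))}]"
        for gid, w in sorted(advances.items())
    )
    return f"[{entries}]"
```

For a 2048-units-per-em font, an advance of 1024 becomes 500.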

This pull request positions every glyph individually. Ideally, explicit positioning is used only for the subset of glyphs that requires it; elsewhere, use the TJ operator when there are kerning adjustments, or omit positioning entirely and rely on the standard glyph widths.
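Emitting kerning through TJ comes down to one sign convention: a number in a TJ array is subtracted from the glyph's horizontal displacement (in thousandths of text space), so a negative kerning value becomes a positive TJ number. A minimal sketch, assuming 2-byte glyph IDs (Identity-H) and adjustments already scaled to 1000 units per em and rounded; `tj_operator` is a hypothetical name.

```python
def tj_operator(glyphs: list[int], kerning: list[int]) -> str:
    """Emit a TJ operator for 2-byte glyph IDs, splitting runs where kerning applies.

    kerning[i] is the adjustment (1000 units per em) applied after glyphs[i];
    negative values tighten, as with GPOS/'kern' values.
    """
    parts: list[str] = []
    run = ""
    for gid, adj in zip(glyphs, kerning):
        run += f"{gid:04X}"
        if adj != 0:
            # TJ numbers are subtracted from the displacement, so the sign flips.
            parts.append(f"<{run}> {-adj}")
            run = ""
    if run:
        parts.append(f"<{run}>")
    return f"[{' '.join(parts)}] TJ"
```

Glyphs without adjustments stay in a single hex string, which keeps the content stream compact compared with positioning every glyph.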

So what does that imply?

The layout_text function needs to know:

  • font
  • active OpenType features (can have default based on spec)
  • script (can be autodetected by a Unicode module or macro)
  • language (can default to 'dflt' per spec)
  • optional width/height constraints (if we ever implement line wrapping/alignment)
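Those inputs could be bundled into an options value with spec-style defaults. This is a minimal sketch: `LayoutOptions`, `with_features` and the contents of `DEFAULT_FEATURES` are assumptions (the set is loosely modelled on features shapers commonly enable by default), not a definitive list.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, Optional

# Assumed default set, loosely modelled on features that shapers commonly
# enable by default; not an authoritative list from the OpenType spec.
DEFAULT_FEATURES: FrozenSet[str] = frozenset(
    {"ccmp", "locl", "liga", "clig", "calt", "kern", "mark", "mkmk"}
)


@dataclass
class LayoutOptions:
    """Hypothetical bundle of inputs for a layout_text() call."""
    font: str                                   # registered font alias
    features: FrozenSet[str] = DEFAULT_FEATURES
    script: Optional[str] = None                # None means "autodetect from the text"
    language: str = "dflt"                      # default language system
    max_width: Optional[float] = None           # reserved for line wrapping/alignment
    max_height: Optional[float] = None

    def with_features(self, on: Iterable[str] = (), off: Iterable[str] = ()) -> "LayoutOptions":
        """Return a copy with some features switched on and others off."""
        updated = (self.features | frozenset(on)) - frozenset(off)
        return replace(self, features=updated)
```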

  • Parser module: converts the font binary into something useful.
  • Positioning and Substitution modules: handle their respective areas, calling out to the parser as needed.
  • Shaping module: performs contextual analysis and tagging of characters. (I believe AAT fonts have a built-in shaper.)
  • Unicode module: provides access to Unicode database properties (script detection, joining type, etc.).
  • PdfEmbedding module: writes out the required objects.
  • Text module: needs to be smart enough to select the correct PDF operators based on the positioning type. Ideally it also writes out /ActualText (needed for complex scripts that reorder characters, but also handy when more advanced layout would otherwise confuse a PDF reader).
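For the /ActualText part, the PDF mechanism is a marked-content sequence, `/Span << /ActualText ... >> BDC ... EMC`, wrapped around the text-showing operators. A minimal sketch, assuming the replacement text is always written as a UTF-16BE hex string with a byte-order mark; `with_actual_text` is a hypothetical name.

```python
def with_actual_text(content_ops: str, text: str) -> str:
    """Wrap content-stream operators in a /Span marked-content sequence with /ActualText."""
    # Writing the text string as UTF-16BE with a BOM covers any Unicode text.
    encoded = "FEFF" + text.encode("utf-16-be").hex().upper()
    return f"/Span << /ActualText <{encoded}> >> BDC\n{content_ops}\nEMC"
```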

jbowtie (Author) commented May 10, 2017

Since @tyre hasn't been in a position to push his refactoring and I've made substantial progress on my arabic branch, I'll begin pushing a series of smaller PRs based on my personal refactoring.

I've created #12 and #13 to address the CI build and byte_size bug respectively. Once those are merged I'll have a solid basis for a series of much easier-to-review pull requests to flesh out the functionality.

whossname commented

@jbowtie has there been any progress recently? Based on this conversation, it looks like this project is dead. That's a shame; I was thinking about implementing tables using this and, if the resulting code was useful, offering to add it to the code base.

jbowtie (Author) commented Jun 24, 2017

@whossname I don't know how active @tyre is as a maintainer. I have a fork I'm maintaining in the meantime -- the default opentype branch is where I'm merging the smaller pull requests as I produce them and the arabic branch is a more advanced (and more correct) implementation of this pull request.

whossname commented

Ok, I'm mostly interested in refactoring and adding to the Geometry stuff. When I get to this (I won't need it for another month), I might contribute to your fork instead of this one, since yours is being actively worked on.

4 participants