Adopt a better parsing/encoding strategy in OpenType.js #152
Comments
I've been using my restructure project for this. It supports both decoding and encoding all sorts of data types, and as a result of being developed for fontkit, it works pretty well for the OpenType format. 😜 |
I like the fact you seem to provide much more documentation than we get on Pomax's project. I am also glad to see the description of the LazyArray, which is something I was expecting to see implemented in whatever solution we choose for performance reasons. Also, it is nice to see you're working with require.js in a way that makes your solution more likely to be easily reusable as a dependency of other projects. |
This looks good: But it seems not all opentype tables are supported yet? |
The tables are here |
I don't see a |
They are, just in a different folder: https://github.com/devongovett/fontkit/tree/master/src/cff |
Nice! |
To be fair: my project was a PoC that I never got enough time to properly redo (there's a lot of ES6/ES2015 that can be sprinkled in to drastically increase the efficacy of a spec-based approach), so the lack of docs is really a lack of "this is done in any way, shape, or form" =) (iirc there wasn't even all that much in the way of the Common Layout tables like GSUB/GPOS). The downside of my project was also that the parser it generated would load everything as a structured object, which is absolutely dreadful when you need efficient font traversal, where you just want a memory map in which you follow offset pointers, accumulating only exactly as much data as is necessary to shape the requested string. You can cache some things on the way, but ideally, that cache is cleared after the shaping run to give the system as much free memory as possible. |
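A minimal sketch of the "follow offset pointers, decode only what you need" idea described above. The table layout, the `LazyTable` class and the `buildSampleTable` helper are all hypothetical and made up for illustration; this is not how OpenType.js, fontkit or the binary parser generator actually work.

```javascript
// Hypothetical layout (NOT a real OpenType table):
//   uint16 count, then `count` uint16 offsets measured from the table start,
//   each pointing at a record that holds a single uint16 value.
class LazyTable {
  constructor(buffer, tableOffset = 0) {
    this.view = new DataView(buffer);
    this.tableOffset = tableOffset;
    this.cache = new Map(); // filled during a shaping run, cleared afterwards
    this.reads = 0;         // number of records actually decoded (illustration only)
  }

  get count() {
    return this.view.getUint16(this.tableOffset); // DataView is big-endian by default
  }

  record(index) {
    if (index < 0 || index >= this.count) {
      throw new RangeError("record index out of range");
    }
    if (this.cache.has(index)) {
      return this.cache.get(index);
    }
    // Follow the offset pointer instead of decoding every record up front.
    const offset = this.view.getUint16(this.tableOffset + 2 + index * 2);
    const value = this.view.getUint16(this.tableOffset + offset);
    this.reads += 1;
    this.cache.set(index, value);
    return value;
  }

  clearCache() {
    this.cache.clear();
  }
}

// Builds a three-record sample table by hand (assumed data, for the demo only).
function buildSampleTable() {
  const buffer = new ArrayBuffer(14);
  const view = new DataView(buffer);
  view.setUint16(0, 3);    // count
  view.setUint16(2, 8);    // offset of record 0
  view.setUint16(4, 10);   // offset of record 1
  view.setUint16(6, 12);   // offset of record 2
  view.setUint16(8, 100);  // record 0
  view.setUint16(10, 200); // record 1
  view.setUint16(12, 300); // record 2
  return buffer;
}

const table = new LazyTable(buildSampleTable());
console.log(table.record(1), table.reads);
table.clearCache(); // release cached values once the "shaping run" is over
```

Only the requested record is decoded, and the cache can be dropped after the run, which is the trade-off the comment above argues for.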
@Pomax have you already looked at @devongovett's work on Restructure? Would you vouch for it being used in OpenType.js instead of your binary-parser-generator? |
I hadn't seen restructure yet, it's now on my "to have a look at" list although I probably won't get to that until later in the week |
I have been thinking about this for about a year now. Really like Restructure and was going to talk to Devon about it when I get back to California. In my vision, there should be one spec of font formats that is maintained centrally, and used to generate data / code for each language (C, Python, JS). But in the end this is so much work for no added benefit, so I won't personally be working on it. But really like that others are interested in it and like to be involved in the design. |
Note that compiling GSUB/GPOS needs table sharing logic. Devon, do you implement that? |
I haven't tried compiling the GSUB/GPOS tables yet, but they should work since they are defined using types from restructure. As for table sharing logic, restructure doesn't currently have a way for two different pointers to point to the same place. |
Basically we would need to calculate a checksum for each subtable and then optimize the serialization when checksums are equal. Or maybe some other procedure equivalent to that. |
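As a rough sketch of that idea: the hypothetical `packSubtables` helper below writes each distinct subtable once and lets identical subtables share an offset. It swaps the checksum for an exact byte-content key, so two different subtables can never be merged by a collision. It is an assumption-laden illustration, not code from restructure or fontkit.

```javascript
// Hypothetical helper: takes already-encoded subtables (Uint8Array each) and
// returns one offset per input subtable plus the packed bytes.
function packSubtables(subtables) {
  const offsetByKey = new Map(); // content key -> offset of the first copy
  const offsets = [];
  const chunks = [];
  let length = 0;

  for (const bytes of subtables) {
    // Exact content key used in place of a checksum (assumption for this sketch).
    const key = Array.from(bytes).join(",");
    let offset = offsetByKey.get(key);
    if (offset === undefined) {
      offset = length;
      offsetByKey.set(key, offset);
      chunks.push(bytes);
      length += bytes.length;
    }
    offsets.push(offset); // duplicates reuse the offset of the first copy
  }

  const data = new Uint8Array(length);
  let cursor = 0;
  for (const chunk of chunks) {
    data.set(chunk, cursor);
    cursor += chunk.length;
  }
  return { offsets, data };
}

const packed = packSubtables([
  Uint8Array.of(1, 2, 3),
  Uint8Array.of(4, 5),
  Uint8Array.of(1, 2, 3), // identical to the first subtable
]);
console.log(packed.offsets, packed.data.length);
```

A real implementation would also have to respect offset size limits and alignment, which this sketch ignores.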
I guess that encoding the tables without this optimization would still produce valid OTF files. It's just that their size would be larger, which is undesirable. But I don't see programs being unable to handle reading the unoptimized tables. So it seems to be an issue of size optimization, instead of a matter of compliance to the file format spec. |
OK. I just registered an issue about that: |
Another good thing about restructure is that it's got lots of mocha tests (a total of 234). |
So... after spending the afternoon inspecting the source code of these projects I came to the conclusion that it is not really restructure that we would use here, but instead we would actually be using fontkit, which has restructure as a dependency. But then, fontkit's description is much like the opentype.js project description... Are these two projects actually trying to solve the same thing? How does fontkit differ from opentype.js? |
that is certainly an important question to answer. |
Initially, when developing OpenType.js, I also had table specs (check out the first commit for example). Even though I'm a fan of the declarative approach, I soon discovered that a purely declarative approach is problematic. For some tables, e.g. OpenType.js and Fontkit were developed independently. I developed OpenType.js to "scratch my own itch": for NodeBox Live we needed access to the glyph shapes of a font, and nothing was available. Later on, through the efforts of @Pomax (and @louisremi), we also added font exporting. The two projects do have a lot in common and I'm playing with the idea of merging efforts. It seems Fontkit has more support for things like WOFF, which we currently don't support. I'm a bit torn at the moment. I like the things that Fontkit is doing, but I also think it would be silly to give up the effort that went into creating OpenType.js. |
@davelab6 had listed a few projects trying to solve the same thing in byte-foundry/prototypo#115 |
From a personal perspective, it's also nice to have options as a user. Right now, I can use harfbuzz or freetype2 or roll something quick and dirty using a TTX export of a font, and having that same freedom in JS land is great: I can use OpenType.js or Fontkit, and they have their respective strengths and weaknesses (format and specific shaping support, memory footprints, etc). So I'm tempted to pitch my 2 cents as "even if they do the same, the ways they do things are different enough to be valuable as checks on each other". One library to rule them all always sounds tempting, but multiple libraries to divide and conquer based on the tool you need, to me, has stronger appeal. |
Some history about fontkit for those who don't know: it was originally written for PDFKit, my PDF generation library, over 4 years ago, and existed as a part of that project for a while. Then, I kept getting a ton of bug reports about font issues, many of which were related to complex script support. So, a couple years ago I started writing fontkit as it exists today, and I extracted it as a separate project so it could be useful outside of just PDFKit. It is finally getting to a state where I can ship it as the default font engine in PDFKit, and I plan to do so shortly. Fontkit and opentype.js do have slightly different goals as far as I can tell. Fontkit was originally developed to solve the layout problem, along with PDF subsetting, and although it has grown to support other things like glyph decoding, layout remains its primary goal in my opinion. On the other hand, from what I can tell, opentype.js is mostly focused on encoding and decoding glyph paths, rather than glyph layout. Fontkit currently does not support glyph encoding from paths like opentype.js does, just decoding. I may need to do something in this area in the future though, in order to support variation fonts in PDFs... |
Based on the discussion seen below we currently have 2 choices:
Manually parsing the complex data structures of the OpenType file format is very time-consuming and error-prone. It also results in verbose code with lots of duplication, which is typically hard to read, understand and debug.
The approach presented by @Pomax in his "A binary parser generator" project available at https://github.com/Pomax/A-binary-parser-generator seems very good because it relies on storing the overall file format structure in a "spec" file which is then used to automatically validate and parse OTF files (the project allows any file format to be specified, but OTF is what we're interested in here, right :-D).
Several years ago I did something similar for handling the DWG file format in the GNU LibreDWG project (https://www.gnu.org/software/libredwg/). And based on our spec file we were able to generate both the format parser and the format encoder. I am not sure Pomax's implementation provides encoding as well, which would be strictly necessary for us here.
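To make the "one spec, both parser and encoder" idea concrete, here is a minimal sketch in which a single declarative field list drives both decoding and encoding. The `TYPES` table, `headerSpec`, `decode` and `encode` are hypothetical names invented for this example, and the three-field header is a simplified illustration; none of this is the API of the binary parser generator, restructure or LibreDWG.

```javascript
// Primitive types known to this sketch, each with a size, a reader and a writer.
// DataView methods default to big-endian.
const TYPES = {
  uint16: { size: 2, read: (v, o) => v.getUint16(o), write: (v, o, x) => v.setUint16(o, x) },
  int16:  { size: 2, read: (v, o) => v.getInt16(o),  write: (v, o, x) => v.setInt16(o, x) },
  uint32: { size: 4, read: (v, o) => v.getUint32(o), write: (v, o, x) => v.setUint32(o, x) },
};

// The "spec": an ordered list of [fieldName, typeName] pairs (illustrative fields).
const headerSpec = [
  ["version", "uint32"],
  ["numTables", "uint16"],
  ["searchRange", "uint16"],
];

function sizeOf(spec) {
  return spec.reduce((sum, [, type]) => sum + TYPES[type].size, 0);
}

// Decoder derived from the spec: walks the fields in order and reads each one.
function decode(spec, buffer, offset = 0) {
  const view = new DataView(buffer);
  const result = {};
  for (const [name, type] of spec) {
    result[name] = TYPES[type].read(view, offset);
    offset += TYPES[type].size;
  }
  return result;
}

// Encoder derived from the same spec: walks the same fields and writes each one.
function encode(spec, values) {
  const buffer = new ArrayBuffer(sizeOf(spec));
  const view = new DataView(buffer);
  let offset = 0;
  for (const [name, type] of spec) {
    TYPES[type].write(view, offset, values[name]);
    offset += TYPES[type].size;
  }
  return buffer;
}

const header = { version: 0x00010000, numTables: 12, searchRange: 128 };
const roundTripped = decode(headerSpec, encode(headerSpec, header));
console.log(roundTripped);
```

Because both directions read the same field list, adding a field to the spec updates the parser and the encoder together, which is the main attraction of the declarative approach.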
PS: As I see that Pomax has already contributed code to OpenType.js, I wonder if the proposal of incorporating the binary-parser-generator was already presented here in the past.