Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pdfkit instead of canvas #496

Merged
merged 2 commits into from
Mar 13, 2016
Merged

Use pdfkit instead of canvas #496

merged 2 commits into from
Mar 13, 2016

Conversation

ForbesLindesay
Copy link
Contributor

This removes the dependency on the native "canvas" module and instead uses pdfkit. I did this because I was unable to get canvas installed on my machine. I'm not sure if this will perform better or worse, but it seems like something that may be worth benchmarking.

@espadrine
Copy link
Member

I definitely know the struggle of getting it to work! It is one of the secret reasons to switch away from Heroku (although installing libcairo has been decently easy on raw Linux so far).

It's great to have an experiment to play with, thanks a lot!

$ node time-pdfkit
avg: 36682.7770700637
median: 18485
$ node time-canvas
avg: 91703.15923566879
median: 77127

Not sure how that is possible. Will make more tests.

Of course, we'll need to have comparisons of actual badge images, with extremes (little text, huge chunk of text, unicode…)

@espadrine
Copy link
Member

I also had this idea to avoid the problem of sending literal text in badges which will always be seen wrong with people that don't have the right font, by converting it to an SVG path, which I had implemented in node-canvas. But I never got around to doing the last step in shields, so I guess it isn't that valuable at this point.

@ForbesLindesay
Copy link
Contributor Author

That looks like a massive performance impact! Am I reading those numbers correctly?

I would think converting them to paths would make the file size a fair bit larger for very little real gain. Most browsers have a sufficient set of fonts and the fallbacks won't usually look too bad. I also would think that the literal text could be better handled by screen readers (although I don't know if any screen readers actually do that)

@espadrine
Copy link
Member

That looks like a massive performance impact

It's in nanoseconds. An average of 100ms isn't the bottleneck anyway, but I expected the C implementation to be faster than the JS implementation.

Unfortunately, it doesn't fix the bug we have with th, as in: . However, it does seem to deal with unicode well. (Edit: maybe it is a matter of fallback font; does pdfkit have the concept?)

It isn't as accurate for long strings of text, at least on Linux:

PDFKIT NODE-CANVAS
badge-pdfkit badge-current

@ForbesLindesay
Copy link
Contributor Author

I'm not surprised, well optimised JavaScript modules are usually faster than their C counterparts when they are called from JavaScript unless they are very long running because of the large amount of time spent crossing the boundary between languages. (for example, the JavaScript MySQL driver consistently outperforms C bindings).

I could implement this as a fail over (i.e. use canvas if available, otherwise fall back to pdfkit) if the accuracy on long strings is too much of an issue? Another alternative would be to just allow slightly larger tolerances for very long strings (which risks having excessive margins but would prevent things being cut of)

For th we could special case it in the measure function and just add an adjustment factor? With measureTextWidth factored out into a separate module that should be relatively easy to do.

@espadrine
Copy link
Member

I could implement this as a fail over (i.e. use canvas if available, otherwise fall back to pdfkit) if the accuracy on long strings is too much of an issue?

Right now, my feeling is that long strings are uncommon enough that the benefit of removing the cairo dependency outweighs the issue. But it feels like an IEEE754 bug that might be fixable in pdfkit. Otherwise, a hack for long strings (such as adding a few pixels to the result of the computation) might work.

For th we could special case it in the measure function and just add an adjustment factor?

Given that it isn't a regression, it isn't necessary for this pull request. I wonder if I'm right about fallback fonts though.

@ForbesLindesay
Copy link
Contributor Author

Not sure about fallback fonts in PDFkit. It errors if the ttf doesn't exist, and if you were actually using it to generate a PDF, it would embed the font in the pdf.

I'll add support for falling back to a default font if the ttf file doesn't exist so that we can run tests on travis again.

@espadrine
Copy link
Member

It errors if the ttf doesn't exist

Right, but what happens if the glyph isn't defined in the font?

@ForbesLindesay
Copy link
Contributor Author

I'm not sure, that may be an issue

@espadrine
Copy link
Member

While reading this interesting implementation of OTF-based font measurement, I realized that we may have been mistaken. I think that perhaps PDFKit's widthOfString does not return pixels, but, as is extensively used in the documentation, points. Fortunately, those units are proportional to each other, which means that we merely have to decide of a factor to multiply the result of widthOfString to, and get pixels out. In browsers on screen, it seems hardcoded by CSS3 to 4/3 (~1.33).

Trying to figure out the exact unit used for widthOfString is a bit complex: text calls font calls widthOfGlyph, so it should be of unit widthOfGlyph * fontSize / 1000 and widthOfGlyph is hmtx * 1000 / unitsPerEm. Is it hmtx * point / em * em / FUnit? point is a fixed distance, em is the width and height of the master grid, so fontSize = point / em. unitsPerEm are set from the head table which are documented by Apple as:

This value should be a power of 2. Its range is from 64 through 16384.

Chapter 1 unveils that it reflects the density of the master grid on which curve points are set. The more FUnits there are in an em, the more precise typographs can be when placing curve points.

hmtx is defined in Apple's documentation as a longHorMetric, which is unhelpful, as it isn't defined anywhere. However, Microsoft's documentation indicates that it holds font design units (FUnits).

As a result, after cancellation of units in the original formula, I believe widthOfString is of unit point.

@espadrine
Copy link
Member

I am not deeply knowledgeable about Microsoft fonts, but I accidentally noticed today that the md5 of my local Verdana font, from the Ubuntu package (3ba52ab1fa0cd726e7868e9c6673902c, 137K) is distinct from that of my Mac (4ba636f1f53078cdd483de8f7630600f, 182K). PDFkit seems to have been built with a Mac. None of the issues I reported are there with the Mac version of Verdana.ttf (well, except for the ᵗʰ thing), which I copied to my Linux machine. I think we can move forward with this patch, assuming I keep using the Mac version going forward, which awkwardly I cannot provide publicly for licensing reasons.

Maybe Linux' Verdana was always broken and Cairo (or is it Pango?) used tricks around that. Maybe Mac's Verdana includes hinting information that PDFkit uses and that is absent from Linux' version, which assumes you read data from a field which PDFkit doesn't currently support. We may never know.

@espadrine
Copy link
Member

As an aside to get back to in the future, I'd like to note another possibility, which we might consider if font choice becomes necessary. This library makes use of opentype.js, which allows using font information to convert text to an SVG path, as seen here. As it is, it produces a bloated image, with huge paths even for simple letters. Glyphs could be optimized by omitting needless points and caching the result.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants