[api-minor] Refactor fetching of built-in CMaps to utilize a factory on the `display` side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js)
#8064
Note that, based on this patch, it would also be quite easy to add a simple cache to avoid loading the same CMap files over and over, i.e. it could help address issue #4794. That said, I'm not sure whether the intent of that issue was that we should cache the parsed CMaps, or whether caching the raw CMap files would suffice. Anyway, I've pushed an additional commit which at least considerably reduces the number of file loads for certain PDF files, without the need for any larger refactoring.
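To illustrate the "simple cache" idea above, here is a minimal sketch that caches raw CMap loads by name. It is not the code from the additional commit; `createCachingCMapLoader` and the `fetchCMap` callback are hypothetical names, and the loader is assumed to return a Promise for the raw data.

```javascript
// Hypothetical wrapper (not the actual commit): caches raw CMap loads by
// name so that each file is requested at most once. `fetchCMap` is an
// assumed loader taking a CMap name and returning a Promise for its data.
function createCachingCMapLoader(fetchCMap) {
  var cache = Object.create(null);
  return function load(name) {
    if (!(name in cache)) {
      cache[name] = fetchCMap(name).catch(function (reason) {
        // Do not keep failed loads around, so that a later call can retry.
        delete cache[name];
        throw reason;
      });
    }
    return cache[name];
  };
}

// Usage with a stand-in loader that counts how often it is called.
var loadCount = 0;
var load = createCachingCMapLoader(function (name) {
  loadCount++;
  return Promise.resolve(new Uint8Array([name.length]));
});
```

Caching the pending Promise rather than the resolved data means that concurrent requests for the same CMap also share a single load.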
Looks good with the stringToBytes change.
src/display/dom_utils.js
Outdated
if (this.binary && request.response) {
  data = new Uint8Array(request.response);
} else if (!this.binary && request.responseText) {
  var arr = Array.prototype.map.call(request.responseText,
You can use `stringToBytes` here.
Good point, fixed now.
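For readers unfamiliar with the helper, the following is a minimal sketch of what a `stringToBytes`-style function does, compared with the `Array.prototype.map.call` approach from the outdated snippet. The reviewed snippet is truncated, so the mapping callback shown here is an assumption, and the project's own helper may differ in details.

```javascript
// Sketch of a stringToBytes-style helper: every UTF-16 code unit of the
// string is truncated to its low 8 bits and stored in a Uint8Array.
function stringToBytes(str) {
  var length = str.length;
  var bytes = new Uint8Array(length);
  for (var i = 0; i < length; ++i) {
    bytes[i] = str.charCodeAt(i) & 0xFF;
  }
  return bytes;
}

// Assumed shape of the map-based variant it replaces (the callback is a
// guess, since the original snippet is cut off).
function stringToBytesViaMap(str) {
  return new Uint8Array(Array.prototype.map.call(str, function (ch) {
    return ch.charCodeAt(0) & 0xFF;
  }));
}
```

Both variants produce the same bytes; the helper simply avoids the intermediate array and the per-character callback.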
I don't think anything changes here, but we also need to be sure that the permission restrictions stay the same when the worker is loaded from a foreign origin.
For the future, we can open an issue to create a separate factory and a set of JS files, one per CMap, that can be loaded cross-origin (e.g. these JS files would contain a binary-encoded array which is decoded when loaded via a script tag; the files could be generated during the pdfjs-dist build and use any PDF compression, e.g. ASCII85-encoded and DEFLATE'd; maybe we should change the boolean 'isBinary' field to an enum).
Good idea, I've attempted to implement this; an interdiff is available at https://gist.github.com/Snuffleupagus/46262b3cd0d63c1e7d52ac97d19ffad3.
…on the `display` side instead, to allow users of the API to provide a custom CMap loading factory (e.g. for use with Node.js) Currently the built-in CMap files are loaded in `src/core/cmap.js` using `XMLHttpRequest` directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread. This is inspired by other recent work, e.g. the addition of the `CanvasFactory`, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.
From: Bot.io (Linux)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://107.21.233.14:8877/9972b61cc0c5ce9/output.txt
From: Bot.io (Windows)
Received
Command cmd_test from @Snuffleupagus received. Current queue size: 0
Live output at: http://54.215.176.217:8877/ffa390df14e6d1a/output.txt
From: Bot.io (Windows)
Success
Full output at http://54.215.176.217:8877/ffa390df14e6d1a/output.txt
Total script time: 21.47 mins
From: Bot.io (Linux)
Success
Full output at http://107.21.233.14:8877/9972b61cc0c5ce9/output.txt
Total script time: 25.50 mins
@yurydelendik Even though you already approved this PR, would you mind quickly looking over it again before we land it?
lgtm
Thank you for the patch.
Nice work! This has made a massive difference for me with loading
After the binary CMap format had been added there were also some ideas about *maybe* providing other formats, see mozilla#8064 (comment), however that was over seven years ago and we still only use binary CMaps. Hence it now seems reasonable to simplify the relevant code by removing `CMapCompressionType` and instead just use a boolean to indicate the type of the built-in CMaps.
Currently the built-in CMap files are loaded in `src/core/cmap.js` using `XMLHttpRequest` directly. For some environments that might be a problem, hence this patch refactors that to instead use a factory to load built-in CMaps on the main thread and message the data to the worker thread.
This is inspired by other recent work, e.g. the addition of the `CanvasFactory`, and to a large extent on the IRC discussion starting at http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53010.
Please note: While it certainly may be possible to improve the patch, it does work just fine locally as-is, i.e. the viewer still works and all unit/font/reference tests pass.
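The flow described above (the worker asks the main thread, and the main thread answers through a user-supplied factory) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: `createChannel` is a hypothetical in-process stand-in for the real main-thread/worker messaging, and the action name `"FetchBuiltInCMap"`, the `fetch({ name })` signature and the shape of the resolved object are all assumptions.

```javascript
// Hypothetical stand-in for the main-thread/worker message channel; a real
// implementation would pass messages between threads via postMessage.
function createChannel() {
  var handlers = Object.create(null);
  return {
    on: function (action, handler) {
      handlers[action] = handler;
    },
    sendWithPromise: function (action, data) {
      return Promise.resolve().then(function () {
        if (!(action in handlers)) {
          throw new Error("Unknown action: " + action);
        }
        return handlers[action](data);
      });
    },
  };
}

// Main-thread side: answer requests through the user-supplied factory.
// The action name and factory interface are assumptions for this sketch.
function registerCMapHandler(channel, cMapReaderFactory) {
  channel.on("FetchBuiltInCMap", function (data) {
    return cMapReaderFactory.fetch({ name: data.name });
  });
}

// Worker side: request the data instead of using XMLHttpRequest directly.
function fetchBuiltInCMap(channel, name) {
  return channel.sendWithPromise("FetchBuiltInCMap", { name: name });
}

// Usage with a stub factory that records which CMaps were requested.
var channel = createChannel();
var requestedNames = [];
registerCMapHandler(channel, {
  fetch: function (params) {
    requestedNames.push(params.name);
    return Promise.resolve({ cMapData: new Uint8Array([0, 1]), isCompressed: true });
  },
});
```

Because the worker only depends on the channel, the loading strategy can be swapped on the main thread without touching worker code.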
@yurydelendik Since this PR attempts to implement your idea from http://logs.glob.uno/?c=mozilla%23pdfjs&s=12+Oct+2016&e=12+Oct+2016#c53018, I'd appreciate if you could provide feedback on the implementation when you've got time.
Edit: Based on this PR, we should e.g. be able to get the CMap unit-tests running on Travis, by providing a custom Node.js factory for the unit-test.
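As a rough idea of what such a custom Node.js factory could look like, here is a minimal sketch that reads CMap files from disk with `fs.readFile`. The constructor name `NodeCMapReaderFactory`, the option names `baseUrl` and `isCompressed`, and the shape of the resolved object are assumptions made for illustration; only the `.bcmap` extension for binary CMaps is taken from the project's file layout.

```javascript
var fs = require("fs");
var path = require("path");

// Hypothetical Node.js CMap loading factory. The option names and the
// resolved `{ cMapData, isCompressed }` object are assumptions.
function NodeCMapReaderFactory(options) {
  this.baseUrl = (options && options.baseUrl) || null;
  this.isCompressed = !!(options && options.isCompressed);
}

NodeCMapReaderFactory.prototype.fetch = function (params) {
  var self = this;
  if (!this.baseUrl) {
    return Promise.reject(new Error("CMap baseUrl must be specified."));
  }
  if (!params || !params.name) {
    return Promise.reject(new Error("CMap name must be specified."));
  }
  // Binary CMaps are stored with a ".bcmap" extension.
  var file = path.join(this.baseUrl,
                       params.name + (this.isCompressed ? ".bcmap" : ""));
  return new Promise(function (resolve, reject) {
    fs.readFile(file, function (error, data) {
      if (error || !data) {
        reject(new Error("Unable to load CMap at: " + file));
        return;
      }
      // Copy the Buffer contents into a plain Uint8Array.
      resolve({
        cMapData: new Uint8Array(data),
        isCompressed: self.isCompressed,
      });
    });
  });
};
```

A unit-test setup could then pass an instance of this factory wherever the browser-based default would otherwise be used.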
Edit 2: Also fixes #4794, courtesy of the second commit.