PDFDocument.saveToBuffer with options not working #100

Closed
mfts opened this issue Jul 6, 2024 · 5 comments

mfts commented Jul 6, 2024

Hi there,

I want to pass mutool convert options when saving a PDFDocument to a buffer, so that the PDF is compressed before I convert it to images.

I found in the docs that I can pass these options to PDFDocument.saveToBuffer.

However, the example in the docs doesn't match the actual implementation.
Docs example:

var buffer = pdfDocument.saveToBuffer({"compress-images":true});

However, the implementation shows that options should be a string, not an object: https://github.com/ArtifexSoftware/mupdf.js/blob/68a506ad218c6f439169ebd07ddc33e150c22601/src/mupdf.ts#L2281C2-L2285C2

So I passed it as a string, but I wasn't sure what format it should have. I dug a little further into libmupdf and found the parsing function for the buffer write options: https://github.com/ArtifexSoftware/mupdf/blob/53aae51af4eea14fabde144948ba61f7537053f9/source/pdf/pdf-write.c#L3264

Which one is the correct way to pass options to saveToBuffer in mupdf-wasm?

  1. var buffer = pdfDocument.saveToBuffer({"compress-images":true, "linearize":true, "garbage":true});
  2. var buffer = pdfDocument.saveToBuffer("compress-images:true,linearize:true,garbage:true");
  3. var buffer = pdfDocument.saveToBuffer("compress-images=yes,linearize=yes,garbage=yes");
  4. var buffer = pdfDocument.saveToBuffer("compress-images,linearize,garbage");

Only variation 2 works, but I doubt that the options are parsed properly.

Every other variation throws this error:

Error: Error: cannot seek in buffer: No error information 

 ⨯ Error: cannot seek in buffer: No error information
    at 5737853 (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:995:27)
    at runEmAsmFunction (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:4272:30)
    at _emscripten_asm_const_int (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:4275:14)
    at wasm://wasm/023c3496:wasm-function[112]:0x82aa
    at invoke_vi (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:5715:29)
    at wasm://wasm/023c3496:wasm-function[425]:0x50bec
    at Object._wasm_pdf_write_document_buffer (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:705:12)
    at PDFDocument.saveToBuffer (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf.js:1663:36)
    at __WEBPACK_DEFAULT_EXPORT__ (webpack-internal:///(api)/./pages/api/mupdf/convert-page.ts:59:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async K (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/compiled/next-server/pages-api.runtime.dev.js:21:2871)
    at async U.render (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/compiled/next-server/pages-api.runtime.dev.js:21:3955)
    at async DevServer.runApi (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/server/next-server.js:600:9)
    at async NextNodeServer.handleCatchallRenderRequest (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/server/next-server.js:269:37)
    at async DevServer.handleRequestImpl (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/server/base-server.js:816:17) {
  page: '/api/mupdf/convert-page'
}

Would really appreciate your help / some pointers.

Author

mfts commented Jul 6, 2024

As a follow-up, would it be possible to compress the PDF page in a separate step before converting it to an image?

For example, in one of the following steps:

  • loadPage from document
    var page = document.loadPage(0)
  • toPixmap
    var pixmap = page.toPixmap(mupdf.Matrix.identity, mupdf.ColorSpace.DeviceRGB, true, true);
  • asPNG
    var png = pixmap.asPNG()
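
For reference, the three steps above can be combined into a single function. This is only a minimal sketch: `renderPageToPNG` is a hypothetical name, the `mupdf` module and an already opened document are passed in as parameters, and `mupdf.Matrix.scale` is assumed to take a horizontal and a vertical scale factor.

```javascript
// Hypothetical helper: run loadPage -> toPixmap -> asPNG for one page.
// `mupdf` is the imported mupdf module, `doc` an already opened document.
function renderPageToPNG(mupdf, doc, pageIndex, scale) {
  // Step 1: load the page from the document.
  const page = doc.loadPage(pageIndex);
  // Step 2: render it to an RGB pixmap with alpha, scaled uniformly.
  const matrix = mupdf.Matrix.scale(scale, scale);
  const pixmap = page.toPixmap(matrix, mupdf.ColorSpace.DeviceRGB, true, true);
  // Step 3: encode the pixmap as PNG bytes.
  return pixmap.asPNG();
}
```

None of these steps takes save options, so this sketch only shows where the scale factor enters the pipeline.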

Collaborator

ccxvii commented Jul 8, 2024

First off I wonder why you're trying to "compress the pdf before converting it to images", and what you hope to accomplish with that?

Saving the PDF is intended to save a PDF document (with any edits) to file. The various save options only affect the byte representation of the PDF on disk. Opening and instantly saving a PDF file will change the bytes on disk, but should result in an identical appearance and behavior when opening the new file.

For example, the "compress-images" option will make sure that uncompressed images are compressed with Flate. It will not change or re-compress any JPEG images, if that is what you're trying to do!

Likewise, the "garbage" option will remove unused objects from the file. If you're just saving out a file without having done any edits, there is nothing to garbage collect.

--

The options argument is a comma-separated string of properties (converting an object into this string is a TODO item, so future versions may support variation 1, but for now you'll need to use variation 3 or 4).

If a property is present but not set to a value, it is implied to be "yes".

Only variations 3 and 4 should work.

var buffer = pdfDocument.saveToBuffer("compress-images,garbage");
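
Until object options are supported, a small helper can build the comma-separated string from a plain object. This is a minimal sketch: `formatSaveOptions` is a hypothetical helper, not part of mupdf.js. It assumes that `true` can be written as a bare key (variation 4), that other values use `key=value` (variation 3), and that leaving an option out is equivalent to disabling it.

```javascript
// Hypothetical helper (not part of mupdf.js): turn a plain object into the
// comma-separated option string that saveToBuffer currently expects.
function formatSaveOptions(options) {
  const parts = [];
  for (const [key, value] of Object.entries(options)) {
    if (value === false || value === null || value === undefined) {
      continue; // leave the option out so the library default applies
    }
    // A bare key is implied to mean "yes"; anything else becomes key=value.
    parts.push(value === true ? key : `${key}=${value}`);
  }
  return parts.join(",");
}

const optionString = formatSaveOptions({
  "compress-images": true,
  garbage: "yes",
  linearize: false,
});
// optionString is "compress-images,garbage=yes"
// var buffer = pdfDocument.saveToBuffer(optionString);
```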

The error you get is because you try to save with linearization to a Buffer. The linearization code is not very well tested, and only works on "seekable" outputs (which Buffer is not).

In any case, I would recommend NOT using the linearize option, as its only real effect is in making the resulting PDF bigger and slower.
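
If option strings come from elsewhere in an application, one way to avoid the seek error is to strip `linearize` before saving to a Buffer. A minimal sketch, where `withoutLinearize` is a hypothetical helper and not part of mupdf.js:

```javascript
// Hypothetical helper: remove "linearize" (bare or with a value) from a
// comma-separated option string before saving to a non-seekable Buffer.
function withoutLinearize(optionString) {
  return optionString
    .split(",")
    .map((part) => part.trim())
    // Compare only the key, so both "linearize" and "linearize=yes" are dropped.
    .filter((part) => part !== "" && part.split("=")[0] !== "linearize")
    .join(",");
}

// var buffer = pdfDocument.saveToBuffer(withoutLinearize("compress-images,linearize,garbage"));
```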

Author

mfts commented Jul 10, 2024

Thanks for your feedback.

The issue is that converting an image-heavy page to an image with the mupdf-wasm package doesn't always work. The cause turned out to be that I scale the resolution to 3x with mupdf.Matrix.scale, because at 1x the images are just too blurry. For image/color-heavy pages this takes a really long time.

So compressing the PDF before converting it with page.toPixmap() was the goal.

My temporary fix is to scale the matrix to only 2x, depending on the existing resolution of the PDF page. It's an OK workaround.
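
The choice of scale can be made from a rough size estimate of the output pixmap. This is a minimal sketch under stated assumptions: both function names and the memory budget are hypothetical, the pixmap is assumed to use one byte per component (4 bytes per pixel for RGB plus alpha), and the estimate ignores the memory needed to decode the page's own images, which may well dominate on image-heavy pages.

```javascript
// Estimate the size of the rendered pixmap in bytes.
// widthPt/heightPt are the page size in points; scale is the matrix factor.
function estimatePixmapBytes(widthPt, heightPt, scale, bytesPerPixel = 4) {
  const widthPx = Math.ceil(widthPt * scale);
  const heightPx = Math.ceil(heightPt * scale);
  return widthPx * heightPx * bytesPerPixel;
}

// Return the first candidate scale whose pixmap fits the budget,
// falling back to the last candidate if none fits.
function pickScale(widthPt, heightPt, budgetBytes, candidates = [3, 2, 1]) {
  for (const scale of candidates) {
    if (estimatePixmapBytes(widthPt, heightPt, scale) <= budgetBytes) {
      return scale;
    }
  }
  return candidates[candidates.length - 1];
}

// A 612 x 792 pt page with a hypothetical 8 MiB budget for the pixmap.
const chosenScale = pickScale(612, 792, 8 * 1024 * 1024);
```

For this page size the 3x pixmap is estimated at 17,449,344 bytes and the 2x pixmap at 7,755,264 bytes, so `chosenScale` is 2. The page size in points could be taken from the page bounds.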

Collaborator

ccxvii commented Jul 10, 2024

"Compressing" the PDF before converting won't necessarily help. What exact error message did you see when converting the image-heavy page? Did it run out of memory?

Author

mfts commented Jul 11, 2024

Yes, it ran out of memory at 3x matrix scale but worked well at 2x.

ccxvii closed this as not planned on Aug 6, 2024