PDFDocument.saveToBuffer with options not working #100

mfts · 2024-07-06T11:16:30Z

Hi there,

I want to add mutool convert options to a PDFDocument buffer to compress the pdf before converting it to images.

I found in the docs that I can pass these options to PDFDocument.saveToBuffer

However, the example in the docs doesn't match the actual implementation.
Docs Example:

var buffer = pdfDocument.saveToBuffer({"compress-images":true});

However, the implementation shows that options should be a string not an object: https://github.com/ArtifexSoftware/mupdf.js/blob/68a506ad218c6f439169ebd07ddc33e150c22601/src/mupdf.ts#L2281C2-L2285C2

OK, so I pass it as a string. However, I'm not sure what format it should use, so I dug a little further into libmupdf and found the parsing function for buffer write options: https://github.com/ArtifexSoftware/mupdf/blob/53aae51af4eea14fabde144948ba61f7537053f9/source/pdf/pdf-write.c#L3264

Which one is the correct way to pass options to saveToBuffer in mupdf-wasm?

1. var buffer = pdfDocument.saveToBuffer({"compress-images":true, "linearize":true, "garbage":true});
2. var buffer = pdfDocument.saveToBuffer("compress-images:true,linearize:true,garbage:true");
3. var buffer = pdfDocument.saveToBuffer("compress-images=yes,linearize=yes,garbage=yes");
4. var buffer = pdfDocument.saveToBuffer("compress-images,linearize,garbage");

Only the second variation works, but I doubt that the options are parsed properly.

Every other variation throws this error:

Error: Error: cannot seek in buffer: No error information 

 ⨯ Error: cannot seek in buffer: No error information
    at 5737853 (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:995:27)
    at runEmAsmFunction (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:4272:30)
    at _emscripten_asm_const_int (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:4275:14)
    at wasm://wasm/023c3496:wasm-function[112]:0x82aa
    at invoke_vi (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:5715:29)
    at wasm://wasm/023c3496:wasm-function[425]:0x50bec
    at Object._wasm_pdf_write_document_buffer (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf-wasm.js:705:12)
    at PDFDocument.saveToBuffer (file:///Users/mfts/dev/mfts/papermark-main/node_modules/mupdf/dist/mupdf.js:1663:36)
    at __WEBPACK_DEFAULT_EXPORT__ (webpack-internal:///(api)/./pages/api/mupdf/convert-page.ts:59:26)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async K (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/compiled/next-server/pages-api.runtime.dev.js:21:2871)
    at async U.render (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/compiled/next-server/pages-api.runtime.dev.js:21:3955)
    at async DevServer.runApi (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/server/next-server.js:600:9)
    at async NextNodeServer.handleCatchallRenderRequest (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/server/next-server.js:269:37)
    at async DevServer.handleRequestImpl (/Users/mfts/dev/mfts/papermark-main/node_modules/next/dist/server/base-server.js:816:17) {
  page: '/api/mupdf/convert-page'
}

Would really appreciate your help / some pointers.


mfts · 2024-07-06T11:22:30Z

As a follow up, would it be possible to compress the pdf slide in a different process before converting to an image?

Either in one of the following steps:

1. loadPage from document: var page = document.loadPage(0)
2. toPixmap: var pixmap = page.toPixmap(mupdf.Matrix.identity, mupdf.ColorSpace.DeviceRGB, true, true);
3. asPNG: var png = pixmap.asPNG()

ccxvii · 2024-07-08T10:22:58Z

First off I wonder why you're trying to "compress the pdf before converting it to images", and what you hope to accomplish with that?

Saving the PDF is intended to save a PDF document (with any edits) to file. The various save options only affect the byte representation of the PDF on disk. Opening and instantly saving a PDF file will change the bytes on disk, but should result in an identical appearance and behavior when opening the new file.

For example, the "compress-images" option will make sure that uncompressed images are compressed with Flate. It will not change or re-compress any JPEG images, if that is what you're trying to do!

Likewise, the "garbage" option will remove unused objects from the file. If you're just saving out a file without having done any edits, there is nothing to garbage collect.

--

The options argument is a comma-separated string of properties (converting an object into this string is a TODO item, so future versions may support option 1, but for now you'll need to use option 3 or 4).

If a property is present but not set to a value, its value is implied to be "yes".

Only variations 3 and 4 should work.

var buffer = pdfDocument.saveToBuffer("compress-images,garbage");

The error you get is because you try to save with linearization to a Buffer. The linearization code is not very well tested, and only works on "seekable" outputs (which Buffer is not).

In any case, I would recommend NOT using the linearize option, as its only real effect is in making the resulting PDF bigger and slower.
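Until object-style options are supported, the comma-separated format described above can be produced from a plain object. The following is a minimal sketch, not part of mupdf.js: the helper name toSaveOptionString and the UNSAFE_FOR_BUFFER set are hypothetical. It assumes that a true value is written as a bare key (implied "yes"), that false or missing values are simply left out, and that "linearize" should be dropped because a Buffer is not a seekable output.

```javascript
// Hypothetical helper for illustration; not part of mupdf.js.
// Options that need a seekable output and therefore fail with saveToBuffer.
const UNSAFE_FOR_BUFFER = new Set(["linearize"]);

function toSaveOptionString(options) {
  const parts = [];
  for (const [key, value] of Object.entries(options)) {
    // Skip options that cannot be used when writing to a Buffer.
    if (UNSAFE_FOR_BUFFER.has(key)) continue;
    // Leave disabled or unset options out of the string entirely.
    if (value === false || value === null || value === undefined) continue;
    // A bare key is implied to be "yes"; other values are written as key=value.
    parts.push(value === true ? key : `${key}=${value}`);
  }
  return parts.join(",");
}

const optionString = toSaveOptionString({
  "compress-images": true,
  linearize: true,
  garbage: true,
});
console.log(optionString); // compress-images,garbage

// Usage with mupdf.js (pdfDocument as in the examples in this thread):
// var buffer = pdfDocument.saveToBuffer(optionString);
```

Dropping "linearize" silently is a design choice made for this sketch; throwing an error instead may be preferable if callers should be told that the option was ignored.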

mfts · 2024-07-10T11:36:33Z

Thanks for your feedback.

The issue is that when I convert an image-heavy page to an image, the mupdf-wasm package doesn't always work. The reason, however, was that I scale the resolution to 3x with mupdf.Matrix.scale. At 1x the images are just too blurry. For image/color-heavy pages, rendering at 3x takes a really long time.

So compressing the PDF before converting it with page.toPixmap() was the goal.

My temporary fix is to scale the matrix to only 2x, depending on the existing resolution of the PDF page. It's an OK workaround.

ccxvii · 2024-07-10T16:15:21Z

"Compressing" the PDF before converting won't necessarily help. What exact error message did you see when converting the image-heavy page? Did it run out of memory?

mfts · 2024-07-11T07:46:12Z

Yes it ran out of memory at 3x matrix scale but worked well at 2x.
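The difference between 2x and 3x is consistent with pixmap size growing with the square of the scale factor. As a rough sketch under stated assumptions, the output pixmap size can be estimated before rendering and the scale reduced when it exceeds a budget. The helper names, the candidate scales, and the 16 MiB budget are hypothetical. The estimate assumes DeviceRGB with alpha (4 bytes per pixel) and one pixel per point at scale 1, and it covers only the output pixmap, so actual memory use during rendering (for example for decoded images) will be higher.

```javascript
// Hypothetical helpers for illustration; not part of mupdf.js.
const BYTES_PER_PIXEL = 4; // DeviceRGB plus alpha, one byte per component

// Estimate the size of the rendered pixmap for a page of the given size in points.
function estimatePixmapBytes(widthPt, heightPt, scale) {
  return Math.ceil(widthPt * scale) * Math.ceil(heightPt * scale) * BYTES_PER_PIXEL;
}

// Return the first candidate scale whose estimated pixmap fits the budget,
// falling back to the last (smallest) candidate if none fits.
function pickScale(widthPt, heightPt, maxBytes, candidates = [3, 2, 1]) {
  for (const scale of candidates) {
    if (estimatePixmapBytes(widthPt, heightPt, scale) <= maxBytes) return scale;
  }
  return candidates[candidates.length - 1];
}

// Example: an A4-sized page (595 x 842 points) with an arbitrary 16 MiB budget.
console.log(estimatePixmapBytes(595, 842, 3)); // 18035640
console.log(pickScale(595, 842, 16 * 1024 * 1024)); // 2

// Usage with mupdf.js (page as in the examples in this thread):
// var [x0, y0, x1, y1] = page.getBounds();
// var s = pickScale(x1 - x0, y1 - y0, 16 * 1024 * 1024);
// var pixmap = page.toPixmap(mupdf.Matrix.scale(s, s), mupdf.ColorSpace.DeviceRGB, true, true);
```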

ccxvii closed this as not planned on Aug 6, 2024
