Update text masking for PDF attachments
Large PDF attachments with lots of images are sometimes received.
These will normally be compressed and expand to a much larger file
size. This can cause memory issues due to how we pass the data around
the application to external commands such as `pdftk`.

Passing this data via STDIN to `pdftk` seems to be the main issue
here: we are seeing uncompressed PDFs over 900MB that cannot be
re-compressed despite leaving the process running for over 1 hour.

This change switches to using tempfiles when uncompressing or compressing
PDF attachments which, in testing on my development and WDTK production
environments, allows the whole `apply_masks` process, which includes
uncompressing and re-compressing, to complete in under 2 minutes.
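
The pattern described above, writing the bytes to a tempfile and handing the external command a path instead of piping through STDIN, can be sketched as follows. This is a minimal illustration and not the project's code: `run_with_tempfile` and the `:path` placeholder are hypothetical names, and `Open3.capture2` stands in for `AlaveteliExternalCommand.run`, whose internals are not shown in this commit.

```ruby
require 'open3'
require 'tempfile'
require 'tmpdir'

# Hypothetical helper (not part of Alaveteli): writes `data` to a tempfile,
# substitutes the file's path for the :path placeholder in `command`, runs
# the command without a shell and returns its stdout as binary data.
def run_with_tempfile(data, *command)
  temp = Tempfile.new('masker', Dir.tmpdir, binmode: true)
  temp.write(data)
  temp.close # flush to disk so the child process sees the whole file

  argv = command.map { |arg| arg == :path ? temp.path : arg }
  stdout, status = Open3.capture2(*argv, binmode: true)
  raise "#{argv.first} exited with #{status.exitstatus}" unless status.success?

  stdout
ensure
  # `&.` guards against Tempfile.new itself having raised (temp is nil then).
  temp&.close!
end
```

Assuming `pdftk` is installed, a call shaped like the one in this commit would be `run_with_tempfile(pdf_bytes, "pdftk", :path, "output", "-", "uncompress")`.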
gbp committed Jan 12, 2023
1 parent d907f71 commit fd7fc77
Showing 2 changed files with 21 additions and 4 deletions.
1 change: 1 addition & 0 deletions doc/CHANGES.md
@@ -2,6 +2,7 @@

 ## Highlighted Features

+* Improve processing of large PDF attachments (Graeme Porteous)
 * Add support for Ruby 3 (Graeme Porteous)
 * Drop support for Ruby 2.7 (Graeme Porteous)

24 changes: 20 additions & 4 deletions lib/alaveteli_text_masker.rb
@@ -1,3 +1,5 @@
+require 'tempfile'
+
 module AlaveteliTextMasker
   include ConfigHelper

@@ -47,10 +49,22 @@ def apply_masks(text, content_type, options = {})
   private

   def uncompress_pdf(text)
-    AlaveteliExternalCommand.run("pdftk", "-", "output", "-", "uncompress", :stdin_string => text)
+    temp = Tempfile.new('pdftk', './tmp', encoding: 'ascii-8bit')
+    temp.write(text)
+    temp.close
+
+    AlaveteliExternalCommand.run(
+      "pdftk", temp.path, "output", "-", "uncompress"
+    )
+  ensure
+    temp.unlink
   end

   def compress_pdf(text)
+    temp = Tempfile.new('pdftk', './tmp', encoding: 'ascii-8bit')
+    temp.write(text)
+    temp.close
+
     if AlaveteliConfiguration::use_ghostscript_compression
       command = ["gs",
                  "-sDEVICE=pdfwrite",
@@ -60,11 +74,13 @@ def compress_pdf(text)
                  "-dQUIET",
                  "-dBATCH",
                  "-sOutputFile=-",
-                 "-"]
+                 temp.path]
     else
-      command = ["pdftk", "-", "output", "-", "compress"]
+      command = ["pdftk", temp.path, "output", "-", "compress"]
     end
-    AlaveteliExternalCommand.run(*(command + [ :stdin_string => text ]))
+    AlaveteliExternalCommand.run(*command)
+  ensure
+    temp.unlink
   end

   def apply_pdf_masks(text, options = {})
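
One detail of the `ensure` blocks in this diff: if `Tempfile.new` itself raises, `temp` is still `nil` when `temp.unlink` runs, so a `NoMethodError` would replace the original error. A possible alternative, shown only as a sketch and not as what the commit does, is the block form of `Tempfile.create`, which removes the file when the block exits. The helper name `with_binary_tempfile` and its `dir:` parameter are hypothetical.

```ruby
require 'tempfile'
require 'tmpdir'

# Hypothetical helper (not part of Alaveteli): yields the path of a closed
# temporary file containing `data`. Tempfile.create deletes the file once
# the block returns or raises, so no explicit `ensure` is needed here.
def with_binary_tempfile(data, dir: Dir.tmpdir)
  Tempfile.create('pdftk', dir, binmode: true) do |file|
    file.write(data)
    file.close # flush so an external command can read the full contents
    yield file.path
  end
end
```

With this helper, `uncompress_pdf` could be written as `with_binary_tempfile(text) { |path| AlaveteliExternalCommand.run("pdftk", path, "output", "-", "uncompress") }`, assuming `run` behaves as it does in the diff above.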