This repository has been archived by the owner on Sep 19, 2022. It is now read-only.

Fixed PDF::Reader::MalformedPDFError #986

Merged: sue445 merged 4 commits into master from fix_crawler on Jul 4, 2022

sue445 (Owner) commented on Jul 4, 2022

  1) PdfCrawlWorker#perform is expected not to raise Exception
     Failure/Error: it { expect { subject }.not_to raise_error }
     
       expected no Exception, got #<PDF::Reader::MalformedPDFError: CidWidths: 17 must be less than 17> with backtrace:
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/cid_widths.rb:55:in `parse_second_form'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/cid_widths.rb:37:in `parse_array'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/cid_widths.rb:22:in `initialize'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/width_calculator/composite.rb:17:in `new'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/width_calculator/composite.rb:17:in `initialize'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:146:in `new'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:146:in `build_width_calculator'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:49:in `initialize'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:214:in `new'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:214:in `block in extract_descendants'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:213:in `map'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:213:in `extract_descendants'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/font.rb:48:in `initialize'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:393:in `new'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:393:in `block in build_fonts'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:392:in `each'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:392:in `map'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:392:in `build_fonts'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_state.rb:30:in `initialize'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_text_receiver.rb:44:in `new'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page_text_receiver.rb:44:in `page='
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/validating_receiver.rb:258:in `call_wrapped'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/validating_receiver.rb:24:in `page='
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page.rb:268:in `block in callback'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page.rb:267:in `each'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page.rb:267:in `callback'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page.rb:158:in `walk'
         # ./vendor/bundle/ruby/3.0.0/gems/pdf-reader-2.10.0/lib/pdf/reader/page.rb:115:in `text'
         # ./lib/workers/pdf_crawl_worker.rb:64:in `block in read_pdf'
         # ./lib/workers/pdf_crawl_worker.rb:63:in `each'
         # ./lib/workers/pdf_crawl_worker.rb:63:in `read_pdf'
         # ./lib/workers/pdf_crawl_worker.rb:36:in `parse_ccc_pdf'
         # ./lib/workers/pdf_crawl_worker.rb:6:in `perform'
         # ./spec/lib/workers/pdf_crawl_worker_spec.rb:5:in `block (3 levels) in <top (required)>'
         # ./spec/lib/workers/pdf_crawl_worker_spec.rb:12:in `block (4 levels) in <top (required)>'
         # ./spec/lib/workers/pdf_crawl_worker_spec.rb:12:in `block (3 levels) in <top (required)>'
     # ./spec/lib/workers/pdf_crawl_worker_spec.rb:12:in `block (3 levels) in <top (required)>'
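The trace shows the error surfacing from `page.text`, called inside the `each` block of `read_pdf`. One possible way to keep a single malformed page from aborting the whole crawl is to rescue the error per page. This is only a sketch under that assumption: `extract_page_texts` and its `skip_errors:` parameter are hypothetical names, and this is not necessarily the change merged in this pull request.

```ruby
# Hypothetical helper for illustration; not the code merged in this pull request.
# `pages` is any enumerable of objects that respond to #text.
# `skip_errors` lists the exception classes that should skip a page instead of aborting.
def extract_page_texts(pages, skip_errors:)
  pages.each_with_index.filter_map do |page, index|
    begin
      page.text
    rescue *skip_errors => e
      # Report the skipped page on stderr and drop it from the result.
      warn "Skipping page #{index + 1}: #{e.class}: #{e.message}"
      nil
    end
  end
end
```

With pdf-reader loaded, a call might look like `extract_page_texts(PDF::Reader.new(path).pages, skip_errors: [PDF::Reader::MalformedPDFError])`, where `path` is a placeholder. The trade-off is that text from a skipped page is lost, so whether skipping is acceptable depends on what the crawler needs from the PDF.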

ref.

sue445 merged commit 6f5736d into master on Jul 4, 2022
sue445 deleted the fix_crawler branch on July 4, 2022 at 13:25