Extra spaces between letters in a single word #509

pickhardt · 2023-03-27T00:50:06Z

I noticed this gem has problems parsing some PDFs where the text is not necessarily clean.

For instance, this file: https://www.jstor.org/stable/3684663

Some parts of it get output like: "a b o u t a r e g r e s s i o n t o o r i g i n a l c h a o s"

However, the file itself doesn't seem to be the problem, because Python's PyPDF2 extracts the same passage correctly as "about a regression to original chaos".

Do you think there is some step that this reader is missing? Alternatively, is there an option I should set when using PDF::Reader to get it to read these PDFs better?
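To check whether a given page is affected, a rough sketch like the following could flag letter-spaced runs in text that has already been extracted. The helper name `letter_spaced_runs` and the constant are hypothetical and not part of pdf-reader; the threshold of four or more single letters is an arbitrary assumption and may need tuning.

```ruby
# Hypothetical helper, NOT part of the pdf-reader gem.
# Matches runs of four or more single letters separated by single spaces,
# e.g. "a b o u t". The minimum run length is an assumption.
LETTER_SPACED_RUN =
  /(?<![[:alpha:]])(?:[[:alpha:]] ){3,}[[:alpha:]](?![[:alpha:]])/

# Returns every letter-spaced run found in the given text.
def letter_spaced_runs(text)
  text.scan(LETTER_SPACED_RUN)
end

garbled = "a b o u t a r e g r e s s i o n t o o r i g i n a l c h a o s"
clean   = "about a regression to original chaos"

puts letter_spaced_runs(garbled).length
puts letter_spaced_runs(clean).length
```

This only detects the symptom. Because the garbled output uses the same single space between letters and between words, the original word boundaries cannot be recovered from the string alone.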

shmolf · 2024-04-15T19:00:23Z

I too am experiencing this issue.

iprog21 · 2024-05-31T08:36:30Z

Same here.

I tried a gsub workaround. It works when the clustered word is in PascalCase, but not when it is all lowercase:

TheFirstWord -> The First Word, using gsub(/([a-z])([A-Z])/, '\1 \2')
thefirstword -> thefirstword ???
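For reference, the workaround above can be sketched as a small runnable snippet. The method names `split_pascal_case` and `rejoin_letter_spaced` are made up for illustration, and the step that strips the spaces with `String#delete` is an assumption about how the "clustered" word is obtained; only the gsub pattern comes from the comment itself.

```ruby
# Inserts a space at every lowercase-to-uppercase boundary.
# This is the gsub pattern quoted in the comment above.
def split_pascal_case(text)
  text.gsub(/([a-z])([A-Z])/, '\1 \2')
end

# Hypothetical helper: strips the stray spaces from a letter-spaced run,
# then tries to recover word boundaries from case changes.
def rejoin_letter_spaced(run)
  split_pascal_case(run.delete(" "))
end

puts rejoin_letter_spaced("T h e F i r s t W o r d")
puts rejoin_letter_spaced("t h e f i r s t w o r d")
```

The second call shows the limitation: with no case change to split on, the words stay glued together, so this approach does not help with all-lowercase text such as the passage in the original report.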
