Can't read text in parsed PDFs (weird encoding) #57

AndreaCogliati · 2017-06-18T22:48:58Z

I'm using ILPDFKit to extract text from PDF files generated by another iOS app, and I'm running into encoding issues with the text in certain files. I convert the Contents stream of an ILPDFPage into a string, then look for BT / ET pairs to extract the text.

For instance, one file contains the following text stream:

BT 0.03260000 Tc 7 0 0 7 0 0 Tm /Tc1 1 Tf [ (Las) 4 (t Name) ] TJ ET

from which I can easily extract the string Last Name.
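
For reference, the extraction step described above can be sketched roughly as follows. This is a simplified illustration in Python, not ILPDFKit code: the helper names `parse_literal_strings` and `extract_text` are hypothetical, and the regular expressions assume that `BT`, `ET`, and `]` never occur inside a string, which a real content-stream tokenizer cannot assume.

```python
import re

# Single-character escapes allowed inside a PDF literal string.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f"}


def parse_literal_strings(array_body):
    """Return the decoded (...) literal strings found in a TJ array body."""
    out = []
    i, n = 0, len(array_body)
    while i < n:
        if array_body[i] != "(":
            i += 1  # skip kerning numbers and whitespace
            continue
        i += 1
        depth = 1  # balanced parentheses may nest unescaped
        chars = []
        while i < n:
            c = array_body[i]
            if c == "\\" and i + 1 < n:
                nxt = array_body[i + 1]
                if nxt in "01234567":
                    # Octal escape: one to three octal digits.
                    digits = re.match(r"[0-7]{1,3}", array_body[i + 1:]).group()
                    chars.append(chr(int(digits, 8) & 0xFF))
                    i += 1 + len(digits)
                else:
                    # Named escape, or an escaped paren/backslash.
                    chars.append(ESCAPES.get(nxt, nxt))
                    i += 2
                continue
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
                if depth == 0:
                    i += 1
                    break
            chars.append(c)
            i += 1
        out.append("".join(chars))
    return out


def extract_text(content):
    """Collect the raw string codes shown by TJ inside each BT ... ET block."""
    texts = []
    for block in re.findall(r"\bBT\b(.*?)\bET\b", content, flags=re.S):
        for body in re.findall(r"\[(.*?)\]\s*TJ", block, flags=re.S):
            texts.append("".join(parse_literal_strings(body)))
    return texts


stream = "BT 0.03260000 Tc 7 0 0 7 0 0 Tm /Tc1 1 Tf [ (Las) 4 (t Name) ] TJ ET"
print(extract_text(stream))  # prints ['Last Name']
```

Note that this returns the character codes exactly as stored in the stream. They only read as text when the font selected by `Tf` happens to use an ASCII-compatible encoding.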

In another file (which has the same general format as the previous file, and which renders correctly on screen), I see the following string instead:

BT 0.03260000 Tc 7 0 0 7 0 0 Tm /TT2 1 Tf [ (!\"#) 4 ($%&\"\'\\() ] TJ ET

Why do I see those weird characters instead of the text Last Name? What am I doing wrong?

The only apparent difference between the two files is that one was created on iOS 9 and the other on iOS 10.
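
One observation that can be checked directly from the two streams above: the bytes in the second file line up with "Last Name" if each distinct character is given the next code, starting at 0x21, in order of first appearance. The sketch below (Python, hypothetical helper names) verifies this. It assumes the backslashes before `"` and `'` in the dump come from how the string was printed, so the PDF literals are `(!"#)` and `($%&"'\()`, where `\(` is an escaped parenthesis. Such a pattern would be consistent with a font that uses its own code assignment rather than ASCII, but that is an inference from this one sample, not something the files confirm.

```python
def first_use_codes(text, start=0x21):
    """Assign consecutive codes to characters in order of first appearance."""
    table = {}
    for ch in text:
        if ch not in table:
            table[ch] = chr(start + len(table))
    return table


def decode(codes, table):
    """Invert the table to map codes back to the original characters."""
    inverse = {code: ch for ch, code in table.items()}
    return "".join(inverse[c] for c in codes)


# Codes from the second file after undoing the assumed printing escapes
# and the PDF escape \( : the two TJ strings joined together.
observed = '!"#' + '$%&"\'('

table = first_use_codes("Last Name")
encoded = "".join(table[ch] for ch in "Last Name")

print(encoded == observed)      # prints True
print(decode(observed, table))  # prints Last Name
```

The table here is reverse-engineered from a known answer, so it is only a diagnostic. For arbitrary text, the mapping would have to come from the font dictionary referenced by `/TT2` in the page resources (its `/ToUnicode` CMap or `/Encoding` entry, if present) rather than from the content stream.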
