This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Can't read text in parsed PDFs (weird encoding) #57

Open
AndreaCogliati opened this issue Jun 18, 2017 · 0 comments

I'm using ILPDFKit to extract text from PDF files generated by another iOS app, and I'm running into problems with the text encoding in some of those files. I convert the Contents stream of an ILPDFPage into a string, then look for BT / ET pairs to extract the text.

For instance, one file contains the following text stream:

BT 0.03260000 Tc 7 0 0 7 0 0 Tm /Tc1 1 Tf [ (Las) 4 (t Name) ] TJ ET

from which I can easily extract the string "Last Name".
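
For reference, the extraction step described above can be sketched roughly as follows. This is a Python illustration of the idea, not the actual ILPDFKit-based code: `read_literal` and `extract_text_blocks` are hypothetical helper names, and the sketch assumes the content stream is already decompressed and available as a string. It only collects literal `(...)` strings between BT and ET and ignores the kerning numbers inside TJ arrays. It also treats each byte of a string as a character, which only yields readable text when the font's character codes happen to match the expected characters.

```python
import re

# Single-character escapes allowed inside a PDF literal string.
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
            "(": "(", ")": ")", "\\": "\\"}


def read_literal(s, i):
    """Parse a PDF literal string starting at s[i] == '('.

    Returns (decoded_text, index_after_closing_paren).
    Line-continuation escapes (backslash + newline) are not handled.
    """
    if s[i] != "(":
        raise ValueError("expected '(' at index %d" % i)
    out = []
    depth = 1          # balanced, unescaped parentheses may nest
    i += 1
    while i < len(s):
        c = s[i]
        if c == "\\" and i + 1 < len(s):
            nxt = s[i + 1]
            if nxt in "01234567":
                # Octal escape: one to three octal digits.
                digits = re.match(r"[0-7]{1,3}", s[i + 1:]).group()
                out.append(chr(int(digits, 8) & 0xFF))
                i += 1 + len(digits)
            else:
                # Known escape, or drop the backslash for unknown ones.
                out.append(_ESCAPES.get(nxt, nxt))
                i += 2
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return "".join(out), i + 1
        out.append(c)
        i += 1
    raise ValueError("unterminated literal string")


def extract_text_blocks(content):
    """Return one string per BT ... ET block, joining its literal strings.

    Naive on purpose: a string that itself contains ' ET ' would end the
    block early, and hex strings (<...>) are skipped entirely.
    """
    blocks = []
    for match in re.finditer(r"\bBT\b(.*?)\bET\b", content, re.S):
        body = match.group(1)
        parts = []
        i = 0
        while i < len(body):
            if body[i] == "(":
                text, i = read_literal(body, i)
                parts.append(text)
            else:
                i += 1
        blocks.append("".join(parts))
    return blocks


stream = "BT 0.03260000 Tc 7 0 0 7 0 0 Tm /Tc1 1 Tf [ (Las) 4 (t Name) ] TJ ET"
print(extract_text_blocks(stream))
```

Applied to the stream above, this joins the two pieces of the TJ array back into a single string.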

In another file (which has the same general format as the previous one and renders correctly on screen), I see the following stream instead:

BT 0.03260000 Tc 7 0 0 7 0 0 Tm /TT2 1 Tf [ (!\"#) 4 ($%&\"\'\\() ] TJ ET

Why do I see those weird characters instead of the text Last Name? What am I doing wrong?

The only apparent difference between the two files is that one was created on iOS 9 and the other on iOS 10.
