Text layer rendering problem in Hanifi Rohingya text #1276

Open
bgo-eiu opened this issue Nov 13, 2023 · 2 comments

bgo-eiu commented Nov 13, 2023

See: https://archive.org/details/20231022_20231022_2050/page/n3/mode/2up

The text in the right-hand columns mostly does not render, although it does come through here and there. This could be related to the characters used: the text is Rohingya in the Hanifi script, a right-to-left writing system. However, the characters do appear in places, and I am unsure whether this text is included as images.

cdrini (Contributor) commented Nov 13, 2023

Hi @bgo-eiu, thank you for the report!

One of our engineers notes:

this kind of problem is often due to use in the PDF of an uncommon font that isn’t present on the system where we do the conversion, and can be avoided by including the font in the PDF when it’s built.

It looks like you are the uploader of this document; are you by any chance able to embed the font in the PDF and re-upload it?

@hbromley

Hi, @bgo-eiu, I'm the engineer who was quoted above. Here's a font analysis of the PDF:

$ pdffonts E-N-G-L-I-S-H-T-O-R-O-H-I-N-G-Y-A-D-I-C-T-I-O-N-A-R-Y-N-compressed.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
AYUGFB+Calibri                       TrueType          WinAnsi          yes yes no   13075  0
Rohingya_{n}_muzhari                 TrueType          WinAnsi          no  no  no   13079  0
ABGHHM+Calibri,Bold                  TrueType          WinAnsi          yes yes no   13082  0
AAAAAJ+Arial,Bold                    CID TrueType      Identity-H       yes yes yes  13086  0
Arial,Bold                           TrueType          WinAnsi          no  no  no   13094  0
AAAAAJ+Calibri,Italic                TrueType          WinAnsi          yes yes no   13097  0
AURWPJ+Arial                         CID TrueType      Identity-H       yes yes yes  12557  0
AFMHMR+Traditional Arabic            CID TrueType      Identity-H       yes yes yes      2  0
AAAAAJ+Traditional Arabic            TrueType          WinAnsi          yes yes no       9  0
AFMHMR+Traditional Arabic            CID TrueType      Identity-H       yes yes yes     12  0
AAAAAJ+Amiri                         CID TrueType      Identity-H       yes yes yes     17  0
AAAAAJ+Amiri                         TrueType          WinAnsi          yes yes no      23  0
A-Rohingya_{n}_muzhari               TrueType          WinAnsi          no  no  no   12551  0
Arial                                TrueType          WinAnsi          no  no  no   12554  0
AHTAUU+Calibri                       CID TrueType      Identity-H       yes yes yes  12833  0
Arial                                TrueType          WinAnsi          no  no  no    8106  0
AURWPJ+Arial                         CID TrueType      Identity-H       yes yes yes  12968  0

Note the rows that have "no" in the "emb" column: those fonts are not embedded in the PDF. The same rows also have "no" in the "uni" column, which means the PDF has no mapping from those characters to Unicode either. Such a mapping would enable us to render the characters even if we don't have the non-embedded font installed locally.
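
For anyone who wants to run the same check on their own file before uploading, here is a minimal Python sketch that reads the text printed by pdffonts and lists the rows with "no" in both the "emb" and "uni" columns. The function names `parse_pdffonts` and `missing_font_rows` are hypothetical, and the parsing assumes the column layout shown in the listing above, where only the font name and type can contain spaces. The name and type are kept together in one field rather than guessed apart.

```python
def parse_pdffonts(output):
    """Parse the text printed by `pdffonts` into a list of dicts.

    Assumes the column layout shown in the listing above.
    """
    rows = []
    for line in output.splitlines():
        # Split from the right: the last six columns (encoding, emb, sub,
        # uni, object number, generation) are single tokens in the listing,
        # while the font name and type may contain spaces.
        parts = line.rsplit(None, 6)
        if len(parts) != 7:
            continue  # blank line, shell prompt, or anything too short
        name_and_type, encoding, emb, sub, uni, obj, gen = parts
        if emb not in ("yes", "no") or uni not in ("yes", "no") or not obj.isdigit():
            continue  # header or dashed separator line
        rows.append({
            "name_and_type": " ".join(name_and_type.split()),
            "encoding": encoding,
            "embedded": emb == "yes",
            "subset": sub == "yes",
            "to_unicode": uni == "yes",
            "object_id": int(obj),
        })
    return rows


def missing_font_rows(rows):
    """Return rows that are neither embedded nor mapped to Unicode."""
    return [r for r in rows if not r["embedded"] and not r["to_unicode"]]


# Four rows copied from the listing above, used as sample input.
SAMPLE = """\
AYUGFB+Calibri                       TrueType          WinAnsi          yes yes no   13075  0
Rohingya_{n}_muzhari                 TrueType          WinAnsi          no  no  no   13079  0
AFMHMR+Traditional Arabic            CID TrueType      Identity-H       yes yes yes      2  0
Arial                                TrueType          WinAnsi          no  no  no   12554  0
"""

for row in missing_font_rows(parse_pdffonts(SAMPLE)):
    print(row["object_id"], row["name_and_type"])
```

Splitting from the right is a design choice: font names such as "AFMHMR+Traditional Arabic" and types such as "CID TrueType" contain spaces, so splitting on whitespace from the left would misalign the columns.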
