The PyPDF loader is a great tool. However, I am having trouble extracting chemical formulas with subscripts. For example, H2O is extracted as `\nHO \2`. Is there any way to fix this (maybe with visitor functions)? Thanks in advance.
Best regards
PDF is an electronic printing format in which "glyphs" are drawn at defined positions with defined sizes. Subscripts are not always written in reading order. You can try with visitors, but it may be tricky.
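Following the suggestion to try visitors: pypdf's `page.extract_text()` accepts a `visitor_text` callback that is called with five arguments, `(text, cm, tm, font_dict, font_size)`. The sketch below records every fragment with its position and font size so the pieces can be reordered afterwards. It assumes the usual PDF convention that a fragment's page position is the text-space origin mapped through the text matrix `tm` and then the current transformation matrix `cm`, both given as six numbers `[a, b, c, d, e, f]`; check this against your pypdf version. `Fragment`, `page_origin`, `make_collector` and `extract_fragments` are hypothetical helpers, not part of pypdf.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Fragment(NamedTuple):
    """One piece of text reported by pypdf, with its page position (hypothetical helper)."""
    text: str
    x: float
    y: float
    font_size: float


def page_origin(cm: Sequence[float], tm: Sequence[float]) -> Tuple[float, float]:
    """Map the text-space origin through tm, then through cm.

    Assumes the standard PDF affine form [a, b, c, d, e, f]:
    x' = a*x + c*y + e and y' = b*x + d*y + f.
    """
    tx, ty = tm[4], tm[5]  # origin after applying the text matrix
    x = cm[0] * tx + cm[2] * ty + cm[4]
    y = cm[1] * tx + cm[3] * ty + cm[5]
    return x, y


def make_collector(out: List[Fragment]) -> Callable:
    """Build a callback with the signature pypdf passes to visitor_text."""
    def visitor(text, cm, tm, font_dict, font_size):
        cleaned = text.replace("\n", "")  # line breaks are rebuilt from positions later
        if not cleaned.strip():
            return  # skip whitespace-only pieces
        x, y = page_origin(cm, tm)
        out.append(Fragment(cleaned, x, y, font_size))
    return visitor


def extract_fragments(path: str, page_index: int = 0) -> List[Fragment]:
    """Collect positioned fragments from one page (requires `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily so the helpers above work without pypdf

    fragments: List[Fragment] = []
    page = PdfReader(path).pages[page_index]
    page.extract_text(visitor_text=make_collector(fragments))
    return fragments
```

Whether subscripts can then be detected depends on the file. If the PDF lowers the `2` with text rise (`Ts`) instead of moving the text position, the recorded `y` values may not differ, and `font_size` may be the nominal size rather than the rendered one. If pypdf hands over `HO` as a single fragment, the coordinates alone are not enough to put the `2` back between `H` and `O`.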
Thanks for the advice. Do you have any idea what should be changed, i.e. which parameters should be tweaked?
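Given the fragments of one line with their coordinates and font sizes, for example as derived from the matrix and `font_size` arguments a visitor receives, the values that matter are mainly the vertical offset from the baseline and the font size. Below is a minimal sketch under those assumptions. `rebuild_line` and its `drop` tolerance are hypothetical. It assumes the fragments have already been grouped into a single line, and it converts only digits, the common case in formulas such as H2O, to Unicode subscript characters.

```python
from typing import Iterable, List, Tuple

# (text, x, y, font_size): one positioned fragment. This layout is an assumption.
Piece = Tuple[str, float, float, float]

# Map ASCII digits to Unicode subscript digits (U+2080..U+2089).
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def rebuild_line(pieces: Iterable[Piece], drop: float = 1.0) -> str:
    """Reorder the fragments of ONE line left to right.

    Digits in fragments that sit at least `drop` units below the baseline
    and use a smaller font become Unicode subscripts. Other characters are
    kept unchanged.
    """
    items: List[Piece] = list(pieces)
    if not items:
        return ""
    # The baseline is taken from the fragment(s) set in the largest font.
    body_size = max(size for _, _, _, size in items)
    baseline = max(y for _, _, y, size in items if size == body_size)
    out = []
    for text, x, y, size in sorted(items, key=lambda p: p[1]):  # left to right
        lowered = baseline - y >= drop
        if lowered and size < body_size:
            out.append(text.translate(SUBSCRIPT_DIGITS))
        else:
            out.append(text)
    return "".join(out)
```

For example, fragments `H` at x=100, `O` at x=115 and a smaller `2` at x=108 drawn slightly lower are rebuilt as `H₂O`, even if the `2` was emitted last. Raised, smaller digits such as exponents are left as plain digits. This does not help when `H` and `O` arrive as one fragment, because splitting that fragment would require glyph widths from the font dictionary.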
I'm closing this issue because I don't know how we could tackle this. It's a good question and a desirable result, but I don't see this happening in the next few years. I've added it to #1181 just in case somebody has an idea.