hocr-pdf : Possible calculation issue #118
Comments
So, if I am understanding correctly, your calculation is
which, based on the hOCR spec, is
which you then use here
So this calculation uses Left and Right to compute an average and then subtracts that from the height. It seems to me that you'd want the average of Top and Bottom instead (i.e. box[1] and box[3]). So to my mind this makes more sense
or
The difference would be seen more obviously in longer words, where the gap between Left and Right is larger. But (as I said before) perhaps I am not understanding what you are trying to accomplish with this calculation.
I don't actively use this program any more, so I have not been paying attention. Test with descenders like 'yyy', without descenders like 'xxx', and mixed like 'xxxyyy'.
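To make the comparison concrete, here is a minimal sketch of the two averages being discussed. It is not the hocr-pdf code; the helper names and the sample boxes are made up for illustration, and it only assumes the hOCR bbox order of left, top, right, bottom.

```python
# Illustrative only: hypothetical helpers, not taken from hocr-pdf.
# A box is (left, top, right, bottom), i.e. box[0]..box[3] in hOCR bbox order.

def horizontal_mid(box):
    """Average of Left and Right (box[0] and box[2])."""
    return (box[0] + box[2]) / 2


def vertical_mid(box):
    """Average of Top and Bottom (box[1] and box[3])."""
    return (box[1] + box[3]) / 2


# Two made-up words on the same line: same top/bottom, different widths.
short_word = (100, 50, 140, 80)
long_word = (100, 50, 400, 80)

for name, box in (("short", short_word), ("long", long_word)):
    print(name, horizontal_mid(box), vertical_mid(box))
```

With these sample boxes the Left/Right average moves with the word's width while the Top/Bottom average stays the same, which is why a longer word would show the difference more clearly.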
I could be wrong, but in reading this calculation, which you use for adjusting the height of text, it seems like box[0] is left and box[2] is right from the bbox coordinates. Additionally, linebox[0] would also be left.
I changed it to this based on my reading of the hOCR spec for bbox
But in case I misunderstood your intention, I thought I'd open this issue.
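For reference, this is how I read the bbox order in the hOCR spec: x0 y0 x1 y1, meaning left, top, right, bottom. The sketch below parses a bbox out of a title attribute into named fields so the indices are easier to reason about. The BBox type, the parse_bbox helper, and the sample title string are hypothetical and not part of hocr-tools.

```python
import re
from typing import NamedTuple


class BBox(NamedTuple):
    """hOCR bbox in spec order; the indices match box[0]..box[3]."""
    left: int    # box[0]
    top: int     # box[1]
    right: int   # box[2]
    bottom: int  # box[3]


# Hypothetical helper: extracts "bbox x0 y0 x1 y1" from an hOCR title attribute.
_BBOX_RE = re.compile(r"bbox\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")


def parse_bbox(title):
    match = _BBOX_RE.search(title)
    if match is None:
        raise ValueError("no bbox found in title: %r" % title)
    return BBox(*(int(value) for value in match.groups()))


# Made-up title string in the usual "key values; key values" form.
box = parse_bbox("bbox 100 50 400 80; x_wconf 95")
print(box.left, box.top, box.right, box.bottom)
```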