hocr-pdf : Possible calculation issue #118

Open
whikloj opened this issue Nov 3, 2017 · 4 comments


whikloj commented Nov 3, 2017

I could be wrong, but in reading this calculation which you use for adjusting the height of text it seems like box[0] is left and box[2] is right from the bbox coordinates. Additionally, the linebox[0] would also be left.

I changed it to this based on my reading of the hOCR spec for bbox

But in case I misunderstood your intention, I thought I'd open this issue.
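For reference, here is a minimal sketch of how the index order discussed above falls out of an hOCR `title` attribute, assuming the `bbox x0 y0 x1 y1` layout (left, top, right, bottom). The helper name `parse_bbox` and the sample title string are hypothetical and are not taken from hocr-pdf itself.

```python
import re

def parse_bbox(title):
    """Return (left, top, right, bottom) from an hOCR title attribute.

    Hypothetical helper for illustration; not the parser used by hocr-pdf.
    """
    match = re.search(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", title)
    if match is None:
        raise ValueError("no bbox property in title: %r" % title)
    return tuple(int(g) for g in match.groups())

# Made-up title string in the style of OCR-engine output.
box = parse_bbox("bbox 100 200 400 260; x_wconf 93")
left, top, right, bottom = box  # box[0], box[1], box[2], box[3]
```

With this ordering, `box[0]` and `box[2]` are the horizontal extents and `box[1]` and `box[3]` are the vertical extents.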

Collaborator

zuphilip commented Nov 4, 2017

What will this change in the output PDF? CC @jbreiden @kba

Author

whikloj commented Nov 6, 2017

So, if I am understanding correctly, your calculation is

b = polyval(baseline, (box[0] + box[2]) / 2 - linebox[0]) + linebox[3]

which, based on the hOCR spec, is

b = polyval(baseline, (bbox-left + bbox-right) / 2 - linebox-left) + linebox-bottom

which you then use here

text.setTextOrigin(box[0] * 72 / dpi, height - b * 72 / dpi)

So this calculation uses Left and Right to compute an average, and the result is then subtracted from height. It seems to me that you'd want the average of Top and Bottom instead (i.e. box[1] and box[3]).

So to my mind this makes more sense

b = polyval(baseline, (box[1] + box[3]) / 2 - linebox[1]) + linebox[3]

or

b = polyval(baseline, (bbox-top + bbox-bottom) / 2 - linebox-top) + linebox-bottom

The difference would be seen more clearly in longer words, where the gap between Left and Right is larger. But (as I said before) perhaps I am not understanding what you are trying to accomplish with this calculation.
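To make the comparison concrete, here is a small sketch that evaluates both formulas side by side. It assumes a linear baseline (slope, offset) and a `polyval` helper of the form `slope * x + offset`. The function names `baseline_y_current` and `baseline_y_proposed` and all of the numbers are invented for illustration. As far as I can tell, the hOCR spec describes the `baseline` polynomial in the coordinate system of the line, with the bottom left of the line's bounding box as the origin, which may be relevant when deciding which input is intended.

```python
def polyval(poly, x):
    # Linear baseline: slope * x + offset (assumed form of the helper).
    return x * poly[0] + poly[1]

def baseline_y_current(box, linebox, baseline):
    # Formula quoted from the script: horizontal midpoint of the word,
    # measured from the left edge of the line.
    return polyval(baseline, (box[0] + box[2]) / 2 - linebox[0]) + linebox[3]

def baseline_y_proposed(box, linebox, baseline):
    # Formula proposed in this issue: vertical midpoint of the word,
    # measured from the top edge of the line.
    return polyval(baseline, (box[1] + box[3]) / 2 - linebox[1]) + linebox[3]

# Made-up line with a slight upward-sloping baseline (image y grows downward).
linebox = (100, 200, 1100, 260)
baseline = (0.01, -10)

# Two made-up words at the same height but different horizontal positions.
short_word = (120, 205, 180, 255)
long_word = (600, 205, 1000, 255)

for name, box in (("short", short_word), ("long", long_word)):
    print(name,
          baseline_y_current(box, linebox, baseline),
          baseline_y_proposed(box, linebox, baseline))
```

With these inputs the current formula gives a different value for each word, because it follows the slope along the line, while the proposed formula gives the same value for both, because the two words share the same top and bottom.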

Contributor

jbreiden commented Nov 6, 2017

I don't actively use this program any more, so I have not been paying attention. Test with descenders like 'yyy', without descenders like 'xxx', and mixed like 'xxxyyy'.

@nsshah14

@zuphilip @jbreiden Can you elaborate on this calculation? I am not getting the proper selection for my text.

[Screenshot: Screen Shot 2022-06-28 at 3 31 07 PM]

Here the selection goes to the bottom of the text.
