show Glyphs in ALTO 4 #7

Closed
bertsky opened this issue Dec 11, 2019 · 7 comments

@bertsky

bertsky commented Dec 11, 2019

It's great that PageViewer already supports ALTO v4. But it seems that Glyph elements are not displayed yet (as they are for PAGE). Is it planned to add that anytime soon?

(I would like to help, but I cannot even find where ALTO gets imported. Is this actually in prima-core-libs or prima-page-converter?)

@chris1010010
Contributor

I'll have a look when I have time.
It's in the core libs (PrimaDla): org.primaresearch.dla.page.io.xml.sax.SaxPageHandler_Alto_2_1
(it handles ALTO 2.1 and upwards).
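
For readers unfamiliar with SAX-style import, here is a minimal, self-contained sketch of how Glyph elements nested inside ALTO String elements could be collected. It is not the PrimaDla code: the class `AltoGlyphSketch`, its `Glyph` record type, and `readGlyphs` are hypothetical names made up for illustration. It assumes un-prefixed element names and the ALTO attributes CONTENT, HPOS, VPOS, WIDTH and HEIGHT.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical standalone sketch; not part of PrimaDla. */
public class AltoGlyphSketch {

    /** One glyph with its text and bounding box (ALTO attribute names). */
    public static final class Glyph {
        public final String content;
        public final double hpos, vpos, width, height;

        Glyph(String content, double hpos, double vpos, double width, double height) {
            this.content = content;
            this.hpos = hpos;
            this.vpos = vpos;
            this.width = width;
            this.height = height;
        }
    }

    /** Collects Glyph elements that occur inside String elements. */
    static final class GlyphHandler extends DefaultHandler {
        final List<Glyph> glyphs = new ArrayList<>();
        private boolean inString = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default factory is not namespace-aware, so match on qName.
            // Assumption: the file uses un-prefixed element names.
            if ("String".equals(qName)) {
                inString = true;
            } else if ("Glyph".equals(qName) && inString) {
                glyphs.add(new Glyph(
                        atts.getValue("CONTENT"),
                        num(atts, "HPOS"), num(atts, "VPOS"),
                        num(atts, "WIDTH"), num(atts, "HEIGHT")));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("String".equals(qName)) {
                inString = false;
            }
        }

        /** Parses a numeric attribute, treating a missing one as 0. */
        private static double num(Attributes atts, String name) {
            String value = atts.getValue(name);
            return value == null ? 0.0 : Double.parseDouble(value);
        }
    }

    /** Parses an ALTO document and returns all glyphs in document order. */
    public static List<Glyph> readGlyphs(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        GlyphHandler handler = new GlyphHandler();
        parser.parse(in, handler);
        return handler.glyphs;
    }

    public static void main(String[] args) throws Exception {
        // Tiny hand-written sample, not real OCR output.
        String xml = "<alto xmlns=\"http://www.loc.gov/standards/alto/ns-v4#\">"
                + "<Layout><Page><PrintSpace><TextBlock><TextLine>"
                + "<String CONTENT=\"Hi\" HPOS=\"10\" VPOS=\"20\" WIDTH=\"30\" HEIGHT=\"12\">"
                + "<Glyph CONTENT=\"H\" HPOS=\"10\" VPOS=\"20\" WIDTH=\"16\" HEIGHT=\"12\"/>"
                + "<Glyph CONTENT=\"i\" HPOS=\"28\" VPOS=\"20\" WIDTH=\"12\" HEIGHT=\"12\"/>"
                + "</String></TextLine></TextBlock></PrintSpace></Page></Layout></alto>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (Glyph g : readGlyphs(in)) {
            System.out.println(g.content + " @ " + g.hpos + "," + g.vpos);
        }
    }
}
```

A real importer would map each glyph onto the viewer's own page model rather than a flat list; the flat list just keeps the sketch short.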

@bertsky
Author

bertsky commented Dec 12, 2019

Thanks!

SaxPageHandler_Alto_2_1 looks very promising and I'd like to try extending it, but I have trouble getting all the PRImA projects to build in the first place. I even got as far as manually importing the various libraries and repos into Eclipse (as existing projects, sometimes removing fixed paths such as the one for GWT, or as new Java projects where no .project file was present). But alas, they give me tons of error messages when I try to build. Without instructions or documentation, this is just too much effort for me.

@chris1010010
Contributor

Sorry about that, I thought building would be easier. I think I'll remove the GWT stuff soon anyway. I hope that will improve things.

@chris1010010
Contributor

I made an update; have a look whether it works for you (I don't have proper examples for ALTO with glyphs).

@bertsky
Author

bertsky commented Dec 17, 2019

It works – perfectly! Thanks!

> (I don't have proper examples for ALTO with glyphs)

The above-mentioned PR will add that functionality to Tesseract. (The invocation is currently `tesseract -l eng -c document_title=input.tif input.tif input.alto alto`, which produces an input.alto.xml file.)

@mrocr

mrocr commented Dec 17, 2019

@chris1010010 why didn't you merge the update into master?
I see it in the release.

@bertsky
Author

bertsky commented Dec 17, 2019

@mrocr It's a change in external library code only (PrimaDla)!

bertsky closed this as completed Dec 17, 2019