OCR Word confidence in Annotations #68

Open
glenrobson opened this issue Aug 26, 2016 · 2 comments
@glenrobson

Description

I am a harvester of IIIF content who would like to use the OCR word confidence in my index.

Variation(s)

Proposed Solutions

Some way of adding OCR word confidence from ALTO to IIIF Annotations.

Additional Background

This use case came up for newspapers, but I believe it is more widely applicable. Example ALTO:

http://dams.llgc.org.uk/behaviour/llgc-id:3100022/fedora-sdef:alto/getAlto

and IIIF annotation list:

http://dams.llgc.org.uk/iiif/3100022/annotation/list/ART1.json

I believe WC is word confidence:

<String ID="PAG_1_ST1" STYLEREFS="TXT_2" HPOS="921" VPOS="2937" HEIGHT="123" WIDTH="246" WC="0.99" CONTENT="Just"/>
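
To make the request concrete, here is a minimal Python sketch of reading the `WC` attribute from the `String` element above and carrying it alongside the word text and its region. It is only an illustration under assumptions: the output keys (`text`, `xywh`, `confidence`) are hypothetical names, not an existing IIIF or annotation property, and a full ALTO file is normally namespaced, so element lookups there would need the namespace.

```python
import xml.etree.ElementTree as ET

# The ALTO String element quoted above, as a standalone fragment.
ALTO_STRING = (
    '<String ID="PAG_1_ST1" STYLEREFS="TXT_2" HPOS="921" VPOS="2937" '
    'HEIGHT="123" WIDTH="246" WC="0.99" CONTENT="Just"/>'
)


def word_from_alto_string(xml_text):
    """Extract text, region and word confidence from one ALTO String element.

    The returned keys are hypothetical; how confidence should be expressed
    in an annotation is exactly the open question of this issue.
    """
    el = ET.fromstring(xml_text)
    wc = el.get("WC")  # WC is optional, so it may be missing
    return {
        "text": el.get("CONTENT"),
        # x,y,w,h order, as used in an xywh region fragment
        "xywh": ",".join(
            el.get(name) for name in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
        ),
        "confidence": float(wc) if wc is not None else None,
    }


word = word_from_alto_string(ALTO_STRING)
print(word)
```

A harvester could then filter or weight words in its index using the `confidence` value, if the annotations exposed it.
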
@jronallo

hOCR also provides word confidence in the x_wconf value.

https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview
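
As a rough sketch of what reading that value could look like, the Python below splits an hOCR `title` attribute into its properties and picks out `x_wconf`. The sample `title` string is made up for illustration (it is not taken from a real file), and the helper name `parse_hocr_title` is hypothetical. Note that OCR engines may report `x_wconf` on a different scale from ALTO's `WC`, so any normalisation would be a separate decision.

```python
def parse_hocr_title(title):
    """Split an hOCR title attribute into {property_name: [values]}.

    Properties are separated by ';' and each one is a name followed by
    whitespace-separated values.
    """
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


# Hypothetical title value for a single word (an ocrx_word span).
title = "bbox 921 2937 1167 3060; x_wconf 99"

props = parse_hocr_title(title)
wconf = float(props["x_wconf"][0])
print(props["bbox"], wconf)
```
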

@cneud

cneud commented Aug 29, 2016

@glenrobson Yes, "WC" is used for "word confidence" in ALTO. Please note that there is an ongoing discussion with regard to how confidence values should be derived and expressed in future ALTO versions: altoxml/schema#23.
