None of these are "per-glyph" because "glyph" isn't a uniquely defined
concept independent of font. As far as hOCR is concerned, you need to
output information per codepoint. There is no single correct way of doing
that: it depends on the script, the encoding, and the OCR engine.
For bounding boxes (or cuts) on accented Western scripts, my recommendation
would be: (1) view the whole accented letter as a single glyph, (2) use
normalized Unicode with composed characters, (3) if a single glyph
corresponds to multiple codepoints, output a bounding box for the first
codepoint and output empty bounding boxes for the remaining codepoints.
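To make the three-step recommendation concrete, here is a minimal Python sketch. It is an illustration only, not something the spec prescribes: the function name `per_codepoint_boxes`, the `(text, bbox)` input shape, and the use of `(0, 0, 0, 0)` as the "empty" bounding box are all assumptions, and NFC is assumed as the composed normalization form.

```python
import unicodedata

# Assumption: an "empty" bounding box is written as all zeros.
# The quoted comment does not say how an empty box should be encoded.
EMPTY_BOX = (0, 0, 0, 0)


def per_codepoint_boxes(glyphs):
    """Expand per-glyph boxes into per-codepoint boxes.

    `glyphs` is a list of (text, bbox) pairs, one pair per visual glyph,
    where bbox is (x0, y0, x1, y1). This input shape is hypothetical.
    """
    result = []
    for text, bbox in glyphs:
        # (2) Normalize to composed form so that an accented letter
        # collapses to a single codepoint whenever Unicode has one.
        composed = unicodedata.normalize("NFC", text)
        for index, codepoint in enumerate(composed):
            # (3) The first codepoint carries the glyph's box;
            # any remaining codepoints get an empty box.
            box = bbox if index == 0 else EMPTY_BOX
            result.append((codepoint, box))
    return result


# "e" + combining acute composes to U+00E9, so it yields one entry.
# "q" + combining acute has no precomposed form, so it yields two.
glyphs = [
    ("e\u0301", (10, 5, 20, 30)),
    ("q\u0301", (22, 5, 32, 34)),
]
boxes = per_codepoint_boxes(glyphs)
```

The second glyph shows why step (3) is still needed after normalization: some base-plus-accent combinations have no composed codepoint, so the output stays one entry per codepoint rather than one per glyph.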
We should define it and s/character/codepoint in the spec.
#17 (comment)