Major restructuring #84
- Clarify what they are and provide anchors that definition lists can link to
- Classify element-property association levels
- Grammar for property serialization in title= attributes
- Define anchors to class/title attributes in HTML spec
As a general note, it looks great and very professional!
I suggest moving the 'Logical Elements' section further down. It is less significant than the other sections, and no OCR engine that we know of currently implements these elements.
About the grouping of properties (like |
How are they examples?
Granted, bbox is not required for all elements, but it doesn't make sense to have an ocr_carea without bbox or poly. We could also link to 'bbox or poly' or similar.
Can you elaborate? Originally, the spec listed the properties under the category of elements. That led to duplication (e.g. ocr_separator being in floats and typesetting). Now, they are grouped in those categories but can be listed in other categories as well. The list is just everything I could think of, but could be reduced. It makes sense IMHO to be able to say: "ocr_line/ocrx_line can contain any inline properties"
You get these if you click on the dfn in the heading for a property. From the perspective of an hOCR processor, IMHO it makes more sense to iterate over the elements and parse the properties according to the element definition, rather than the other way around.
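To illustrate the element-first approach, here is a minimal sketch using only the Python standard library. It walks the elements, parses each `title=` attribute into properties, and then checks them against a per-element rule. The `LOCATION_REQUIRED` table and the helper names are hypothetical and only mirror the "bbox or poly" point made above; they are not taken from the spec. Splitting on `;` is a simplification that ignores quoted values containing semicolons.

```python
from html.parser import HTMLParser

# Hypothetical rule table: elements that should carry 'bbox' or 'poly'.
LOCATION_REQUIRED = {"ocr_carea"}


def parse_title(title):
    """Split a title attribute into {property_name: [values]} (naive split on ';')."""
    props = {}
    for chunk in title.split(";"):
        parts = chunk.split()
        if parts:
            props[parts[0]] = parts[1:]
    return props


class HocrElements(HTMLParser):
    """Collect (class_name, properties) for each element with an 'ocr*' class."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        hocr_classes = [c for c in classes if c.startswith("ocr")]
        if hocr_classes:
            props = parse_title(attrs.get("title") or "")
            self.elements.append((hocr_classes[0], props))


def missing_location(elements):
    """Return names of elements that need a location but have neither bbox nor poly."""
    return [
        name
        for name, props in elements
        if name in LOCATION_REQUIRED and not ({"bbox", "poly"} & props.keys())
    ]


sample = (
    '<div class="ocr_carea" title="bbox 10 20 300 400">'
    '<span class="ocr_line" title="bbox 10 20 300 60; baseline 0.01 -5">text</span>'
    "</div>"
)
parser = HocrElements()
parser.feed(sample)
for name, props in parser.elements:
    print(name, props)
print("missing location:", missing_location(parser.elements))
```

The processor never needs a global list of properties here: it looks up what an element may or must carry once it knows which element it is handling.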
Section 2.2: The abstract description is followed by a specific example with
Is that proper English? https://www.quora.com/What-is-a-more-modern-way-to-say-hereinafter-referred-to-as
These issues already seem pretty detailed, and this is already a big PR. I'll merge this and create issues for the wording, notation, and property classification, if that's okay with you.
I went ahead and aggressively restructured and expanded the spec over the weekend. This is a big change and touches a lot of issues, but since I was in the flow, I decided to just keep going.
Snapshot of current commit: https://rawgit.com/kba/hocr-spec/gen-defs/1.2/index.html
New top-level structure
** Define element/property/capability
For the formal definition of elements and properties, I created a YAML file that contains information on relations, examples, grammar, and categories. A Python script and templates generate a definition list for each element and property from that file, and the generated lists are included in the spec.
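As a rough sketch of that generation step, the following uses only the Python standard library. The `DEFINITIONS` dict stands in for the parsed YAML file; its field names (`kind`, `categories`, `grammar`, `example`), the `general` category, and the HTML layout of the template are assumptions for illustration and not the actual schema or script from this PR.

```python
from string import Template

# Stand-in for the parsed YAML file; the real schema is an assumption here.
DEFINITIONS = {
    "bbox": {
        "kind": "property",
        "categories": ["general"],
        "grammar": "bbox x0 y0 x1 y1",
        "example": "bbox 0 0 100 200",
    },
}

# Hypothetical template for one definition-list entry with a linkable anchor.
ENTRY = Template(
    '<dt><dfn id="$name">$name</dfn></dt>\n'
    "<dd>$kind; categories: $categories; grammar: <code>$grammar</code>; "
    "example: <code>$example</code></dd>"
)


def render_definitions(definitions):
    """Render every definition as a <dt>/<dd> pair inside a single <dl>."""
    entries = []
    for name in sorted(definitions):
        info = definitions[name]
        entries.append(
            ENTRY.substitute(
                name=name,
                kind=info["kind"],
                categories=", ".join(info["categories"]),
                grammar=info["grammar"],
                example=info["example"],
            )
        )
    return "<dl>\n" + "\n".join(entries) + "\n</dl>"


html = render_definitions(DEFINITIONS)
print(html)
```

Keeping the data in one file and the markup in a template means a property is described once, even when it is listed under several categories.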
Still lots to do but it's in a state where I'd love to get feedback.