
add checkpoint_dir content-type, remove checkpoint variant #70

Merged
mikegerber merged 9 commits into OCR-D:master
Feb 23, 2022


bertsky
Contributor

@bertsky bertsky commented Feb 10, 2022

This removes the older of the two parameterizations (a direct Calamari-like glob expression) in favour of the newer, more OCR-D-like model directory, and adds a proper format and content-type as required by the specs for optimal resmgr handling.
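For illustration, here is a minimal Python sketch of what the directory-based parameterization means in practice. The user no longer passes a Calamari-style glob expression; the processor instead looks up the checkpoint files inside the given model directory. The helper name `find_checkpoints` and the `*.ckpt.json` naming convention are assumptions (Calamari's usual checkpoint file names), not code taken from this PR.

```python
import glob
import os
from typing import List


def find_checkpoints(checkpoint_dir: str) -> List[str]:
    """Hypothetical helper: collect Calamari checkpoints from a model directory.

    Assumes each model in the directory is stored as '<name>.ckpt.json'
    (plus its weights file), so the directory replaces the old glob parameter.
    """
    # A directory resource must actually be a directory, not a file or a pattern.
    if not os.path.isdir(checkpoint_dir):
        raise NotADirectoryError(checkpoint_dir)
    # The glob now happens inside the processor, not in the user-supplied value.
    pattern = os.path.join(checkpoint_dir, "*.ckpt.json")
    # Sort for a deterministic model order (e.g. when voting over an ensemble).
    return sorted(glob.glob(pattern))
```

With this design, a resource manager only ever has to resolve and install a single directory, and the processor decides internally which files in it are models.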

@codecov-commenter

codecov-commenter commented Feb 10, 2022

Codecov Report

Merging #70 (5fddd32) into master (76b34c5) will increase coverage by 0.97%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
+ Coverage   88.37%   89.34%   +0.97%     
==========================================
  Files           3        3              
  Lines         172      169       -3     
  Branches       39       38       -1     
==========================================
- Hits          152      151       -1     
+ Misses         11       10       -1     
+ Partials        9        8       -1     
Impacted Files Coverage Δ
ocrd_calamari/recognize.py 88.95% <100.00%> (+1.00%) ⬆️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 76b34c5...5fddd32.

@bertsky
Contributor Author

bertsky commented Feb 11, 2022

I had to fix the tests as well. By the way, we could also use kant_aufklaerung_1784-binarized instead of kant_aufklaerung_1784-page-region-line-word_glyph to get binary input without needing ad-hoc IM binarization.

@mikegerber
Collaborator

mikegerber commented Feb 16, 2022

It's a bit unfortunate to drop the former default parameter checkpoint. Is there really no way to still support it and phase it out afterwards?

@bertsky
Contributor Author

bertsky commented Feb 21, 2022

> It's a bit unfortunate to drop the former default parameter checkpoint, is there really no way to still support this and phase it out afterwards?

No, unfortunately I can't see any. We had to come to terms with our file vs. directory resources and the semantics of content-type in an ocrd-tool.json (see the related discussions in core). If we want to properly support directories as a resource type, individual processors should not offer both variants, unless we make the resource mechanism much more complicated (by specifying which parameters can take which resources).

@mikegerber mikegerber self-assigned this Feb 23, 2022
@mikegerber mikegerber merged commit 1eb342e into OCR-D:master Feb 23, 2022
@mikegerber
Collaborator

mikegerber commented Sep 16, 2022

I've also removed the checkpoint mention in the README in #80!

@mikegerber mikegerber mentioned this pull request Oct 16, 2023
mikegerber added a commit that referenced this pull request Oct 17, 2023
PR #70 changed the model download and did not update the README
accordingly. Fix the README.

Also update the example download to use a single page with existing
binarization and segmentation.