19th-century-fraktur-OCR

OCRopus/Calamari mixed models for recognizing (German) 19th century Fraktur scripts.

Models

All models were trained on and should be applied to binary data since this showed the best results during evaluation (see [1]). We provide

a Calamari voting ensemble consisting of five models (best results).
a single Calamari model which should perform slightly better then each single model of the ensemble (if recognition time is a big issue).
a OCRopus model.

Data

At max 50 lines of each book in our training corpus are provided allowing to adapt the models to your own transcription guidelines. Adjust the ground truth as desired and then train a new model using the provided models as a starting point (Calamari: --weights, OCRopus: --load). For a description of the entire training data as well as a source for download please see [2].

Related Publications

[1] Reul, Springmann, Wick, Puppe. (2018). State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines. Submitted to DHd2019. Preprint available at ArXiv e-prints.

@article{reul2018fraktur,
  title = {State of the Art Optical Character Recognition of 19th Century Fraktur Scripts using Open Source Engines},
  author = {Reul, Christian and Springmann, Uwe and Wick, Christoph and Puppe, Frank},
  journal = {ArXiv e-prints},
  url = {https://arxiv.org/abs/1810.03436},
  year = {2018}
}

[2] Springmann, Reul, Dipper, Baiter. (2018). Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin. Submitted to Submitted to JLCL Volume 33 (2018), Issue 1: Special Issue on Automatic Text and Layout Recognition. Preprint available at ArXiv e-prints.

@article{springmann2018groundtruth,
  title = {Ground Truth for training OCR engines on historical documents in German Fraktur and Early Modern Latin},
  author = {Springmann, Uwe and Reul, Christian and Dipper, Stefanie and Baiter, Johannes},
  journal = {ArXiv e-prints},
  url = {https://arxiv.org/abs/1809.05501},
  year = {2018}
}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
models		models
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

19th-century-fraktur-OCR

Models

Data

Related Publications

About

Releases

Packages

License

chreul/19th-century-fraktur-OCR

Folders and files

Latest commit

History

Repository files navigation

19th-century-fraktur-OCR

Models

Data

Related Publications

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages