OCR4all

As the name suggests, one of the main goals of OCR4all is to allow virtually any user to independently perform OCR on a wide variety of historical printings and obtain high-quality results with reasonable time expenditure. OCR4all is therefore explicitly geared towards users with no technical background. If you are one of those users (or if you just want to use the tool and are not interested in the code), please go to the getting started project, where you will find guides and test data.

Please note that OCR4all's main focus is a semi-automatic workflow that allows users to perform OCR even on the earliest printed books, a very challenging task that often requires a significant amount of manual interaction, especially when almost perfect quality is desired. If you are looking for mass digitization of historical material, the OCR-D project (in progress) might be worth a look.

This repository contains the code for the main interface and server of the OCR4all project, while the repositories OCR4all/docker_image and OCR4all/docker_base_image cover the creation of a preconfigured Docker image.

To install the complete project with a Docker image, please follow the instructions here.

Mailing List

OCR4all is under active development, so frequent releases containing bug fixes and further functionality can be expected. To stay up to date, we highly recommend subscribing to our mailing list, where we announce notable enhancements.

Current Developments

Plans for the (very) near future:

Enabling a second project management approach based solely on PageXML, allowing for a more flexible workflow.
Integrating Tesseract for recognition.
Many minor bug fixes and improvements.

Built With

Docker - Platform and Software Deployment
Maven - Dependency Management
Spring - Java Framework
Materialize - Front-end Framework
jQuery - JavaScript Library

Included Projects

OCRopus - Collection of document analysis programs
calamari - OCR Engine based on OCRopy and Kraken
LAREX - Layout analysis on early printed books
Kraken - OCR engine for all the languages
nashi - Some bits of javascript to transcribe scanned pages using PageXML

Authors and Helping Hands

Christian Reul (project lead) - Email: [email protected]
Dennis Christ and Alexander Hartelt (OCR4all web development)
Christoph Wick (calamari)
Nico Balbach (LAREX web GUI)
Andreas Büttner (nashi)
Björn Eyselein (distribution via Docker)
Maximilian Wehner (tireless testing, guides, and non-technical user support)
Christine Grundig, Frank Puppe, and Uwe Springmann (ideas and feedback)
...
