Daemon type OCR #242

Closed
marcincichocki opened this issue Oct 13, 2021 · 0 comments

marcincichocki commented Oct 13, 2021

When I started, I left out daemon type OCR for a few reasons:

  • It's language dependent
  • I thought OCR would be slow
  • It doesn't provide many benefits

However, since the other features are complete, I thought I would revisit this idea. Here are my thoughts after a few hours of testing:

  • Language data is available for free on GitHub (Apache 2.0 license).
  • Loading a language takes a while (300ms), so the question is how to distribute the data and how to load it so the app stays responsive.
  • Fragment preprocessing got a big upgrade: by extracting the blue channel from the source I was able to clear all the noise, borders, and other junk that would just clutter the image. This must be implemented for the daemon fragment.
  • I also noticed that sharp is quite slow (150ms); I might need to measure how long it takes to process images in production.
  • Text recognition is surprisingly easy: most data from the registry was easy to decode, and the text seems correct (I can't validate it for some exotic languages, but it looks good).
  • It's super fast, and with worker support the performance impact will be negligible.
  • In the event of failure, the daemon could be marked as UNKNOWN or UNDETECTED.
  • The daemon type would always be resolved to its id.
  • I need a list of every daemon and screenshots of them in every language so I can be sure they work.
  • I also need to know what to do when the wrong OCR language is selected.
  • With the data in place, there has to be some use for it. As a first step I could add it to the viewer, and later on maybe create a sequence sort based on type.
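
To make the blue channel idea from the preprocessing point concrete, here is a minimal sketch. It assumes the fragment is already available as a raw, interleaved pixel buffer in R, G, B(, A) order; the helper name `extractBlueChannel` is hypothetical and this is not the project's actual preprocessing code.

```typescript
// Hypothetical helper: keeps only the blue channel of a raw, interleaved
// pixel buffer and returns it as a single-channel (grayscale) buffer.
function extractBlueChannel(pixels: Uint8Array, channels: 3 | 4): Uint8Array {
  if (pixels.length % channels !== 0) {
    throw new Error("Buffer length is not a multiple of the channel count");
  }

  const out = new Uint8Array(pixels.length / channels);

  for (let i = 0; i < out.length; i++) {
    // Assumption: channel order is R, G, B(, A), so blue sits at offset 2.
    out[i] = pixels[i * channels + 2];
  }

  return out;
}

// Two RGBA pixels in, two blue values out.
const blue = extractBlueChannel(
  new Uint8Array([10, 20, 30, 255, 40, 50, 60, 255]),
  4
);
```

The resulting single-channel buffer could then be handed to whatever thresholding and OCR steps follow; those steps are outside the scope of this sketch.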

Overall, there is a lot to do.
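
As a rough illustration of the "resolve to an id, fall back to UNKNOWN" points above, the lookup could be sketched like this. Everything here is an assumption made for the example: the ids, the per-language names, and the function `resolveDaemonType` are placeholders, not the real daemon list or the project's API.

```typescript
// Placeholder ids; the real daemon list is still to be collected.
type DaemonId = "DAEMON_ONE" | "DAEMON_TWO";
type DaemonType = DaemonId | "UNKNOWN";

// Hypothetical per-language lookup tables: normalized OCR text -> daemon id.
const daemonNames = new Map<string, Map<string, DaemonId>>([
  ["en", new Map<string, DaemonId>([
    ["first daemon", "DAEMON_ONE"],
    ["second daemon", "DAEMON_TWO"],
  ])],
  ["pl", new Map<string, DaemonId>([
    ["pierwszy demon", "DAEMON_ONE"],
    ["drugi demon", "DAEMON_TWO"],
  ])],
]);

// Trim, lowercase, and collapse whitespace so small OCR spacing
// differences do not break the lookup.
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Resolve recognized text to a daemon id, or UNKNOWN when the language
// has no table or the text does not match any known name.
function resolveDaemonType(ocrText: string, lang: string): DaemonType {
  const names = daemonNames.get(lang);
  if (!names) return "UNKNOWN";
  return names.get(normalize(ocrText)) ?? "UNKNOWN";
}
```

With exact matching like this, text read with the wrong OCR language selected would most likely miss every entry and come back as UNKNOWN, which at least makes that case detectable. Whether that is the right behaviour is still an open question.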

@marcincichocki marcincichocki added this to the v2.3.0 milestone Oct 13, 2021