OCR script for Visual Novels/general text on images
- `xfce4-screenshooter` screenshots a region and saves it to `~/ocr.png`. Swap in your own screenshot program if it doesn't work.
- `tesseract` processes that image and writes the text it finds to `~/ocr.txt`.
  - Install tesseract (I used `sudo apt install tesseract-ocr` on Xubuntu).
  - Download the trained models for the language you need and put them where they belong (`/usr/share/tesseract-ocr/4.00/tessdata` for me).
- `tr` cleans up the output text from tesseract.
- `xclip` puts the text on the clipboard, where it is caught by Yomichan, which opens a popup with that text so you can look up word definitions.
  - Make sure to check "Enable native popups when copying Japanese text" in the Yomichan options (be careful when copying big texts that can contain kana/kanji in them).
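
The components above can be chained roughly as follows. This is only a sketch, not the actual `ocr_script`: the function names, the `OCR_IMG` and `OCR_LANG` variables, the `jpn` language code, and the exact characters removed by `tr` are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the pipeline described above (assumed layout, not the real script).

IMG="${OCR_IMG:-$HOME/ocr.png}"   # screenshot path from the description
OUT_BASE="${IMG%.png}"            # tesseract appends .txt, giving ~/ocr.txt
LANG_CODE="${OCR_LANG:-jpn}"      # assumption: Japanese model (jpn.traineddata)

# Assumption: "cleaning up" means deleting spaces, newlines and form feeds
# so the recognized text ends up as one continuous line.
clean_ocr_text() {
    tr -d ' \n\f'
}

ocr_region_to_clipboard() {
    rm -f "$IMG"                              # avoid reusing a stale screenshot
    xfce4-screenshooter -r -s "$IMG"          # -r: pick a region, -s: save to file
    [ -f "$IMG" ] || return 1                 # selection was cancelled or failed
    tesseract "$IMG" "$OUT_BASE" -l "$LANG_CODE" || return 1
    clean_ocr_text < "$OUT_BASE.txt" | xclip -selection clipboard
}
```

To use it as a standalone script, add a final line calling `ocr_region_to_clipboard`. If you use a different screenshot program, only the `xfce4-screenshooter` line should need to change.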
- Bind `ocr_script` to a hotkey
- Press the hotkey
- Select a region with text
- Text extracted by OCR will be copied to clipboard
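
A script launched from a hotkey has no terminal to show errors in, so it may help to run a quick check from a terminal before binding it. The helpers below are a hypothetical sketch (`missing_tools` and `has_tesseract_lang` are names made up here, not part of the script); they only report which commands are missing from `PATH` and whether tesseract lists a given language model.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of the original ocr_script.

# Prints each named command that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeeds if tesseract lists the given language model (e.g. jpn).
has_tesseract_lang() {
    tesseract --list-langs 2>&1 | grep -qx "$1"
}
```

For example, `missing_tools xfce4-screenshooter tesseract xclip` prints nothing when everything is installed, and `has_tesseract_lang jpn || echo "jpn model not found"` flags a missing trained model.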
Tesseract Docs - Improving Recognition Quality
StackOverflow - Remove background text and noise from an image using image processing with OpenCV
StackOverflow - How to remove background noise in image without damaging text?
StackOverflow - Background image cleaning for OCR