
Plain Text vs. Format Output #182

Open
ep0p opened this issue Nov 7, 2024 · 2 comments
Labels
good first issue Good for newcomers

Comments


ep0p commented Nov 7, 2024

I've encountered an issue with GOT text recognition during inference. Using 'plain text' often results in awkwardly spaced letters within words. However, switching to 'format' produces well-structured and accurate text. My document is in French, and GOT handles French unexpectedly well without fine-tuning. Here are the examples:

Here is the document:
[image]

Here is the inference in plain text:
[image]

Here is the inference with 'format':
[image]

@Ucas-HaoranWei (Owner)

Ha ha, the 'plain text' mode only uses the Fitz-extracted data, without formatting.
I am delighted that GOT's zero-shot ability in French is good. Thank you for the test.
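For reference, the two behaviours correspond to the `ocr_type` argument in the project's Hugging Face usage example. This is a sketch from memory; the model id and the `chat()` signature are assumptions to verify against the current README:

```python
# Sketch: run GOT-OCR2.0 in both modes, based on the project's Hugging Face
# usage example. The model id and chat() signature are assumptions; check
# them against the repo's README before relying on this.

def load_got(model_id="ucaslcl/GOT-OCR2_0"):
    # Imports deferred so the sketch can be read without transformers installed.
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,
        low_cpu_mem_usage=True,
        device_map="cuda",
        use_safetensors=True,
        pad_token_id=tokenizer.eos_token_id,
    ).eval()
    return model, tokenizer

def ocr_both(model, tokenizer, image_file):
    # 'ocr' is the plain-text path; 'format' requests structured output.
    plain = model.chat(tokenizer, image_file, ocr_type="ocr")
    formatted = model.chat(tokenizer, image_file, ocr_type="format")
    return plain, formatted
```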

@Ucas-HaoranWei Ucas-HaoranWei added the good first issue Good for newcomers label Nov 9, 2024

ep0p commented Nov 12, 2024

Would there be any way to make it handle the awkwardly spaced letters within words, since using 'format' sometimes does not recognize all the blocks of text on a page?
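In the meantime, I've been patching the spaced-letter runs with a small post-processing heuristic (my own workaround, not part of GOT): collapse any run of single characters separated by single spaces into one word.

```python
import re

def collapse_spaced_letters(text, min_run=3):
    # Heuristic workaround (not part of GOT): join runs of at least min_run
    # single characters separated by single spaces, e.g. "B o n j o u r"
    # becomes "Bonjour", while normal short words like "le" are left alone.
    pattern = re.compile(rf"(?<!\S)(?:\S ){{{min_run - 1},}}\S(?!\S)")
    return pattern.sub(lambda m: m.group(0).replace(" ", ""), text)
```

It can over-merge genuine sequences of one-letter tokens, so raising `min_run` trades recall for safety.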

Also, since I cannot get access to WeChat, I have some questions for which maybe you can provide an answer :)

  1. When testing various document types, I’ve observed that some text blocks, particularly in documents with complex layouts, are occasionally missed or ignored by the OCR.
    Are there any workarounds or best practices to improve accuracy and ensure these blocks are detected?
    Would fine-tuning the model address or reduce this issue?

  2. What is the recommended VRAM capacity for fine-tuning the OCR model efficiently?
    Specifically, could you share the minimum or ideal VRAM needed to handle a reasonable batch size without significant performance delays?

  3. If I fine-tune the model using complex images and specify the desired reading order of the text, can the model learn and adapt to this sequence?
