I've encountered an issue with GOT text recognition during inference. Using 'plain text' often results in awkwardly spaced letters within words. However, switching to 'format' produces well-structured and accurate text. My document is in French, and GOT handles French unexpectedly well without fine-tuning. Here are the examples:
Here is the document:
Here is the inference in plain text:
Here is the inference with 'format':
Ha ha, the 'plain text' only uses the Fitz extracted data without format.
I am delighted that GOT's zero-shot ability in French is good. Thank you for the test.
Would there be any way to fix the awkwardly spaced letters within words in 'plain text' mode? I ask because 'format' sometimes does not recognize all the blocks of text on a page.
Also, since I cannot access WeChat, I have some questions that you may be able to answer :)
When testing various document types, I’ve observed that some text blocks, particularly in documents with complex layouts, are occasionally missed or ignored by the OCR.
Are there any workarounds or best practices to improve accuracy and ensure these blocks are detected?
Would fine-tuning the model address or reduce this issue?
What is the recommended VRAM capacity for fine-tuning the OCR model efficiently?
Specifically, could you share the minimum or ideal VRAM needed to handle a reasonable batch size without significant performance delays?
If I fine-tune the model using complex images and specify the desired reading order of the text, can the model learn and adapt to this sequence?
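On the awkwardly spaced letters in 'plain text' output: until there is a fix on the model side, one possible stopgap is to post-process the text. The sketch below is not part of GOT. It assumes the artifact looks like runs of single letters separated by single spaces (for example "d o c u m e n t"), which may not match every case. The function name `collapse_spaced_letters` is hypothetical.

```python
import re

# Assumption: the artifact is a run of three or more single letters, each
# separated by exactly one space. [^\W\d_] matches any Unicode letter,
# including accented French ones, but not digits or underscores.
_SPACED_RUN = re.compile(r"(?<!\w)(?:[^\W\d_] ){2,}[^\W\d_](?!\w)")


def collapse_spaced_letters(text: str) -> str:
    """Join runs of single, space-separated letters back into one word."""
    return _SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)


if __name__ == "__main__":
    print(collapse_spaced_letters("Le d o c u m e n t est ici"))
```

Requiring at least three letters leaves short French sequences such as "il y a" untouched. The heuristic has a clear limit: if two neighbouring words are both letter-spaced and separated by only one space, they will be merged into a single word, so the result should be checked against the 'format' output where possible.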