Update README.md
large-ocr-model authored Jan 16, 2024
1 parent 60a921a commit 0fe2e92
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -23,13 +23,14 @@ Recently, multimodal large models have received widespread attention in academia

In natural language processing (NLP), the relationship between model size, data volume, computing power, and model performance has been studied extensively. In optical character recognition (OCR), however, the exploration of these "scaling laws" is still in its infancy. To fill this gap, we conducted a comprehensive study of how model size, data volume, and computing power relate to OCR performance. The results show that, holding other factors constant, performance follows a smooth power-law relationship with both model size and training data volume. We also created a large-scale dataset, REBU-Syn, containing 6 million real samples and 18 million synthetic samples. Using these scaling laws and this dataset, we trained a high-precision OCR model that achieves SOTA accuracy on the OCR test benchmark. **In particular, we found that the OCR model significantly enhances the capabilities of multimodal large models and yields substantial accuracy improvements on multiple VQA tasks, demonstrating the great potential of OCR for improving the performance of multimodal large models.**
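
As a rough illustration of what such a scaling law looks like in practice, the sketch below fits a curve of the form `error ≈ c * N**(-alpha)` by ordinary least squares in log-log space. The functional form is an assumption (it is the power-law shape that scaling-law studies typically report), the helper name `fit_power_law` is hypothetical, and the data points are generated inside the script from made-up constants; none of it comes from this repository or its results.

```python
import math


def fit_power_law(sizes, errors):
    """Fit errors ≈ c * sizes**(-alpha) via least squares on log-log values.

    Hypothetical helper for illustration; returns the tuple (c, alpha).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A straight line in log-log space is a power law in linear space.
    return math.exp(intercept), -slope


# Illustrative points generated from a known law (made-up constants,
# NOT measurements from the paper or this repository).
true_c, true_alpha = 50.0, 0.25
sizes = [5e6, 2e7, 8e7, 3.2e8, 1e9]  # assumed parameter counts
errors = [true_c * n ** (-true_alpha) for n in sizes]

c, alpha = fit_power_law(sizes, errors)
print(f"c={c:.3f}, alpha={alpha:.3f}")
```

On real measurements the points will not lie exactly on a line, so the recovered `alpha` is only an estimate; plotting the log-log residuals is a quick way to check whether a power law is a reasonable description at all.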

-<p align="center"><img src="assets/f1.png"{:height="40%" width="40%"}></p>
+
+<div align="center"><img src="assets/f1.png" style="zoom:40%" alt="f1"/></div>

## 🛠️ Dataset

In OCR, the quality and diversity of datasets are extremely important. We created a new dataset, REBU-Syn, by collecting and integrating open-source datasets. In addition, we used the latest generation technology to produce MJST+, an additional 60M synthetic samples.

-<p align="center"><img src="assets/table3.png"{:height="60%" width="60%"}></p>
+<div align="center"><img src="assets/table3.png" style="zoom:60%" alt="table3"/></div>

## 🗝️ Scaling Law for OCR
