Update README.md

large-ocr-model · Jan 16, 2024 · 0fe2e92
1 parent 60a921a
Showing 1 changed file with 3 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -23,13 +23,14 @@ Recently, multimodal large models have received widespread attention in academia
 
 In the field of natural language processing (NLP), the relationship between model size, data volume, computing power and model performance has been extensively studied. However, in the field of optical character recognition (OCR), the exploration of these "scaling laws" is still in its infancy. To fill this gap, we conducted a comprehensive study and in-depth analysis of the relationship between model size, data volume, and computing power and OCR performance. The results reveal that, holding other influencing factors constant, there is a smooth exponential relationship between performance and model size and training data volume. In addition, we also create a large-scale dataset REBU-Syn, containing 6 million real samples and 18 million synthetic samples. Using these rules and data sets, we successfully trained a high-precision OCR model and achieved SOTA accuracy on the OCR test benchmark. **In particular, we found that the OCR model can significantly enhance the capabilities of multi-modal large models and achieve significant accuracy improvements on multiple VQA tasks, proving the great potential of OCR in improving the performance of multi-modal large models.**
 
-<p align="center"><img src="assets/f1.png"{:height="40%" width="40%"}></p>
+
+<div align="center"><img src="assets/f1.png" style="zoom:40%" alt="f1"/></div>
 
 ## 🛠️ Dataset
 
 In the field of OCR, the quality and diversity of data sets are extremely important. We created a new data set REBU-Syn by collecting and integrating open source data sets. In addition, we utilize the latest generation technology to generate 60M synthetic data MJST+ for additional use.
 
-<p align="center"><img src="assets/table3.png"{:height="60%" width="60%"}></p>
+<div align="center"><img src="assets/table3.png" style="zoom:60%" alt="table3"/></div>
 
 ## 🗝️ Scaling Law for OCR