Templateless OCR solution for Maverics Botathon
We extract the invoice details in two parts.
The First Part:
First, our algorithm uses thresholding and morphological transforms to detect the upper boxes. Once these boxes are detected, their text is extracted with OCR and stored in a CSV file.
The Second Part:
For the second part we use TableNet, a deep learning model inspired by the TableNet paper. When the image is passed through TableNet, the tables and their columns are detected, which makes it easier to extract the line data from the central table. These regions are then passed through OCR to get the text, which is stored in a CSV file. Our solution is robust enough to process both digital and scanned copies of invoices.
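To illustrate how detected columns simplify line-item extraction, here is a small sketch of the post-processing step only. It assumes the column mask has already been reduced to horizontal spans and that the OCR engine returns word centers with their text. Both input formats, the function names `words_to_rows` and `write_rows`, and the `row_tol` parameter are hypothetical and not taken from the project.

```python
import csv
import io


def words_to_rows(words, columns, row_tol=10):
    """Arrange OCR words into table rows.

    words   -- iterable of (x_center, y_center, text), assumed OCR output
    columns -- list of (x_start, x_end) spans, assumed to come from the column mask
    row_tol -- maximum vertical distance (pixels) for words to share a row
    """
    rows = []  # each entry: [anchor_y, one list of (x, text) per column]
    for x, y, text in sorted(words, key=lambda w: (w[1], w[0])):
        col = next((i for i, (x0, x1) in enumerate(columns) if x0 <= x < x1), None)
        if col is None:
            continue  # word lies outside every detected column
        if not rows or abs(y - rows[-1][0]) > row_tol:
            rows.append([y, [[] for _ in columns]])
        rows[-1][1][col].append((x, text))
    # Within each cell, join the words in left-to-right order.
    return [[" ".join(t for _, t in sorted(cell)) for cell in cells]
            for _, cells in rows]


def write_rows(rows, header, fileobj):
    """Write the header and the table rows as CSV to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    words = [(50, 100, "Widget"), (300, 102, "2"), (420, 99, "9.50"),
             (50, 140, "Gadget"), (90, 141, "Pro"), (300, 140, "1"),
             (420, 142, "20.00")]
    columns = [(0, 200), (200, 380), (380, 500)]
    out = io.StringIO()
    write_rows(words_to_rows(words, columns), ["Item", "Qty", "Price"], out)
    print(out.getvalue())
```

Grouping by a fixed vertical tolerance is the simplest choice; skewed scans would probably need deskewing first or a tolerance derived from the text height.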
- https://arxiv.org/abs/2001.01469
- https://arxiv.org/abs/1703.06870
- https://link.springer.com/chapter/10.1007/11551188_67
- https://arxiv.org/abs/2011.13534
Distributed under the MIT License. See LICENSE for more information.