This is an attempt to solve the captcha served by www.mca.gov.in
Note: A few of the steps are described in Data/KarzaTest.pdf
The script uses the following tech stack, with no machine learning and very little OpenCV.
- Python
- Selenium
- Tesseract-OCR
- The script drives a Chrome browser using Selenium and opens the link - http://www.mca.gov.in/
- Once the page is loaded, we click on "View Company or LLP Master Data". This opens a new tab. We switch to the newly opened tab.
- The script then takes a screenshot of the captcha and attempts to solve it using tesseract-ocr.
- If it succeeds, we download the data loaded by the website using export to Excel. If it fails, we retry solving. If the second attempt fails, the script closes the browser and starts again from Step 1.
- This is repeated until we get data for all the company CINs of interest.
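
The retry logic in the steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `Session`, `open_session`, `fetch_all`, `attempts_per_session` and `max_sessions` are hypothetical names, and the cap on browser restarts is an added assumption so the loop cannot run forever. The sketch also closes the browser after every CIN, which the real script may not do.

```python
from typing import Callable, Dict, Iterable, Protocol


class Session(Protocol):
    """Hypothetical wrapper around one browser session (not from the script)."""

    def solve_captcha(self, cin: str) -> bool: ...
    def export_to_excel(self, cin: str) -> None: ...
    def close(self) -> None: ...


def fetch_all(
    cins: Iterable[str],
    open_session: Callable[[], Session],
    attempts_per_session: int = 2,  # two captcha attempts, as in the steps above
    max_sessions: int = 5,          # assumption: give up on a CIN after 5 restarts
) -> Dict[str, bool]:
    """Return {cin: True/False}, True meaning the Excel export was triggered."""
    results: Dict[str, bool] = {}
    for cin in cins:
        results[cin] = False
        for _ in range(max_sessions):
            # Step 1: start a fresh browser and navigate to the captcha page.
            session = open_session()
            try:
                for _ in range(attempts_per_session):
                    if session.solve_captcha(cin):
                        session.export_to_excel(cin)
                        results[cin] = True
                        break
            finally:
                # Always close the browser, whether or not the captcha was solved.
                session.close()
            if results[cin]:
                break
    return results
```

In a real run, `open_session` would presumably launch Chrome through Selenium and switch to the "View Company or LLP Master Data" tab, while `solve_captcha` would pass the captcha screenshot to Tesseract and submit the result. Keeping those pieces behind a small interface lets the retry loop be tested without a browser.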