Sharing a way I deal with captchas when I web scrape with undetected-chromedriver, Selenium and Python. #1741
cole-le started this conversation in Show your project! using undetected-chromedriver
Over the months, I have seen many websites detect this repository's undetected-chromedriver and put up a captcha screen. Many issues about this are still open, and since I know a way to deal with it, I would like to share it with you guys and hope it lets us keep pursuing the freedom to collect published data on the Internet! :)
While my method will not bypass every captcha, it will get past a lot of them and let many more people use undetected-chromedriver to collect data freely from the Internet.
My Implementation/Contribution:
Most people use this GitHub repository to create Selenium bots that automate tasks on websites. During development, developers usually test locally on their own computer and watch the bot perform its tasks in real time. At that stage they rarely run into captchas, because local testing only sends a small number of web requests.
Websites usually raise a captcha when they see a large number of requests coming from one IP address. So when a bot repeats something for a long time (day after day), the website flags the IP and prompts a captcha. This interrupts whatever the bot was doing. As a simple example, the bot might be locating a piece of data in a specific web element, like this one:
(this bot is locating the price from a website)
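As a rough idea of what such a lookup looks like in code, here is a minimal sketch. The function name `read_price` and the `span.price` selector are made up for illustration; the real selector depends on the page you are scraping. The `try`/`except ImportError` at the top is only there so the sketch can be tried out without Selenium installed.

```python
try:
    from selenium.webdriver.common.by import By
except ImportError:
    # Stand-in so the sketch can be tried without Selenium installed;
    # By.CSS_SELECTOR really is the string "css selector".
    class By:
        CSS_SELECTOR = "css selector"

# Hypothetical locator: replace it with whatever matches the price on your page.
PRICE_LOCATOR = (By.CSS_SELECTOR, "span.price")


def read_price(driver):
    """Return the text of the price element on the page the driver has loaded.

    find_element raises NoSuchElementException when the element is missing,
    which is what happens when a captcha page is shown instead.
    """
    return driver.find_element(*PRICE_LOCATOR).text.strip()
```

With a real bot, `driver` would be the undetected-chromedriver instance after `driver.get(...)` has loaded the page.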
However, if the website decides to prompt a captcha, the element the bot is trying to locate won't even exist. If the bot has no way to deal with this interruption, it will break and won't be able to carry out any of the steps planned after it.
A captcha would look like this:
To solve this, we wrap the lookup in a try/except block in Python. We try to locate the web element with the bot; if we can't, we tell the bot to pause what it is doing (time.sleep()), capture a picture of the captcha, and send it to a captcha solver service through its Python library and API. We then take the result, use Selenium to type the captcha code into the page, click the captcha submit button, and continue with the steps the bot was supposed to do.
Here is the Python and Selenium code I used to achieve this:
Explanation:
- The 2nd line of code (line 2370) checks whether a captcha is present on the screen; if not, the except line catches the failed lookup and lets the bot work as normal. The 3rd and 4th lines use a Python library installed with pip to connect through its API to an online captcha solving service called TwoCaptcha. The 5th line captures the captcha on the screen and stores it in a variable. On the 6th line we pass that variable to the captcha solver, get the correct code back, and store it in another variable. We then use Selenium to type this code into the captcha box and click the submit button. After that there is a sleep(0.4) to give the website a moment to load and populate the data we wanted the bot to collect in the first place. The code after this is the start of the normal work the bot was given to do.
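The steps in this explanation could look roughly like the sketch below. It is a reconstruction under assumptions, not the exact code from my bot: the three locators and the function name `solve_captcha_if_present` are hypothetical, and `solver` is assumed to be a `TwoCaptcha("YOUR_API_KEY")` object from the `2captcha-python` pip package, whose `normal()` call takes a path to an image file and returns a dict with the answer under `"code"`. Here the captcha picture is saved to a file rather than kept in a variable. The import fallback at the top only exists so the sketch can be tried without Selenium installed.

```python
import time

try:
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
except ImportError:
    # Stand-ins so the sketch can be tried without Selenium installed.
    class NoSuchElementException(Exception):
        pass

    class By:
        ID = "id"

# Hypothetical locators: inspect the real captcha page to find yours.
CAPTCHA_IMAGE = (By.ID, "captcha-image")
CAPTCHA_INPUT = (By.ID, "captcha-input")
CAPTCHA_SUBMIT = (By.ID, "captcha-submit")


def solve_captcha_if_present(driver, solver, image_path="captcha.png",
                             settle_seconds=0.4):
    """Solve an image captcha if one is on screen; return True if one was handled."""
    try:
        captcha = driver.find_element(*CAPTCHA_IMAGE)
    except NoSuchElementException:
        return False  # no captcha, so the bot carries on as normal

    # Save a screenshot of just the captcha element.
    captcha.screenshot(image_path)
    # Send the image to the solving service and wait for the answer.
    result = solver.normal(image_path)

    # Type the answer into the captcha box and submit it.
    driver.find_element(*CAPTCHA_INPUT).send_keys(result["code"])
    driver.find_element(*CAPTCHA_SUBMIT).click()
    time.sleep(settle_seconds)  # give the page a moment to load the real data
    return True
```

You would call this right before the normal scraping step, for example `solve_captcha_if_present(driver, TwoCaptcha("YOUR_API_KEY"))` after `from twocaptcha import TwoCaptcha`, and then go on to locate the data you wanted.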
And voila, that's it. Anyone in this GitHub repository can use this code and method to get past many of the captchas their Selenium bot might meet. One last thing: you might wonder how we can know which element signals a captcha if our bot has never met one before. That is also solved with try/except. If you put try/except blocks in many places in your bot, with a sleep in each except, then when a captcha appears that you didn't prepare for, the bot will simply sleep and leave the page as it is, so you can inspect it, find an element that signals the captcha, and follow my code/method above.
Please let me know if this post helped you in some way, and ask me if you have any questions! :)
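One way to sketch that "sleep in the except" idea is a small helper you use instead of calling `find_element` directly. The name `find_or_pause`, the default pause length, and the screenshot file name are all made up for illustration; the saved screenshot is an extra I would suggest, so there is something to look at even if you miss the pause. The import fallback only exists so the sketch can be tried without Selenium installed.

```python
import time

try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Stand-in so the sketch can be tried without Selenium installed.
    class NoSuchElementException(Exception):
        pass


def find_or_pause(driver, locator, pause_seconds=600,
                  snapshot_path="unexpected_page.png"):
    """Find an element; if it is missing, keep the page open for inspection.

    On failure the page is saved as a screenshot, the bot sleeps so you can
    look at the browser window, and then the original error is re-raised.
    """
    try:
        return driver.find_element(*locator)
    except NoSuchElementException:
        driver.save_screenshot(snapshot_path)
        print(f"Could not find {locator!r} at {driver.current_url}; "
              f"pausing {pause_seconds}s so the page can be inspected")
        time.sleep(pause_seconds)
        raise
```

While the bot sleeps, you can open the browser's dev tools, find an element that only exists on the captcha page, and use it as the captcha locator in your handler.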