fix: TLS fingerprinting prevents scraping #2888 #3384
Closed
What type of PR is this?
What this PR does / why we need it:
TLS fingerprinting can be used to detect (scraper) bots and block them from websites. Cloudflare, for example, provides this as a service**
https://www.ah.nl/allerhande became unavailable to the scraper once the site started using this technique. This PR counters it by spoofing the TLS fingerprint.
To do so, httpx is replaced with curl-cffi (https://pypi.org/project/curl-cffi/0.2.1)
** https://developers.cloudflare.com/bots/concepts/ja3-fingerprint/
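For reviewers unfamiliar with curl-cffi, the swap can be pictured roughly as below. This is a minimal sketch, not the code in this PR: it assumes a curl-cffi release that ships the requests-like `curl_cffi.requests` module, and the `fetch_html` helper, its `get` parameter and the `chrome101` impersonation target are hypothetical names and values chosen for illustration.

```python
from typing import Any, Callable, Optional

# Assumption: "chrome101" is a browser target supported by the installed
# curl-cffi release; pick whichever target your version documents.
DEFAULT_IMPERSONATE = "chrome101"


def fetch_html(
    url: str,
    impersonate: str = DEFAULT_IMPERSONATE,
    timeout: float = 15.0,
    get: Optional[Callable[..., Any]] = None,
) -> str:
    """Fetch a page while presenting a browser-like TLS fingerprint.

    `get` is a hypothetical injection point so the function can be
    exercised without network access; by default curl-cffi is used.
    """
    if get is None:
        # Imported lazily so the module loads even without curl-cffi.
        from curl_cffi import requests

        get = requests.get

    # `impersonate` asks curl-cffi to mimic the TLS handshake of the
    # named browser instead of sending a Python client's handshake.
    response = get(url, impersonate=impersonate, timeout=timeout)
    if response.status_code >= 400:
        raise RuntimeError(f"GET {url} failed with HTTP {response.status_code}")
    return response.text
```

With curl-cffi installed, `fetch_html("https://www.ah.nl/allerhande")` would issue the request with the impersonated handshake; whether a given site accepts it depends on the protection it uses.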
Which issue(s) this PR fixes:
fixes #2888
Special notes for your reviewer:
Testing
A Docker build was used to build the project. Afterwards, only the import of recipes from URLs was tested, using ah.nl/allerhande, lidl-kochen.de and various other random sites.