feat(store): specify links to navigate to between product searches #1542

Merged
merged 5 commits into from
Jan 17, 2021

Conversation

neatchee
Contributor

@neatchee neatchee commented Dec 29, 2020

...to reduce likelihood of bot detection/captcha

Description

Fixes #1504
When querying a store, if deterrentLinks has been defined, pick one at random and navigate there then wait 3 seconds. This should help reduce the likelihood of being sent to a captcha on subsequent page loads from the same store.
Defined in the store model as `captchaDeterrent`, with parameters:
- `hardLinks`: `string[]`
- `searchUrl`: `string`
- `searchTerms`: `string[]`
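
The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `CaptchaDeterrent` interface and the `deterrentCandidates` and `pickDeterrentLink` helpers are hypothetical names, and it assumes that `%%s` in `searchUrl` is the placeholder a search term is substituted into and that hard links and generated search URLs share one candidate pool.

```typescript
// Hypothetical shape mirroring the three parameters listed in the description.
interface CaptchaDeterrent {
	hardLinks?: string[];
	searchUrl?: string;
	searchTerms?: string[];
}

// Build every candidate URL: hard links as-is, plus one search URL per term.
// ASSUMPTION: '%%s' is the placeholder that the search term replaces.
function deterrentCandidates(deterrent: CaptchaDeterrent): string[] {
	const candidates = [...(deterrent.hardLinks ?? [])];
	if (deterrent.searchUrl) {
		for (const term of deterrent.searchTerms ?? []) {
			// A replacer function avoids special handling of '$' in the term.
			candidates.push(deterrent.searchUrl.replace('%%s', () => term));
		}
	}
	return candidates;
}

// Pick one candidate at random; `random` is injectable so tests are deterministic.
function pickDeterrentLink(
	deterrent: CaptchaDeterrent,
	random: () => number = Math.random
): string | undefined {
	const candidates = deterrentCandidates(deterrent);
	if (candidates.length === 0) {
		return undefined; // Nothing configured for this store.
	}
	return candidates[Math.floor(random() * candidates.length)];
}
```

A caller would navigate to the returned URL (if any) before the next product lookup for the same store.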

Testing

Add the following to amazon.ts (already added in pr)

	captchaDeterrent: {
		hardLinks: [
			'https://www.amazon.com/Amazon-Video/b/?ie=UTF8&node=2858778011&ref_=nav_cs_prime_video',
			'https://www.amazon.com/alm/storefront?almBrandId=VUZHIFdob2xlIEZvb2Rz&ref_=nav_cs_whole_foods_in_region',
			'https://www.amazon.com/gp/goldbox?ref_=nav_cs_gb'
		],
		searchUrl: 'https://www.amazon.com/s?k=%%s&i=todays-deals&ref=nb_sb_noss_2',
		searchTerms: ['goober', 'dungeons+and+dragons']
	},

Set `HEADLESS=false` in dotenv, then run:
npm run start

New dependencies

None

@neatchee neatchee requested a review from jef as a code owner December 29, 2020 09:01
@gigi2006

Hey @neatchee it is only amazon US?
Please make it for all Amazon Stores 👍

@neatchee
Contributor Author

> Hey @neatchee it is only amazon US?
> Please make it for all Amazon Stores 👍

This is just the framework and one store setup for it. There will be a lot of work required to add the deterrent links for all the different stores that need it.

@undecided2013

undecided2013 commented Dec 30, 2020

I merged your changes into my cloned main locally and ran it for a while. After an hour or so, Amazon captcha started coming up and it did not go away.

p.s. I did not do headless=false, I used true, what is the impact of that?

@neatchee
Contributor Author

> I merged your changes into my cloned main locally and ran it for a while. After an hour or so, Amazon captcha started coming up and it did not go away.
>
> p.s. I did not do headless=false, I used true, what is the impact of that?

Headless=true is the default, so it does nothing.

This change only includes a very basic list of links to use for Amazon. Even with a comprehensive list, this change will not protect 100% against captcha. It will still depend on how frequently you query Amazon pages. This is meant to reduce the likelihood by some reasonable amount, not prevent captchas entirely.

@undecided2013

> > I merged your changes into my cloned main locally and ran it for a while. After an hour or so, Amazon captcha started coming up and it did not go away.
> > p.s. I did not do headless=false, I used true, what is the impact of that?
>
> Headless=true is the default, so it does nothing.
>
> This change only includes a very basic list of links to use for Amazon. Even with a comprehensive list, this change will not protect 100% against captcha. It will still depend on how frequently you query Amazon pages. This is meant to reduce the likelihood by some reasonable amount, not prevent captchas entirely.

Do you think more links would help? Also, I have tweaked the delay between requests to be longer, but I guess that doesn't matter to Amazon; their 'cache' of lookups may be a hashmap with no expiration on the keys.

@jef
Owner

jef commented Jan 3, 2021

I'll look at this at length tomorrow (probably). Thank you @neatchee!

Contributor Author

@neatchee neatchee left a comment

Maybe I just don't understand git but...this was merged into main but isn't showing up in main's commit history???

…uct searches to reduce likelihood of bot detection/captcha

 When querying a store, if deterrentLinks has been defined, pick one at random and navigate there then wait 3 seconds
@neatchee neatchee reopened this Jan 9, 2021
@jef
Owner

jef commented Jan 10, 2021

> Maybe I just don't understand git but...this was merged into main but isn't showing up in main's commit history???

I don't think so! Perhaps misread?

@jef
Owner

jef commented Jan 10, 2021

For searchTerms, I wonder if we should have a master list of random words rather than one per store? What do you think of that?

Cool idea though, I like it!

@neatchee
Contributor Author

neatchee commented Jan 10, 2021 via email

@jef
Owner

jef commented Jan 10, 2021

Hmm... I guess there are only so many retailers that carry generic products. Perhaps this is fine. No worries! We can stick with this.

Owner

@jef jef left a comment

Overall looks good, just a suggestion and comment.

Thanks @neatchee!

src/config.ts Outdated Show resolved Hide resolved
Comment on lines +538 to +540
setTimeout(() => {
// Do nothing
}, 3000);
Owner

How come we want this 3 second timeout?

Contributor Author

This is basically an arbitrary "wait after page load so we don't look as bot-like". I was being cautious about us hitting a search page and then immediately bouncing to another page as soon as the search results are loaded. I wanted to at least KINDA look like someone actually viewed the search results hehe.

Owner

That makes sense to me :)
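
One caveat about the snippet under review: a bare `setTimeout` with an empty callback only schedules a no-op and returns immediately, so the code after it does not pause. If the intent is to wait before moving on, the timer would need to be wrapped in a promise and awaited. A minimal sketch, assuming the surrounding function is `async`; `delay` and `pauseLikeAReader` are hypothetical names, not helpers confirmed to exist in this repository:

```typescript
// Resolve after `ms` milliseconds so callers can `await` the pause.
function delay(ms: number): Promise<void> {
	return new Promise(resolve => {
		setTimeout(resolve, ms);
	});
}

// Hypothetical caller: the pause only takes effect because it is awaited.
async function pauseLikeAReader(ms = 3000): Promise<number> {
	const started = Date.now();
	await delay(ms);
	return Date.now() - started; // Elapsed milliseconds, roughly `ms`.
}
```

Whether the merged code kept the original form or an awaited one is not visible in this thread.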

@kangelovski

kangelovski commented Jan 14, 2021

It seems to work. Tried it against german amazon pages.
Here is my amazon-de.ts - captchaDeterrent part:

	captchaDeterrent: {
		hardLinks: [
			'https://www.amazon.de/Amazon-Video/b?ie=UTF8&node=3010075031',
			'https://www.amazon.de/dp/B00NTQ6K7E?ref_=nav_em_adbl_nav_sl_link1_0_2_11_2',
			'https://www.amazon.de/gp/goldbox?ref_=nav_cs_de'
		],
		searchTerms: ['hirschhausen', 'matrix', 'tefal', 'playstation', 'fifa', 'xbox'],
		searchUrl: 'https://www.amazon.de/s?k=%%s&i=todays-deals&ref=nb_sb_noss_2'
	},

@neatchee
Contributor Author

neatchee commented Jan 14, 2021 via email

@kangelovski

kangelovski commented Jan 14, 2021

What happens if you have headless true? I've tested it with headless and
didn't have any issue.

EDIT: I've changed some network routing and added some searchTerms and hardLinks, and it does work with headless=true too. (Probably some kind of side effect caused by older bot checks). I will monitor it a little longer.

OLD Comment: The CAPTCHA notification reoccurs if HEADLESS is set to true. If HEADLESS is set to false, no CAPTCHA notification occurs.

@jontaru

jontaru commented Jan 14, 2021

Testing this and I am still getting captcha with headless false for amazon-ca. Was it merged into the main branch?

@kangelovski

kangelovski commented Jan 14, 2021

> Testing this and I am still getting captcha with headless false for amazon-ca. Was it merged into the main branch?

It's not merged yet. You would need to modify four files by yourself. Instead of amazon.ts, you would need to modify the amazon-ca.ts file.

@neatchee
Contributor Author

neatchee commented Jan 14, 2021 via email

@kangelovski

I only received CAPTCHAs again. If I switch to non-headless and back to headless again, the CAPTCHAs disappear. I can imagine the changed browser viewport reset the bot recognition back to "human user"... Maybe we can randomize the viewport parameters too? (I would recommend changing the viewport size at a random interval of between 60 seconds and 20 minutes.)
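
The viewport suggestion above could look something like the sketch below. It is only an illustration of the idea, not something implemented in this PR: `randomInt`, `randomViewport`, and `nextResizeDelayMs` are hypothetical helpers, and the size ranges are arbitrary values chosen for the example.

```typescript
interface Viewport {
	width: number;
	height: number;
}

// Inclusive random integer in [min, max]; `random` is injectable for testing.
function randomInt(min: number, max: number, random: () => number = Math.random): number {
	return min + Math.floor(random() * (max - min + 1));
}

// ASSUMPTION: these size ranges are illustrative, not taken from the project.
function randomViewport(random: () => number = Math.random): Viewport {
	return {
		width: randomInt(1024, 1920, random),
		height: randomInt(720, 1080, random)
	};
}

// Delay until the next resize, between 60 seconds and 20 minutes, in milliseconds.
function nextResizeDelayMs(random: () => number = Math.random): number {
	return randomInt(60_000, 20 * 60_000, random);
}
```

If the browser is driven by Puppeteer, the result of `randomViewport()` could be passed to `page.setViewport(...)` each time the timer from `nextResizeDelayMs()` fires. Whether this actually affects bot detection is untested here.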

@neatchee
Contributor Author

neatchee commented Jan 15, 2021 via email

@kangelovski

> As I mentioned, this alone won't stop captcha. There are a large number of factors that contribute to whether or not you get sent to a captcha. This is just one tool in the toolbox and does not guarantee that you won't see a captcha. The most important thing with captcha is request frequency. The fewer stores, the fewer models/products, the more likely you are to get a captcha.

Please don't get me wrong. I like that tool! I'm not criticizing the occurrence of captcha in my test. I'm just trying to support you with my insights.

@jef
Owner

jef commented Jan 17, 2021

Okay, I'm going to merge this in now. Thank you @neatchee for working on this and answering my questions!

@jef jef enabled auto-merge (squash) January 17, 2021 13:25
@jef jef changed the title feat(store): Specify links (including search + terms) to navigate to between product searches... feat(store): specify links to navigate to between product searches Jan 17, 2021
@jef jef merged commit 0982774 into jef:main Jan 17, 2021
@kakashi84

Hi, on some Amazon links it says "captcha". Do I need to do something?

@neatchee
Contributor Author

neatchee commented Jan 21, 2021 via email

@kakashi84

> For help please visit the Discord server and ask in the #help channel

Which server?

@neatchee
Contributor Author

Check the "looking for help" part of this page: https://jef.codes/streetmerchant/

erwinc1 pushed a commit to erwinc1/streetmerchant that referenced this pull request Mar 31, 2021
…ef#1542)

Fixes jef#1504

When querying a store, if `deterrentLinks` has been defined, pick one at random and navigate there then wait 3 seconds. This should help reduce the likelihood of being sent to a captcha on subsequent page loads from the same store.
Development

Successfully merging this pull request may close these issues.

Amazon Captcha Fix suggestion
7 participants