feat(store): specify links to navigate to between product searches #1542
Conversation
Hey @neatchee, is it only Amazon US? |
This is just the framework and one store setup for it. There will be a lot of work required to add the deterrent links for all the different stores that need it. |
I merged your changes into my cloned main locally and ran it for a while. After an hour or so, the Amazon captcha started coming up and it did not go away. P.S. I did not use headless=false, I used true. What is the impact of that? |
Headless=true is the default, so it does nothing. This change only includes a very basic list of links to use for Amazon. Even with a comprehensive list, this change will not protect 100% against captcha. It will still depend on how frequently you query Amazon pages. This is meant to reduce the likelihood by some reasonable amount, not prevent captchas entirely. |
Do you think more links would help? Also, I have tweaked the delay between requests to be longer, but I guess that doesn't matter to Amazon; their 'cache' of lookups may be a hashmap with no expiration on the keys. |
I'll look at this at length tomorrow (probably). Thank you @neatchee! |
Maybe I just don't understand git but...this was merged into main but isn't showing up in main's commit history???
…uct searches to reduce likelihood of bot detection/captcha When querying a store, if deterrentLinks has been defined, pick one at random and navigate there then wait 3 seconds
I don't think so! Perhaps misread? |
I wonder, for searchTerms, whether we should have a master list of random words rather than one per store? What do you think of that? Cool idea though, I like it! |
I like the ability to go per-store in case they get wise to searches that
return no results, but I'd totally be happy with a default list to use if
one isn't specified for the store
…On Sun, Jan 10, 2021, 10:06 AM Jef LeCompte ***@***.***> wrote:
I wonder for searchTerms if we have a master list of random words rather
per store? What do you think of that?
Cool idea though, I like it!
|
Hmm... I guess there are only so many retailers that carry generic products. Perhaps this is fine. No worries! We can stick with this. |
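The per-store list with a shared default, as floated above, could be sketched roughly like this. It is only an illustration: `defaultSearchTerms`, `StoreTerms`, and `termsFor` are hypothetical names, and the PR as described defines terms per store only.

```typescript
// Hypothetical shared fallback list; not part of the PR.
const defaultSearchTerms: string[] = ['goldfish', 'notebook', 'umbrella'];

// Hypothetical, trimmed-down store shape for this sketch.
interface StoreTerms {
  name: string;
  searchTerms?: string[];
}

// Use the store's own terms when it defines any, otherwise the shared default.
function termsFor(store: StoreTerms): string[] {
  const own = store.searchTerms;
  return own !== undefined && own.length > 0 ? own : defaultSearchTerms;
}
```

A store that wants to avoid terms that return no results can still override the list, which keeps the per-store flexibility mentioned above.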
Overall looks good, just a suggestion and comment.
Thanks @neatchee!
setTimeout(() => {
  // Do nothing
}, 3000);
How come we want this 3 second timeout?
This is basically an arbitrary "wait after page load so we don't look as bot-like". I was being cautious about us hitting a search page and then immediately bouncing to another page as soon as the search results are loaded. I wanted to at least KINDA look like someone actually viewed the search results hehe.
That makes sense to me :)
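One caveat on the snippet as quoted in this review: a `setTimeout` with an empty callback only schedules a no-op and returns immediately, so on its own it does not hold the surrounding function for 3 seconds. A minimal sketch of a pause that does wait, assuming the caller is an `async` function (`delay` and `lingerOnResults` are hypothetical names, not taken from the PR):

```typescript
// Hypothetical helper: returns a promise that resolves after `ms` milliseconds.
function delay(ms: number): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });
}

// Hypothetical caller: pauses after a page load before moving on.
async function lingerOnResults(ms: number = 3000): Promise<number> {
  const start = Date.now();
  await delay(ms); // execution resumes here only after the timer fires
  return Date.now() - start; // elapsed milliseconds
}
```

The 3000 ms default mirrors the arbitrary "look less bot-like" wait described above; the value itself is a judgment call.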
Co-authored-by: Jef LeCompte <[email protected]>
It seems to work. Tried it against german amazon pages.
|
What happens if you have headless true? I've tested it with headless and
didn't have any issue.
…On Thu, Jan 14, 2021, 2:51 AM kangelovski ***@***.***> wrote:
It seems to work. Tried it against german amazon pages. Notice: It only
works with HEADLESS set to false.
Good work!
|
EDIT: I've changed some network routing and added some searchTerms and hardLinks, and it does work with headless=true too (probably a side effect of older bot checks). I will monitor it a little longer. OLD comment: The CAPTCHA notification reoccurs if HEADLESS is set to true. If HEADLESS is set to false, no CAPTCHA notification occurs. |
Testing this and I am still getting captcha with headless false for amazon-ca. Was it merged into the main branch? |
It's not merged yet. You would need to modify four files by yourself. Instead of amazon.ts, you would need to modify the amazon-ca.ts file. |
Oh, yeah, this won't completely defend against captcha, especially if your
IP has already been flagged.
…On Thu, Jan 14, 2021, 3:23 AM kangelovski ***@***.***> wrote:
What happens if you have headless true? I've tested it with headless and
didn't have any issue.
The CAPTCHA notification reoccur. If HEADLESS is set to false, no CAPTCHA
notification occurs.
|
I only received CAPTCHAs again. If I switch to non-headless and back to headless again, the CAPTCHAs disappear. I can imagine the changed browser viewport reset the bot recognition back to "human user"... Maybe we can randomize the viewport parameters too? (I would recommend changing the viewport size at random intervals between 60 seconds and 20 minutes.) |
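The viewport idea could be prototyped along these lines. This is a sketch only: the size ranges and the names `randomViewport` and `nextResizeDelayMs` are assumptions, and whether viewport changes affect bot detection at all is untested. With Puppeteer, the resulting object could be passed to `page.setViewport`, which accepts `width` and `height`.

```typescript
// Minimal viewport shape; matches the width/height fields Puppeteer expects.
interface Viewport {
  width: number;
  height: number;
}

// Random integer in [min, max], inclusive. `random` is injectable for testing.
function randomInt(min: number, max: number, random: () => number = Math.random): number {
  return min + Math.floor(random() * (max - min + 1));
}

// Hypothetical ranges: common desktop sizes, chosen arbitrarily for this sketch.
function randomViewport(random: () => number = Math.random): Viewport {
  return {
    width: randomInt(1024, 1920, random),
    height: randomInt(720, 1080, random),
  };
}

// Delay until the next resize: between 60 seconds and 20 minutes, in milliseconds.
function nextResizeDelayMs(random: () => number = Math.random): number {
  return randomInt(60 * 1000, 20 * 60 * 1000, random);
}
```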
As I mentioned, this alone won't stop captcha. There are a large number of
factors that contribute to whether or not you get sent to a captcha. This
is just one tool in the toolbox and does not guarantee that you won't see a
captcha.
The most important thing with captcha is request frequency. The fewer
stores, the fewer models/products, the more likely you are to get a
captcha.
…On Fri, Jan 15, 2021, 12:40 AM kangelovski ***@***.***> wrote:
I only received CAPTCHAs again. If I switch to non-headless and back to
headless again, the CAPTCHAs disappear. I can imagine the changed viewport
of the browser triggered the bot-recognation back to "human user"... Maybe
we can randomize the viewport parameters too? (I would recommend, that
viewport-size is getting changed between each random 60 seconds / 20
minutes)
|
Please don't get me wrong. I like the tool! I'm not criticizing the occurrence of captcha in my test. I'm just trying to support you with my insights. |
Okay, I'm going to merge this in now. Thank you @neatchee for working on this and answering my questions! |
Hi, some of the Amazon links show a captcha. Do I need to do something? |
For help please visit the discord server and ask in the #help channel
…On Thu, Jan 21, 2021, 3:27 AM kakashi84 ***@***.***> wrote:
hi, in a some links amazon there is write captcha .i must do something?
|
Which server? |
Check the "looking for help" part of this page: https://jef.codes/streetmerchant/ |
...to reduce likelihood of bot detection/captcha
Description
Fixes #1504
When querying a store, if deterrentLinks has been defined, pick one at random and navigate there then wait 3 seconds. This should help reduce the likelihood of being sent to a captcha on subsequent page loads from the same store.
Defined in store model as 'captchaDeterrent' with parameters:
hardLinks string[]
searchUrl string
searchTerms string[]
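A rough sketch of how these three parameters might combine into a single random deterrent link. The field names come from the list above; everything else is an assumption for illustration, including the `{term}` placeholder in `searchUrl`, the optionality of each field, and the helper name `pickDeterrentLink`. The PR's actual implementation may differ.

```typescript
// Field names from the description above; optionality is an assumption.
interface CaptchaDeterrent {
  hardLinks?: string[];
  searchUrl?: string;
  searchTerms?: string[];
}

// Hypothetical helper: builds the candidate links and picks one at random.
// Assumes searchUrl contains a "{term}" placeholder (not confirmed by the PR).
function pickDeterrentLink(
  deterrent: CaptchaDeterrent,
  random: () => number = Math.random
): string | undefined {
  const candidates: string[] = [...(deterrent.hardLinks || [])];
  if (deterrent.searchUrl !== undefined) {
    for (const term of deterrent.searchTerms || []) {
      candidates.push(deterrent.searchUrl.replace('{term}', encodeURIComponent(term)));
    }
  }
  if (candidates.length === 0) {
    return undefined; // nothing configured, so the caller skips the detour
  }
  return candidates[Math.floor(random() * candidates.length)];
}
```

Returning `undefined` when nothing is configured matches the "if deterrentLinks has been defined" condition in the description: stores without a deterrent setup are queried as before.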
Testing
Add the following to amazon.ts (already added in pr)
set HEADLESS=false in dotenv
npm run start
New dependencies
None