Possible spiders list #599

Open
VahidN opened this issue Aug 21, 2024 · 4 comments

VahidN commented Aug 21, 2024

These are the user agents (UAs) of crawlers that this library does not detect as spiders. I will continue to report them in this thread.

python-requests/2.18.4

Go-http-client/2.0

FeedViewer/1.0 (+http://www.feedviewer.app/license)

FeedBurner/1.0 (http://www.FeedBurner.com)

Iframely/1.3.1 (+https://iframely.com/docs/about) Atlassian

Mozilla/5.0 (compatible; Feedspot/1.0 (+https://www.feedspot.com/fs/fetcher; like FeedFetcher-Google)

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.119 Mobile Safari/537.36 (compatible; GoogleOther)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)

meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0 (FlipboardProxy/1.2; +http://flipboard.com/browserproxy)
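
A minimal sketch of a regression check for the strings reported above, in case it helps when drafting new spider rules. The token list `CRAWLER_TOKENS` and the helper `is_probable_crawler` are hypothetical names made up for illustration; they are not the library's actual definitions, only a guess at which substrings distinguish these UAs.

```python
import re

# UA strings copied verbatim from the report above.
REPORTED_UAS = [
    "python-requests/2.18.4",
    "Go-http-client/2.0",
    "FeedViewer/1.0 (+http://www.feedviewer.app/license)",
    "FeedBurner/1.0 (http://www.FeedBurner.com)",
    "Iframely/1.3.1 (+https://iframely.com/docs/about) Atlassian",
    "Mozilla/5.0 (compatible; Feedspot/1.0 (+https://www.feedspot.com/fs/fetcher; like FeedFetcher-Google)",
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.119 Mobile Safari/537.36 (compatible; GoogleOther)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)",
    "meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0 (FlipboardProxy/1.2; +http://flipboard.com/browserproxy)",
]

# Hypothetical token list for illustration only (an assumption, not the library's rules).
CRAWLER_TOKENS = re.compile(
    r"python-requests|Go-http-client|Feed(?:Viewer|Burner|spot)|Iframely"
    r"|GoogleOther|Amazonbot|meta-externalagent|FlipboardProxy"
)


def is_probable_crawler(ua: str) -> bool:
    """Return True if the UA contains one of the illustrative crawler tokens."""
    return CRAWLER_TOKENS.search(ua) is not None


# Collect any reported UA that the illustrative pattern still misses.
missed = [ua for ua in REPORTED_UAS if not is_probable_crawler(ua)]
print(f"{len(REPORTED_UAS) - len(missed)} of {len(REPORTED_UAS)} reported UAs matched")
```

Matching on the product token rather than the whole string keeps the pattern independent of version numbers, which change often in these UAs.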

VahidN commented Sep 14, 2024

Mozilla/5.0 (compatible; WellKnownBot/0.1; +https://well-known.dev/about/#bot)

Chrome Privacy Preserving Prefetch Proxy

ryedpar commented Oct 1, 2024

I'm still new to this set of definitions, but it seems that the catch-all Mac OS definition is preventing Amazonbot from being correctly classified as a spider.

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)

  # Apple
  # @ref: https://www.apple.com/mac/
  # @note: lookup Mac OS, but exclude iPad, Apple TV, a HTC phone, Kindle, LG
  # @note: put this at the end, since it is hard to implement contains foo, but not contain bar1, bar 2, bar 3 in go's re2
  #########
  - regex: 'Mac OS'
    device_replacement: 'Mac'
    brand_replacement: 'Apple'
    model_replacement: 'Mac'
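
To illustrate the interaction, here is a rough sketch that assumes the definitions are applied as an ordered list where the first matching regex wins. The `Amazonbot/\d` rule, the `classify` helper, and the rule lists are hypothetical and only meant to show the ordering effect; they are not taken from the actual definitions file.

```python
import re

AMAZONBOT_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 "
    "(KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 "
    "(Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)"
)

# Only the catch-all rule quoted above; no spider rule matches this UA first.
RULES_CATCH_ALL_ONLY = [
    (re.compile(r"Mac OS"), "Mac"),
]

# Same list with a hypothetical spider rule placed ahead of the catch-all.
RULES_WITH_SPIDER_FIRST = [
    (re.compile(r"Amazonbot/\d"), "Spider"),  # assumed rule, for illustration
    (re.compile(r"Mac OS"), "Mac"),
]


def classify(ua, rules):
    """Return the device of the first rule whose regex matches (first match wins)."""
    for pattern, device in rules:
        if pattern.search(ua):
            return device
    return "Other"


print(classify(AMAZONBOT_UA, RULES_CATCH_ALL_ONLY))     # catch-all claims the UA
print(classify(AMAZONBOT_UA, RULES_WITH_SPIDER_FIRST))  # spider rule is reached first
```

Under that assumption, the fix would be a more specific spider rule that appears before the `'Mac OS'` entry, rather than a change to the catch-all itself.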

VahidN commented Oct 19, 2024

Mechanize/2.8.1 Ruby/2.7.5p203 (http://github.com/sparklemotion/mechanize/)

BooVeMan commented

We have recently been seeing this one frequently from Russia:

UNKNOWN, UNKNOWN misc crawler UNKNOWN
