Generic Web Scraper for Affiliate Website Products

Installation

Requires Python 3.6.5 or later.

pip install -r requirements.txt

Parameters:

-o <name_of_file> Output file to write to, e.g. url_results.json. The file is written to <path-to>/affiliate_web_scraper/affiliate_web_scraper/spiders/<name_of_file>

-a domains=<domains> Scrapes the given list of domains, delimited by ',' (comma)
-a allowed_domains=<allowed_domains> Does not scrape any URL outside these allowed domains, delimited by ',' (comma)
-a search_term=<exact phrase to find> Checks the text descriptions of each HTML page for this exact phrase
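
The -a values reach the spider as plain strings, so the comma-delimited lists have to be split before use. The following is a minimal sketch of how these three parameters might be interpreted, using only the standard library. The helper names (split_arg, is_allowed, contains_term) are hypothetical and are not taken from generic_scraper.py; the subdomain-matching rule and the case-sensitive phrase check are assumptions.

```python
from urllib.parse import urlparse


def split_arg(value):
    """Split a comma-delimited -a value into a list, dropping empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


def is_allowed(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one.

    Assumption: subdomains such as www.thewirecutter.com count as allowed.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain.lower() or host.endswith("." + domain.lower())
        for domain in allowed_domains
    )


def contains_term(page_text, search_term):
    """Exact-phrase check; case-sensitive here, the real spider may differ."""
    return search_term in page_text


# Values copied from the example commands in this README.
domains = split_arg("https://thewirecutter.com/,https://appliancebuyersguide.com")
allowed = split_arg("thewirecutter.com,appliancebuyersguide.com")
```

With these helpers, a link to https://www.thewirecutter.com/reviews/ would be followed, while a link to any host outside the two allowed domains would be skipped.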

Run from PyCharm:

# Tested on Windows
scrapy.cmdline runspider generic_scraper.py -o url_results.json -a domains=https://thewirecutter.com/,https://appliancebuyersguide.com -a allowed_domains=thewirecutter.com,appliancebuyersguide.com -a search_term=SHEM63W55N

Run from Command Line:

# From directory: <path-to>/affiliate_web_scraper/
scrapy runspider affiliate_web_scraper/spiders/generic_scraper.py -o url_results.json -a domains=https://thewirecutter.com/,https://appliancebuyersguide.com -a allowed_domains=thewirecutter.com,appliancebuyersguide.com -a search_term=SHEM63W55N
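
When the domain lists are long, assembling the command by hand is error-prone. The sketch below builds the same command line from Python lists; build_command is a hypothetical helper, not part of this repository, and it only prints the command rather than running it.

```python
import shlex


def build_command(spider_path, output_file, domains, allowed_domains, search_term):
    """Assemble the argument list for `scrapy runspider` with the README's parameters."""
    return [
        "scrapy", "runspider", spider_path,
        "-o", output_file,
        "-a", "domains=" + ",".join(domains),
        "-a", "allowed_domains=" + ",".join(allowed_domains),
        "-a", "search_term=" + search_term,
    ]


cmd = build_command(
    "affiliate_web_scraper/spiders/generic_scraper.py",
    "url_results.json",
    ["https://thewirecutter.com/", "https://appliancebuyersguide.com"],
    ["thewirecutter.com", "appliancebuyersguide.com"],
    "SHEM63W55N",
)

# Print a shell-safe version of the command (shlex.quote works on Python 3.6).
print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually launch the crawl, the list could be passed to subprocess.run(cmd, check=True) from the <path-to>/affiliate_web_scraper/ directory.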

Notes

Do not use on digitaltrends.com or consumeraffairs.com
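
One way to avoid passing these sites by accident is a pre-flight check on the domains list before starting a crawl. This is only a sketch: excluded_urls and EXCLUDED_SITES are hypothetical names, and nothing here is taken from generic_scraper.py.

```python
from urllib.parse import urlparse

# Sites the Notes section says not to scrape.
EXCLUDED_SITES = ("digitaltrends.com", "consumeraffairs.com")


def excluded_urls(start_urls):
    """Return the start URLs whose host is an excluded site or a subdomain of one."""
    flagged = []
    for url in start_urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == site or host.endswith("." + site) for site in EXCLUDED_SITES):
            flagged.append(url)
    return flagged


flagged = excluded_urls([
    "https://www.digitaltrends.com/home/",
    "https://thewirecutter.com/",
])
```

A non-empty result means at least one start URL should be removed before running the spider.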

trile127/affiliate_web_scraper