A web crawler designed to scrape Pokémon card prices from TCGPlayer.com and export them to .csv files.
- Install Python 3, if you do not have it already.
- Create a new virtual environment:
  `python -m venv .venv`
- Enter the virtual environment:
  - PowerShell: `. .venv\Scripts\Activate.ps1`
  - cmd.exe: `.venv\Scripts\activate.bat`
  - Linux: `source .venv/bin/activate`
- Install dependencies:
  `pip install -r requirements.txt`
  `playwright install`
- Enter the virtual environment, if you are not in it already. (See step 3 of the installation instructions)
- Run the crawler with the following command:
  `scrapy crawl main`
- A window will pop up with a list of sets that can be scraped. Check the ones that you want, then close the window.
- Wait for the crawl to complete; the scraped prices are exported to .csv files.
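Once the crawl completes, the exported .csv files can be inspected with Python's standard `csv` module. A minimal sketch — note that the file name and column names below are illustrative assumptions, not the project's actual output schema; check the output of `pipelines.py` for the real ones:

```python
import csv

def load_prices(path):
    """Load one exported CSV of card prices as a list of dicts.

    The column names in the result depend on the pipeline's
    output schema; the ones used in the example call below
    ("card_name", "market_price") are assumptions.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Hypothetical usage:
#   rows = load_prices("base_set.csv")
#   for row in rows:
#       print(row["card_name"], row["market_price"])
```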
| File | Purpose |
|---|---|
| settings.py | Settings for Scrapy and the spider. |
| pipelines.py | Pipeline that takes items and outputs them to CSV files. |
| items.py | The data structure for the scraped data. |
| spiders/main_spider.py | The spider code that handles requesting and parsing data. |
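To illustrate the division of labor between items.py and pipelines.py, here is a simplified, self-contained sketch of a CSV-exporting item pipeline. The class and field names are illustrative assumptions, not the project's actual code; the real pipeline is driven by Scrapy, which calls the same `open_spider`/`process_item`/`close_spider` hooks shown here:

```python
import csv
from dataclasses import dataclass, asdict, fields

# Illustrative item definition (Scrapy accepts plain dataclasses
# as items). These field names are assumptions, not the actual
# schema in items.py.
@dataclass
class CardPrice:
    set_name: str
    card_name: str
    market_price: float

# Simplified stand-in for the CSV pipeline in pipelines.py.
# Scrapy calls open_spider once per crawl, process_item once per
# scraped item, and close_spider when the crawl ends.
class CsvExportPipeline:
    def open_spider(self, spider):
        self.file = open("prices.csv", "w", newline="", encoding="utf-8")
        self.writer = csv.DictWriter(
            self.file, fieldnames=[f.name for f in fields(CardPrice)]
        )
        self.writer.writeheader()

    def process_item(self, item, spider):
        self.writer.writerow(asdict(item))
        return item  # pipelines must return the item for later stages

    def close_spider(self, spider):
        self.file.close()
```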
| Dependency | Min Version | Reason Used | Notes |
|---|---|---|---|
| scrapy | 2.11.0 | Framework that orchestrates the scraping process and provides a CLI tool for running the scraper. | |
| playwright | 1.15 | Runs a headless browser that downloads dynamic content. | |
| scrapy-playwright | Special | Implements a Scrapy download handler that lets Scrapy download pages using Playwright. | This project uses a fork of scrapy-playwright that can run on Windows rather than just Linux. It is included in source form in this project rather than as a submodule. |
| wxPython | 4.2.1 | Used to implement the set selector window. | |
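For reference, scrapy-playwright is wired into Scrapy in settings.py along these lines. This is the standard upstream configuration, shown as a sketch; the fork bundled with this project may use a different module path or carry extra Windows-specific options, and the browser choice below is an assumption:

```python
# Standard scrapy-playwright wiring (sketch; the bundled fork may
# differ in module path or Windows-specific options).
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

PLAYWRIGHT_BROWSER_TYPE = "chromium"  # assumption; could be firefox/webkit
```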