Yet another dataset about Movies, TV Shows and Games.
This is an implementation of the Criticker Dataset. This repository contains the necessary spiders for creating the dataset, along with some basic tests.
great_expectations is used for data quality checks; see the datadocs.
poetry is used for virtual environment and dependency management.
poetry install
poetry run scrapy crawl games_spider -o data/raw/games.csv # to retrieve games
# export login username and password
export C_USERNAME='<USERNAME>'
export C_PASSWORD='<PASSWORD>'
poetry run scrapy crawl movies_spider -o data/raw/movies.csv # to retrieve movies
poetry run pytest
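The movies crawl above depends on `C_USERNAME` and `C_PASSWORD` being exported first. As a minimal sketch, assuming the spider reads these two variables from the process environment (the spider code itself is not shown here), a pre-flight check could fail early with a clear message instead of partway through a crawl. `load_credentials` is a hypothetical helper, not something this repository provides.

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from C_USERNAME and C_PASSWORD.

    Hypothetical helper for illustration; not part of this repository.
    `env` defaults to os.environ and can be any mapping, which keeps
    the function easy to test.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way: both are unusable for login.
    missing = [name for name in ("C_USERNAME", "C_PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["C_USERNAME"], env["C_PASSWORD"]
```

Calling `load_credentials()` before starting the movies spider would surface a missing export immediately. Passing a plain dict, as in `load_credentials({"C_USERNAME": "u", "C_PASSWORD": "p"})`, avoids touching the real environment in tests.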
- Add games
- Add TCI-related data
- Add reviews