NLP

This project crawls through thousands of VG articles.

When "deep dive" is enabled, Scrapy filters out some pages through its duplicate filter (reported as dupefilter/filtered), which detects and drops duplicate requests. As a result, the articles are not stored, because the resulting lengths do not match.
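
The effect can be illustrated without Scrapy. The sketch below is a simplified stand-in for a duplicate filter, not Scrapy's implementation and not this project's code: it drops repeated URLs using a set, which shows why the number of scheduled links and the number of scraped pages can differ. The helper name `filter_duplicates`, the example URLs, and the idea that results are paired by position are all assumptions made for illustration.

```python
def filter_duplicates(urls):
    """Return the URLs in order, dropping any URL that was already seen.

    Hypothetical helper: a simplified stand-in for a duplicate-request filter.
    """
    seen = set()
    kept = []
    for url in urls:
        if url in seen:
            continue  # duplicate request: dropped, as with dupefilter/filtered
        seen.add(url)
        kept.append(url)
    return kept


# Hypothetical links collected during a "deep dive"; one appears twice.
scheduled = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
]
fetched = filter_duplicates(scheduled)

# The two lists no longer have the same length, so pairing them by
# position would fail or misalign.
lengths_match = len(scheduled) == len(fetched)

# Keying results by URL avoids relying on matching lengths.
articles = {url: f"body of {url}" for url in fetched}
```

In Scrapy itself, a request can bypass the duplicate filter by passing `dont_filter=True` to `scrapy.Request`, and setting `DUPEFILTER_DEBUG = True` in settings.py logs every filtered request rather than only the first. Whether either is appropriate here depends on how the spiders collect their results.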

.vscode
deleted/spiders
notebooks
spiders
.gitignore
README.md
env_funny.yml
items.py
middlewares.py
pipelines.py
scrapy.cfg
settings.py
