An example of how to use the Django ORM to store data obtained by a Scrapy spider in a database, and then expose that data through a REST API.
As an example, I set up this project to scrape Rolling Stone lists/rankings and store them in a relational database with proper data models.
- Python 2.7
- pip
- virtualenv
- A message broker compatible with Celery (I use Redis)
- A database compatible with Django (I use SQLite 3 in development, and PostgreSQL or MongoDB in production). If you are not familiar with how Django manages databases, go here
Clone the project and install the requirements in a virtualenv:
# install fabric in the global Python environment
pip install fabric
# clone repo
git clone git://github.com/drkloc/rstone_scrapper.git
cd rstone_scrapper
# setup app
fab DEV setup
You need to install lxml with its static dependencies before running pip against the requirements file:
STATIC_DEPS=true pip install lxml
Any settings overrides (database config, broker config, etc.) are conveniently made inside settings_local.py. Just copy the demo file:
cp settings_local_demo.py settings_local.py
and start customizing whatever you want/need.
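For instance, a settings_local.py for development might override just the database and the Celery broker. The exact setting names below are assumptions (check settings_local_demo.py for the real ones); the values shown are a typical dev setup.

```python
# Hypothetical settings_local.py overrides -- key names assumed,
# see settings_local_demo.py for the ones this project actually uses.

# SQLite for development; swap ENGINE/NAME for PostgreSQL in production.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "dev.sqlite3",
    }
}

# Redis as the Celery broker: db 0 on the default local port.
BROKER_URL = "redis://localhost:6379/0"
```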
To run the whole pipeline:
# start the broker
redis-server
# start the celery worker
python manage.py celeryd
# run the spider to scrape and store the data
scrapy runspider scrap.py
# start the dev server to expose the REST API
python manage.py runserver
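The `scrapy runspider scrap.py` step is where scraped rows reach the database. One common way to hand Scrapy items to the Django ORM is an item pipeline; the sketch below shows the idea, but the class, app, model names, and item keys are all hypothetical, not the actual code in scrap.py.

```python
# Sketch of a Scrapy item pipeline that persists entries via the Django ORM.
# DjangoWriterPipeline, the `rankings` app, RankingList/RankingEntry models,
# and the item keys are all assumed names for illustration.
class DjangoWriterPipeline(object):
    def process_item(self, item, spider):
        # Import lazily so Django settings are fully loaded before the
        # models module is touched.
        from rankings.models import RankingList, RankingEntry

        # Idempotent writes: re-running the spider won't duplicate rows.
        ranking, _ = RankingList.objects.get_or_create(
            title=item["list_title"])
        RankingEntry.objects.get_or_create(
            ranking_list=ranking,
            position=item["position"],
            defaults={"name": item["name"]},
        )
        return item  # hand the item on to any further pipelines
```

Registering such a class under `ITEM_PIPELINES` in the Scrapy settings makes every item the spider yields pass through `process_item` before being discarded.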