A multi-threaded spider with a web interface.
First, install the requirements with pip:
    pip install httplib2
    pip install lxml
    pip install -e git+https://github.com/coleifer/django-utils.git#egg=djutils
    pip install -e git+https://github.com/coleifer/django-spider.git#egg=spider
Add djutils and spider to INSTALLED_APPS in your settings file, then run manage.py syncdb.
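As a rough sketch, the relevant part of a settings file might look like the fragment below. Only the djutils and spider entries come from this README; the other entries are assumptions about a typical project that uses the Django admin, so keep whatever your project already lists.

```python
# settings.py (fragment)
# Only 'djutils' and 'spider' are required by this README; the contrib apps
# are assumed here because the urlconf example routes to the Django admin.
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.admin',
    'djutils',
    'spider',
)
```

Running manage.py syncdb afterwards creates the database tables for the newly listed apps.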
Add spider.urls to your root urlconf:
    from django.conf import settings
    from django.conf.urls.defaults import *
    from django.contrib import admin

    admin.autodiscover()

    urlpatterns = patterns('',
        url(r'^admin/', include(admin.site.urls)),
        url(r'', include('spider.urls')),
    )
Make sure the media in the spider app is copied into your static media directory.
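If you prefer to script the copy, here is a minimal sketch using only the standard library. The helper name copy_media is hypothetical and not part of the spider app, and this README does not say where the media lives inside the app, so both paths are left for you to supply.

```python
import os
import shutil


def copy_media(src_dir, dest_dir):
    """Recursively copy every file under src_dir into dest_dir.

    Returns the list of destination paths that were written.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        # Recreate the source sub-directory layout under dest_dir.
        rel = os.path.relpath(root, src_dir)
        if rel == os.curdir:
            target_root = dest_dir
        else:
            target_root = os.path.join(dest_dir, rel)
        if not os.path.isdir(target_root):
            os.makedirs(target_root)
        for name in files:
            target = os.path.join(target_root, name)
            # copy2 preserves timestamps along with the file contents.
            shutil.copy2(os.path.join(root, name), target)
            copied.append(target)
    return copied
```

Call it with the spider app's media directory as the first argument and your static media directory as the second. Existing files with the same name are overwritten.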
Start up the task queue:
    # assume your cwd is the root dir of virtualenv
    export DJANGO_SETTINGS_MODULE=mysite.settings
    ./bin/python ./src/djutils/djutils/queue/bin/consumer.py start -l ./logs/queue.log -p ./run/queue.pid