- install pyenv:
$ brew install pyenv
- install pyenv-virtualenv (provides the pyenv virtualenv command):
$ brew install pyenv-virtualenv
- install python:
$ pyenv install 3.6.1
- create new virtualenv:
$ pyenv virtualenv 3.6.1 playground
- activate a virtualenv:
$ pyenv activate playground
- install requirements (inside the activated virtualenv):
$ pip install -r requirements.txt
- run the scheduler:
$ python scrapy_scheduler.py
- run all spiders:
$ python run_all_spiders.py
- create a new spider:
$ scrapy genspider <new_spider> <url>
- start scrapyd:
$ scrapyd
This starts scrapyd, which by default serves its monitoring web interface at localhost:6800.
- place jdk-6u45-linux-x64.bin in the lib directory
- place the hanlp model in the lib directory
- place the THUCTC model in the lib directory
- build the Docker image:
$ docker build -t autonews-scrapy .
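
To confirm that scrapyd is actually up on localhost:6800 without opening the browser, one option is to query its daemonstatus.json endpoint. The following is a minimal sketch using only the Python standard library, not part of this repository; the function names (daemon_status_url, summarize_status, check_scrapyd) are hypothetical, and the host and port are assumed to be the defaults mentioned above.

```python
import json
from urllib.request import urlopen


def daemon_status_url(host="localhost", port=6800):
    """Build the URL of scrapyd's daemonstatus.json endpoint (defaults assumed)."""
    return f"http://{host}:{port}/daemonstatus.json"


def summarize_status(payload):
    """Turn a daemonstatus.json response body (a JSON string) into a one-line summary."""
    data = json.loads(payload)
    if data.get("status") != "ok":
        return "scrapyd reported a problem: " + str(data.get("status"))
    return "scrapyd ok: {} pending, {} running, {} finished".format(
        data.get("pending", 0), data.get("running", 0), data.get("finished", 0)
    )


def check_scrapyd(host="localhost", port=6800, timeout=5):
    """Query a running scrapyd instance and return the summary string.

    Raises an OSError subclass (urllib.error.URLError) if scrapyd is unreachable.
    """
    with urlopen(daemon_status_url(host, port), timeout=timeout) as resp:
        return summarize_status(resp.read().decode("utf-8"))
```

With scrapyd running, calling print(check_scrapyd()) prints a one-line summary of pending, running, and finished jobs. If the daemon is not running, the call raises an error that the caller can catch as OSError.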