This repository has been archived by the owner on Sep 21, 2020. It is now read-only.

Commit

reduce cache time to three hours
drkane committed Apr 16, 2020
1 parent 5d681b1 commit f3aa1d0
Showing 3 changed files with 4 additions and 5 deletions.
3 changes: 1 addition & 2 deletions cronfile
@@ -14,7 +14,6 @@ SHELL=/bin/bash

### PLACE ALL CRON TASKS BELOW

-# removes unresponsive users from the subscriber list to decrease bounce rates
-23 2 * * 0 dokku dokku --rm enter findthatcharity_scrape sh ./scrape_all.sh
+23 2 * * 0 dokku dokku --rm enter ftc-scrapers sh ./crawl_all.sh

### PLACE ALL CRON TASKS ABOVE, DO NOT REMOVE THE WHITESPACE AFTER THIS LINE
2 changes: 1 addition & 1 deletion findthatcharity_import/settings.py
@@ -94,7 +94,7 @@
# Enable and configure HTTP caching (disabled by default)
# See https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
HTTPCACHE_ENABLED = True
-HTTPCACHE_EXPIRATION_SECS = 60 * 60 * 24 * 7  # one week
+HTTPCACHE_EXPIRATION_SECS = 60 * 60 * 3  # three hours
HTTPCACHE_DIR = 'httpcache'
HTTPCACHE_IGNORE_HTTP_CODES = []
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
4 changes: 2 additions & 2 deletions readme.md
@@ -190,8 +190,8 @@ git push dokku master

## Other settings

-By default, the `HTTPCACHE` extension is enabled, with resources cached for one week.
-This means that any data downloaded or websites visited are cached for one week to prevent
+By default, the `HTTPCACHE` extension is enabled, with resources cached for three hours.
+This means that any data downloaded or websites visited are cached for three hours to prevent
overload of the sites. This means it is relatively risk-free to rerun scraping after
adjusting other settings for e.g. saving to a database. These settings can be changed
if needed.
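The readme notes that these settings can be changed if needed. As a sketch of one way to do that without editing `settings.py`, Scrapy allows overriding any setting for a single run via its `-s` command-line flag; the spider name below is hypothetical, not one from this repository:

```shell
# Override the cache expiry for one crawl only
# (60 * 60 * 3 = 10800 seconds, i.e. three hours; "charities" is an illustrative spider name)
scrapy crawl charities -s HTTPCACHE_EXPIRATION_SECS=10800

# Or bypass the HTTP cache entirely to force a fresh download
scrapy crawl charities -s HTTPCACHE_ENABLED=False
```

Per-run overrides like this avoid committing a temporary cache value to the repository.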
