NOTE: This project is no longer actively maintained.
This project provides a command-line tool for scraping COVID-19 data from countries around the world.
The scrapers target the subset of countries that offer coronavirus data at the level of administrative units (provinces, states, territories within a country).
Organizations such as Johns Hopkins University are a better resource for comprehensive country-wide figures.
- Install a recent version of Firefox.
- Download and unpack Geckodriver to a location on the PATH (or update the PATH environment variable to include its location).
- Install the covid-world-scraper command-line tool:
pip install git+https://github.com/biglocalnews/covid-world-scraper#egg=covid-world-scraper
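Before running any scrapers, it can help to confirm that the browser and driver from the steps above are discoverable. The following is a minimal sketch, not part of this project: the helper name missing_prerequisites is hypothetical, and the executable names firefox and geckodriver are assumptions that may differ by platform (on macOS, for example, Firefox is often not on the PATH even when installed).

```python
import shutil

# Assumed executable names; adjust these for your platform if needed.
REQUIRED_EXECUTABLES = ("firefox", "geckodriver")


def missing_prerequisites(which=shutil.which):
    """Return the required executables that cannot be found on the PATH.

    The lookup function is injectable so the check can be exercised
    without touching the real environment.
    """
    return [name for name in REQUIRED_EXECUTABLES if which(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("Firefox and Geckodriver were both found on the PATH.")
```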
The covid-world-scraper command-line tool lets you download the current data for a country by supplying one or more 3-letter ISO country codes.
# List available country scrapers
covid-world-scraper -l
# Run all scrapers at once, sequentially
covid-world-scraper --all
# Run selected countries (Brazil, Germany, Pakistan)
# by passing in one or more 3-letter ISO country codes
covid-world-scraper bra deu pak
# To see other available CLI options
covid-world-scraper --help
By default, data for each country is written to a covid-world-scraper-data folder in the user's home directory. This location can be changed using the --cache-dir flag:
covid-world-scraper --cache-dir=/tmp/some-other-name bra
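To drive the tool from a Python script (for example, a scheduled job), the command line can be assembled and run with subprocess. This is a rough sketch under stated assumptions: build_command and run_scrapers are hypothetical helpers, not part of this project, and only the country-code arguments and the --cache-dir flag documented above are used. Lowercasing the codes simply mirrors the examples above; it is not a documented requirement.

```python
import subprocess


def build_command(country_codes, cache_dir=None):
    """Assemble a covid-world-scraper command line as an argument list."""
    # Lowercase to match the README examples (an assumption, not a requirement).
    codes = [code.strip().lower() for code in country_codes]
    if not codes:
        raise ValueError("Supply at least one country code")
    for code in codes:
        if len(code) != 3 or not (code.isascii() and code.isalpha()):
            raise ValueError(f"Expected a 3-letter ISO country code, got {code!r}")

    command = ["covid-world-scraper"]
    if cache_dir is not None:
        command.append(f"--cache-dir={cache_dir}")
    command.extend(codes)
    return command


def run_scrapers(country_codes, cache_dir=None):
    """Run the scrapers; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(country_codes, cache_dir), check=True)
```

Calling run_scrapers(["bra"], cache_dir="/tmp/some-other-name") would then be equivalent to the command shown above, provided the tool is installed and on the PATH.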
For each country, scrapers download and store one or more file artifacts in a raw directory. These files may be screenshots, HTML, Excel files, etc. Data extracted from these raw sources are stored in a processed directory for each country. Files in both directories are named based on the UTC runtime of the scraper.
Below is an example showing file artifacts generated by the Pakistan scraper on two consecutive days in June 2020.
The types of raw files saved for a given country vary widely and reflect the different ways each country posts its data.
covid-world-scraper-data/pak
├── processed
│ ├── 20200627T0126Z.csv
│ └── 20200628T1705Z.csv
└── raw
├── 20200627T0126Z.html
├── 20200627T0126Z.png
├── 20200627T0126Z.txt
├── 20200628T1705Z.html
├── 20200628T1705Z.png
└── 20200628T1705Z.txt
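Because file names encode the UTC runtime, the most recent processed file for a country can be found by parsing those names. The sketch below assumes the name pattern seen in the example above (YYYYMMDDTHHMMZ, for instance 20200627T0126Z) and that processed files are CSVs; parse_runtime and latest_processed are hypothetical helpers, not part of this project.

```python
from datetime import datetime, timezone
from pathlib import Path

# Pattern inferred from the example file names, e.g. 20200627T0126Z
RUNTIME_FORMAT = "%Y%m%dT%H%MZ"


def parse_runtime(path):
    """Parse the UTC runtime encoded in a file name into an aware datetime."""
    naive = datetime.strptime(Path(path).stem, RUNTIME_FORMAT)
    return naive.replace(tzinfo=timezone.utc)


def latest_processed(country_dir):
    """Return the newest CSV in <country_dir>/processed, or None if absent."""
    processed_dir = Path(country_dir) / "processed"
    if not processed_dir.is_dir():
        return None
    files = sorted(processed_dir.glob("*.csv"), key=parse_runtime)
    return files[-1] if files else None


if __name__ == "__main__":
    # Default data location described above: a folder in the home directory.
    data_dir = Path.home() / "covid-world-scraper-data"
    print(latest_processed(data_dir / "pak"))
```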
The command-line tool can send scraper status alerts to Slack. This requires:
- Creating a Slack app and integrating it into a workspace
- Obtaining a Slack App API token
- Creating environment variables for the API key and target channel
See the Python slackclient docs for details on setting up a Slack app, integrating with a workspace, and obtaining an API key.
# e.g., in ~/.bash_profile or ~/.bashrc
export COVID_WORLD_SLACK_API_KEY=YOUR_API_KEY
export COVID_WORLD_SLACK_CHANNEL=channel-name
After completing the above steps, use the --alert command-line option to send Slack alerts when scrapers are run:
# Scrape all countries and send alerts to Slack
covid-world-scraper --alert --all
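When running with --alert from a script or scheduled job, a quick pre-flight check of the two environment variables can catch a missing setting early. This is a minimal sketch: missing_slack_settings is a hypothetical helper, not part of this project, and it only checks that the variables are set and non-empty, not that the key or channel is valid.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARIABLES = ("COVID_WORLD_SLACK_API_KEY", "COVID_WORLD_SLACK_CHANNEL")


def missing_slack_settings(environ=os.environ):
    """Return the Slack-related variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_slack_settings()
    if missing:
        print("Set these before using --alert: " + ", ".join(missing))
    else:
        print("Slack environment variables are set.")
```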
This project relies on country code data from the GeoNames project.