Skip to content

SoerenMind/epimodel

 
 

Repository files navigation

Epidemics data processing and modelling toolkit

A data library and a toolkit for modelling COVID-19 epidemics.

Main concepts

  • Region database (continents, countries, provinces, GLEAM basins) - codes, names, basic stats, tree structure (TODO).
  • All data, including the region database, is stored in epimodel-covid-data repo for asynchronous updates.
  • Each region has an ISO-based code, all datasets are organized by those codes (as a row index).
  • Built on Pandas with some helpers, using mostly CSVs and HDF5.
  • Algorithms and imports assuming common dataframe structure (with Code and optionally Date row index).
  • All dates are UTC timestamps, stored in ISO format with TZ.

Install

  • Get Poetry
  • Clone this repository.
  • Install the dependencies and this lib poetry install (creates a virtual env by default).
  • Clone the epimodel-covid-data repository. For convenience, I recommend cloninig it inside the epimodel repo directory as data.
## Clone the repositories (or use their https://... withou github login)
git clone [email protected]:epidemics/epimodel.git
cd epimodel
git clone [email protected]:epidemics/epimodel-covid-data.git data

## Install packages
poetry install  # Best run it outside virtualenv - poetry will create its own
# Alternatively, you can also install PyMC3 or Pyro, and jupyter (in both cases):
poetry install -E pymc3
poetry install -E pyro

## Or, if using conda, install (a likely list): pandas pymc3 unidecode jupyter ...

poetry shell # One way to enter the virtualenv (if not active already)
poetry run jupyter notebook  # For example

Basic usage

from epimodel import RegionDataset, read_csv

# Read regions
rds = RegionDataset.load('data/regions.csv')
# Find by name or by code
cz = rds['CZ']
cz = rds.find_one_by_name('Czech Republic')
# Use attribute access on Region
print(cz.Name)
# TODO: attributes for tree-structure access

# Load John Hopkins CSSE dataset with our helper (creates indexes etc.)
csse = read_csv('data/johns-hopkins.csv')
print(csse.loc[('CZ', "2020-03-28")])

Development

  • Use Poetry for dependency management.
  • We enforce black formatting (with the default style).
  • Use pytest for testing, add tests for your code!
  • Use pull requests for both this and the data repository.

About

Epidemological modelling toolkit

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Jupyter Notebook 98.4%
  • Python 1.6%