Based on the data from [2], compute several statistics for different areas (Province, Regioni) in Italy.
NOTE: These statistics are based on 1) uncertain data, 2) possibly wrong! Always check and compare the official sources if in doubt.
For each day that has available data, the program outputs for each provincia/regione:
avg_N
: average of the number of new COVID cases in the lastW
dayslastW_pop
: total number of new cases in the last W days per100000
inhabitantslastW_area
: total number of new cases in the last W days per square Kmreff
: reproduction indextotal
: total number of cases
The default value of W
is 7 days, different values could be chosen with the wsize
parameter.
For each provincia/regione, the program also computes the expected value day (see [4] for details), which gives a qualitative indication of how likely an outbreak is under control. The further away from the expected value day, the likelier an outbreak is under control.
Let N(t)
be the number of new cases at day t
.
avg_N(t)
on a window of size W
is computed as:
avg_N(t)
=(N(t) + N(t-1) + ... + N(t-W)) / W
Averaging the number of new cases over a window of W
days "smoothes" the uncertainty of the data. (Note that no "shifting"/"re-centering" on the day with statistically the highest number of submitted new cases is performed.)
The reproduction index reff
is computed on the basis of avg_N
values.
reff(t)
=avg_N(t) / avg_N(t-W)
With the default value of W = 7
days, the assumption is that a new infection is visible in the statistics after 7
days.
Let t
be a progressive day from the beginning of the available data (i.e., t = 0
is day "one"), the expected value day is computed as (in pseudo latex notation):
exp_day = \sum_{t} (t * N(t)) / total_cases
The expected value day "looks" (hopefully back) from the most recent available data.
mavg.py --help
The program uses input data from [1], [2], and [3]. The data from [1] and [3] has been consolidated into data/regioni.csv
and data/province.csv
.
Check the OFILE and IFILE dictionaries for the default locations of the input and output json/CSV files.
Two json files (one for the province, one for the regioni) reporting the statistics for each regione/provincia, and day.
avg_province.json
{
"1": {
"code": 1,
"exp": [
246.4967163445791,
"2020-10-27T18:00:00",
-108
],
"name": "Torino",
"stat": {
...
"2020-03-11T17:00:00": {
"avg_N": 21.428571428571427,
"lastW_area": 0.02198781581832787,
"lastW_pop": 6.648561893900477,
"reff": 2.5,
"tot": 159
},
...
- [1]: DEMO istat.it (November 2019 data) ISTAT Italian Population. (Source data for the
tavola_bilancio_mensile_2019_*
files.) - [2]: COVID-19 Data
- [3]: Area of Italian Province. Area
- [4]: Details on the statistics computed by "covid19 Germany districts visualization" links