
Anomaly detection: round 2 #93

Open
kathsherratt opened this issue Feb 17, 2021 · 5 comments

@kathsherratt
Contributor

Comments by @seabbs

I think the missing piece of the puzzle is to go back to doing some anomaly correction, but in a less hardcore fashion than we did last time. Previously, we corrected all of the data (both the data used for fitting and the data used when plotting) and it led us a little astray (we thought we were doing well but never saw the truth data, so we never knew how we were actually doing).
Adding a second, anomaly-cleaned data stream and using that for fitting, whilst keeping the current truth data everywhere else, seems like a good option.
In terms of anomaly detection, something fairly light seems sensible. Perhaps just having an allowed week-to-week change (i.e. Monday to Monday, perhaps in the order of 200%) and setting the value to the backwards-looking 7-day average if it exceeds this?
The other critical thing we didn't have before was some awareness of how much and when we are doing this, so flagging that and perhaps adding it to the summary report seems like it would be really useful.
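
A minimal sketch of that week-to-week rule in R, assuming a daily series in a data frame with date and cases columns; the column names, the reading of "200%" as a proportional change greater than 2, and the fallback are illustrative guesses rather than anything already in the repo:

    # Flag points whose week-on-week (same weekday, e.g. Monday vs previous
    # Monday) increase exceeds the allowed change, and fall back to the
    # backwards-looking 7-day average. Assumes at least 8 rows of daily data.
    correct_weekly_anomalies <- function(dat, max_change = 2) {
      dat <- dat[order(dat$date), ]
      dat$corrected <- dat$cases
      for (i in 8:nrow(dat)) {
        prev_week <- dat$cases[i - 7]
        trailing_mean <- mean(dat$cases[(i - 7):(i - 1)], na.rm = TRUE)
        change <- (dat$cases[i] - prev_week) / prev_week
        if (is.finite(change) && is.finite(trailing_mean) && change > max_change) {
          dat$corrected[i] <- trailing_mean
        }
      }
      dat
    }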

@kathsherratt
Contributor Author

kathsherratt commented Feb 17, 2021

I guess this involves:

  • Check data anomalies
    • Questions:
      • check for weekly anomalies, daily anomalies, or both?
      • correct with the average (mean? median?) of the last 7 days (see the sketch after this list)
    • Files to update:
      • get-us-data.R
      • report.Rmd - flag states with anomalies.
  • Use "corrected" data in model fitting and ensembling
    • Questions:
      • presumably use "corrected" data in all models?
      • are we fitting each model to both sets of data and comparing; or fitting to the "corrected" data only?
    • Files to update:
      • models/rt/update-rt.R
      • models/timeseries/update-timeseries.R
      • models/deaths-conv-cases/update-conv.R
      • Ensembling already uses whichever data was used for fitting the Rt model, so no update is needed
  • Use "truth" data in plotting
    • evaluation/ensembles.R
    • evaluation/models.R
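
A rough sketch of the daily option referenced in the list above, keeping the raw series untouched and adding a corrected column, so fitting can use cases_corrected while plotting and evaluation keep using cases. Column names and the deviation ratio are placeholders, not the pipeline's current interface:

    # Compare each day to the trailing 7-day mean; flag large deviations and
    # keep both the raw and corrected series side by side.
    flag_daily_anomalies <- function(dat, max_ratio = 3) {
      dat <- dat[order(dat$date), ]
      dat$trailing_mean <- NA_real_
      for (i in 8:nrow(dat)) {
        dat$trailing_mean[i] <- mean(dat$cases[(i - 7):(i - 1)], na.rm = TRUE)
      }
      dat$anomaly <- !is.na(dat$cases) & !is.na(dat$trailing_mean) &
        dat$trailing_mean > 0 & dat$cases > max_ratio * dat$trailing_mean
      dat$cases_corrected <- ifelse(dat$anomaly, dat$trailing_mean, dat$cases)
      dat
    }

report.Rmd could then filter on the anomaly column to list the flagged states and dates, and the model update scripts would read cases_corrected in place of cases.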

@seabbs
Contributor

seabbs commented Feb 17, 2021

Nice work Kath,

Some thoughts:

  • I think checking for daily anomalies makes the most sense, as it should hopefully make problems easier to detect and eyeball.
  • Mean I think - not ideal but 🤷
  • Fitting to just corrected data
  • Truth data also (obviously) in submission/report.Rmd
  • In finalise.R it would be nice to save a dated table of flags that can then be rendered as a table in report.Rmd (rough sketch below)
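
One possible shape for that dated flag table; the function name, folder, and columns are assumptions for illustration, not the current finalise.R interface:

    # Write the anomaly flags produced on a given run to a dated csv that
    # report.Rmd can later read and render as a table.
    save_anomaly_flags <- function(flags, dir = "data/anomalies") {
      # flags: a data frame with e.g. state, date, raw_value, corrected_value
      dir.create(dir, showWarnings = FALSE, recursive = TRUE)
      flags$flagged_on <- Sys.Date()
      out_file <- file.path(dir, paste0(Sys.Date(), "-anomaly-flags.csv"))
      write.csv(flags, out_file, row.names = FALSE)
      invisible(out_file)
    }

report.Rmd could then read the most recent file (or bind all of them) and display it, e.g. with knitr::kable().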

@kathsherratt
Contributor Author

Nothing new, but just dropping in here some useful resources for manual data sense-checks (not sure where else to keep this):
https://github.com/nytimes/covid-19-data/issues?q=is%3Aissue+label%3Adata-issue+
https://github.com/CSSEGISandData/COVID-19/issues

@kathsherratt
Contributor Author

kathsherratt commented Mar 5, 2021

Flagging this function specifically for checking a range of methods for anomaly detection (used by the Reich Lab on US data):
https://github.com/reichlab/covidData/blob/master/R/identify_outliers.R
https://github.com/reichlab/covidData/blob/master/vignettes/outliers.R

Also, linking to #97, which looks to me like a near duplicate and expansion of this issue.

@kathsherratt pinned this issue May 10, 2021
@kathsherratt
Contributor Author

Pinning this issue. We will need to:

  • load raw data and save this csv to a clearly named data-raw (or similar) folder
  • add a simple anomaly handling function which compares a value to both its lead and lagged values (sketched below)
  • save the corrected csv into a separate data-modified (or similar) folder
  • then fit models to the modified data, with all evaluation plotted against the raw data
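
A minimal sketch of that lead/lag comparison and the raw/modified split, assuming a daily data frame with date and cases columns; the threshold and the data-raw / data-modified paths are illustrative:

    # Flag values far above (or below) the mean of their immediate neighbours
    # and replace them with that neighbour mean.
    handle_anomalies <- function(dat, tolerance = 5) {
      dat <- dat[order(dat$date), ]
      lead_val <- c(dat$cases[-1], NA)
      lag_val  <- c(NA, dat$cases[-nrow(dat)])
      neighbour_mean <- rowMeans(cbind(lead_val, lag_val), na.rm = TRUE)
      anomaly <- !is.na(dat$cases) & is.finite(neighbour_mean) & neighbour_mean > 0 &
        (dat$cases > tolerance * neighbour_mean | dat$cases < neighbour_mean / tolerance)
      dat$cases[anomaly] <- neighbour_mean[anomaly]
      dat
    }

    # Illustrative workflow: the untouched csv lives in data-raw/, the corrected
    # one in data-modified/; models are fit to the modified data and evaluation
    # is plotted against the raw data.
    raw <- read.csv("data-raw/cases.csv")
    modified <- handle_anomalies(raw)
    write.csv(modified, "data-modified/cases.csv", row.names = FALSE)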
