Public Datasets for Time Series Anomaly Detection

Time Series Anomaly Detection Datasets

Here I summarize some publicly available datasets for time series anomaly detection.

1. Outlier Detection DataSets (ODDS)

The ODDS webpage is here. Note that the datasets contain not only time series but also other data types (videos, texts, and graphs).

2. Kaggle Credit Card Fraud Detection DataSet (CCFD)

The main page is here. The dataset contains transactions made by European cardholders with credit cards in September 2013. For privacy and security reasons, the features we see are the result of a PCA transformation.

3. Yahoo Time Series Anomaly Detection Benchmark

Request access to this dataset here.

It contains four folders: A1, A2, A3, and A4.

A1Benchmark is based on the real production traffic to some of the Yahoo! properties. The other three benchmarks are based on synthetic time series. The A2 and A3 benchmarks include outliers, while the A4Benchmark includes change-point anomalies. The benchmarks based on real data have property and geos removed. Fields in each data file are comma-delimited (",").
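Since the files are comma-delimited, any CSV parser can read them. Below is a minimal sketch using Python's standard `csv` module. The column names (`timestamp`, `value`, `is_anomaly`) and the inline sample are assumptions made for illustration, so check the header row of the files you actually download.

```python
import csv
import io

# Hypothetical sample; the real column names and values may differ.
SAMPLE = """timestamp,value,is_anomaly
1,0.5,0
2,0.7,0
3,9.9,1
"""


def load_series(text_stream, value_col="value", label_col="is_anomaly"):
    """Read a comma-delimited stream into parallel lists of values and labels."""
    reader = csv.DictReader(text_stream, delimiter=",")
    values, labels = [], []
    for row in reader:
        values.append(float(row[value_col]))
        labels.append(int(row[label_col]))
    return values, labels


values, labels = load_series(io.StringIO(SAMPLE))
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.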

4. Numenta Anomaly Benchmark (NAB)

Description of NAB can be found here.

Dataset repository is here.

5. Secure Water Treatment (SWaT) Dataset

Multivariate time series datasets collected by “iTrust, Centre for Research in Cyber Security, Singapore University of Technology and Design”. See website here to request access to the dataset and check usage requirements.

6. Water Distribution (WADI) Dataset

Also collected by “iTrust, Centre for Research in Cyber Security, Singapore University of Technology and Design”. See the website here to request access to the dataset (it can be requested together with SWaT) and check usage requirements.

7. Server Machine Dataset (SMD)

The dataset is released here as part of the authors' repository for their KDD 2019 paper "Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network".

8. UCR Time Series Anomaly Archive

Contains over 250 datasets. The link to download the dataset is here.

The maintainers of the archive also recommend reading the following papers "The UEA multivariate time series classification archive, 2018" and "Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress" before using the dataset.

9. Soil Moisture Active Passive (SMAP) Satellite Dataset

Dataset webpage is here. Check the dataset description here.

The KDD 2018 paper "Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding" by NASA is the first paper to use this dataset. The authors provide a download link to the dataset in their repo.
Note that the authors of "Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network" also use the same versions of SMAP and MSL in their repo.
The dataset version used by the above two papers can be downloaded using the following commands:

wget https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip

cd data && wget https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv

10. Mars Science Laboratory (MSL) Curiosity Rover Dataset

Dataset webpage is here.

The KDD 2018 paper "Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding" by NASA is the first paper to use this dataset. The authors provide a download link to the dataset in their repo.
Note that the authors of "Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network" also use the same versions of SMAP and MSL in their repo.
The dataset version used by the above two papers can be downloaded using the following commands:

wget https://s3-us-west-2.amazonaws.com/telemanom/data.zip && unzip data.zip && rm data.zip

cd data && wget https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv

11. Skoltech Anomaly Benchmark (SKAB)

Dataset repo is here.

12. Artificial Intelligence for IT Operations (AIOps) Challenge Datasets

These datasets are maintained by the Netman Lab at Tsinghua University. The group's GitHub profile can be found here.

The KPI dataset from their 2018 challenge is here, and the 2020 data is here.

13. Pooled Server Metric (PSM) Dataset

This dataset was collected by eBay and released here in the repository for RANSynCoders, an anomaly detection model they proposed.

14. PhysioNet Open Access Databases

Check the PhysioNet Data webpage here. These datasets are all medicine-related.

One of the datasets, the MIT-BIH Supraventricular Arrhythmia Database, was used in the VLDB 2022 paper "TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data".

15. Datasets Related to Power Systems from IEEE Dataport

a) CYBER-PHYSICAL DATASET OF HARDWARE-IN-THE-LOOP CYBER-PHYSICAL POWER SYSTEMS TESTBED UNDER MITM ATTACKS

Dataset main page is here.

This dataset was collected by performing different Man-in-the-Middle (MiTM) attacks on the synthetic cyber-physical electric grid in the RESLab Testbed at Texas A&M University, US.

b) DATASET OF PORT SCANNING ATTACKS ON EMULATION TESTBED AND HARDWARE-IN-THE-LOOP TESTBED

Dataset main page is here.

The dataset was generated by performing four scenarios of port scanning attacks on an 8-substation supervisory control and data acquisition (SCADA) system in three different environments: minimega at Sandia National Lab (SNL), the Common Open Research Emulator (CORE) at Texas A&M University, and the hardware-in-the-loop RESLab Testbed at Texas A&M University.

c) ICS DATASET FOR SMART GRID ANOMALY DETECTION

Dataset main page is here. The dataset contains both normal traffic and communication with anomalies (cyber attacks, link failure, etc.).

16. Water Quality Dataset at GECCO 2018 Challenge

Download the dataset here.

17. Application Server Dataset (ASD)

The dataset can be found here, within the code repository of a KDD 2021 paper.

Time Series Classification Datasets That Could Potentially Be Used for Anomaly Detection

Another common approach I see is to use time series classification datasets for anomaly detection: you can preprocess a dataset by selecting one or a few minority classes and labeling them as anomalies.
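The relabeling step can be sketched as follows. This is only one possible convention, assuming that "minority" means the least frequent classes and that ties are broken by class name. The function name and the toy labels are made up for illustration.

```python
from collections import Counter


def relabel_minority_as_anomaly(class_labels, n_anomaly_classes=1):
    """Map the rarest classes to 1 (anomaly) and all other classes to 0 (normal)."""
    counts = Counter(class_labels)
    # Sort classes by ascending frequency; break ties by name for determinism.
    by_rarity = sorted(counts, key=lambda c: (counts[c], str(c)))
    rare = set(by_rarity[:n_anomaly_classes])
    return [1 if c in rare else 0 for c in class_labels]


# Toy class labels (hypothetical), one per time series sample.
y = ["walk", "walk", "run", "walk", "fall", "run", "walk"]
binary = relabel_minority_as_anomaly(y, n_anomaly_classes=1)
binary_two = relabel_minority_as_anomaly(y, n_anomaly_classes=2)
```

With `n_anomaly_classes=1` only the rarest class ("fall") is marked anomalous. Raising it to 2 also marks "run", which increases the anomaly ratio and may make the task less realistic.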

1. UCI Machine Learning Repository Dataset - Time Series Classification

Look for time series datasets for classification tasks on the UCI repo webpage here.

2. UEA & UCR Time Series Classification Repository

Dataset mainpage is here.

3. Industrial Control System (ICS) Cyber Attack Datasets

Dataset webpage is here.

4. Ausgrid Solar Home Electricity Dataset

The dataset main page is here. The dataset providers have published a paper describing the dataset, "Residential load and rooftop PV generation: an Australian distribution network dataset". There is also a GitHub repo that analyzes this dataset's characteristics. A paper that uses this dataset for anomaly detection, titled "Anomaly Detection in Smart Meter Data for Preventing Potential Smart Grid Imbalance", is here.

5. SmartMeter Energy Consumption Data in London Households

Dataset webpage is here. It contains energy consumption readings for a sample of 5,567 London households that took part in the UK Power Networks-led Low Carbon London project between November 2011 and February 2014. Readings were taken at half-hourly intervals. The customers in the trial were recruited as a balanced sample representative of the Greater London population. The CSV file (energy consumption in kWh per half hour, unique household identifier, date, and time) is around 10 GB when unzipped and contains around 167 million rows.
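A file of this size is easier to process row by row than to load into memory at once. Below is a minimal streaming sketch with Python's standard `csv` module. The column names (`household_id`, `datetime`, `kwh`), the `Null` placeholder, and the sample rows are assumptions for illustration and may not match the real file.

```python
import csv
import io


def total_kwh_per_household(rows, id_col="household_id", kwh_col="kwh"):
    """Accumulate per-household consumption while holding one row at a time."""
    totals = {}
    for row in rows:
        try:
            reading = float(row[kwh_col])
        except ValueError:
            continue  # skip rows whose reading is missing or non-numeric
        totals[row[id_col]] = totals.get(row[id_col], 0.0) + reading
    return totals


# Hypothetical sample standing in for the real CSV.
SAMPLE = """household_id,datetime,kwh
H1,2012-01-01 00:00,0.25
H1,2012-01-01 00:30,0.5
H2,2012-01-01 00:00,1.0
H2,2012-01-01 00:30,Null
"""

totals = total_kwh_per_household(csv.DictReader(io.StringIO(SAMPLE)))
```

For the real file, pass `csv.DictReader(open(path, newline=""))`. The reader yields rows lazily, so memory use stays roughly proportional to the number of households rather than the number of rows.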
