Skip to content

Latest commit

 

History

History
130 lines (90 loc) · 7.65 KB

File metadata and controls

130 lines (90 loc) · 7.65 KB

Spike Detection and Change Point Detection of Product Sales E2E

ML.NET version API type Status App Type Data type Scenario ML Task Algorithms
v1.0.0 Dynamic API Up-to-date WinForms app .csv files Spike and Change Point Detection of Product Sales Anomaly Detection IID Spike Detection and IID Change point Detection

Alt Text

Overview

Product Sales Spike Detection is a simple application which builds and consumes time series anomaly detection models to detect spikes and change points in product sales.

This is an end-to-end sample which shows how you can use ML.NET and spike and change point detection in a WinForms application.

Note: This app is written in .NET Framework, so you must manually restore the nuget packages before running the app.

App Features

  • WinForms App:

    1. Prompts the user to input a dataset file for spike detection or change point detection (in this case we have provided product-sales.csv that you can use)
    2. Prompts the user to indicate if the data in the file is separated by commas or tabs
    3. Prompts the user to indicate if they want to see spikes or change points in the data
    4. Displays the data in a table format so that the user can inspect the data columns
    5. Displays the data as a time series line graph
    6. Loads the trained spike detection and change point detection models
    7. Uses the trained models to detect and display the anomalies both in a textual format, as markers in the line graph, and as highlighted rows in the data table
  • Time Series Spike Detection Console App

    1. Builds and trains a time series spike detection model using the Product Sales dataset for both spike detection and change point detection.
    2. Uses confidence level and p-value as algorithm hyperparameters.

Dataset

We have created sample dataset for Product sales. The dataset product-sales.csv can be found here

Format of Product Sales DataSet looks like below.

Month ProductSales
1-Jan 271
2-Jan 150.9
..... .....
1-Feb 199.3
... ....

The data format in Product Sales dataset is referenced from shampoo-sales dataset and the license for shampoo-sales dataset is available here.

Problem

This problem is focused on finding spikes and change points in product sales over a 3 month period, which can then be helpful in analyzing trends or abnormal behavior in sales.

To solve this problem, we will build an ML model that takes as inputs:

  • Date-Month
  • Number of product sales

and will generate an alert if/where a spike or change point in product sales is detected.

ML task - Time Series Anomaly Detection

Anomaly detection is the process of detecting outliers in the data. Anomaly detection in time series refers to detecting time stamps, or points on a given input time series, at which the time series behaves differently from what was expected. These deviations are typically indicative of some events of interest in the problem domain: a cyber-attack on user accounts, power outage, bursting RPS on a server, memory leak, etc.

An anomalous behavior can be either persistent over time or just a temporary burst. There are 2 types of anomalies in this context: spikes and change points.

Spike Detection

Spikes are attributed to sudden yet temporary bursts in the values of the input time-series. In practice, they can happen due to a variety of reasons depending on the application: outages, cyber-attacks, viral web content, etc.

Change Point Detection

Change points mark the beginning of more persistent deviations in the behavior of time-series from what was expected. In practice, these type of changes are usually triggered by some fundamental changes in the dynamics of the system. For example, in system telemetry monitoring, an introduction of a memory leak can cause a (slow) trend in the time series of memory usage after certain point in time.

Solution

To solve this problem, in your console app you build and train two ML models on existing data (product sales) to demonstrate time series anomaly detection. You then use the model in the WinForms app, where the Prediction output columns provide the Alerts where the models predicted the anomalies (spikes or change points in product sales) to be in the dataset.

The process of building and training models is the same for spike detection and change point detection; the main difference is the algorithm that you use (IidSpikeDetector vs. IidChangePointDetector).

1. Build model

Building a model in the console app includes:

  • Creating empty IDataView with just schema of dataset.

  • Creating an Estimator by applying Transformer (e.g. IidSpikeDetector or IidChangePointDetector) and setting parameters (in this case confidence level and p-value).

The initial code for Spike Detection is similar to the following:

CreateEmptyDataView();

//Create ML Context object
MLContext mlcontext = new MLContext();

//STEP 1: Create Estimator   
var estimator = mlContext.Transforms.DetectIidSpike(outputColumnName: nameof(ProductSalesPrediction.Prediction), inputColumnName: nameof(ProductSalesData.numSales), confidence: 95, pvalueHistoryLength: size / 4);

2. Transform model

In IID Spike detection or IID change point detection, we don't need to do training, we just need to do transformation. So the model is created using Fit() API by passing empty IDataView object.

//STEP 2:The Transformed Model.
//In IID Spike detection, we don't need to do training, we just need to do transformation. 
//As you are not training the model, there is no need to load IDataView with real data, you just need schema of data.
//So create empty data view and pass to Fit() method. 
ITransformer tansformedModel = estimator.Fit(CreateEmptyDataView());

3. Consume model & view predictions

In the WinForms app, you load and use the transformed model to predict anomalies in the data and then view the detected anomalies from the model by accessing the output column:

  • Load the data to predict from (product-sales.csv) to an IDataView and create predictions.
var mlcontext = new MLContext();

ITransformer tansformedModel;

// Load model
using (FileStream stream = new FileStream(modelPath, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    tansformedModel = mlcontext.Model.Load(stream,out var modelInputSchema);
}

// Apply data transformation to create predictions
IDataView transformedData = tansformedModel.Transform(dataView);

var predictions = mlcontext.Data.CreateEnumerable<ProductSalesPrediction>(transformedData, reuseRowObject: false);

Each Prediction in predictions returns back a vector containing three values:

  • 0 = Alert (0 for no alert, 1 for an alert)
  • 1 = Score (value where the anomaly is detected e.g. number of sales)
  • 2 = P-value (value used to measure how likely an anomoly is to be true vs. background noise)