Outlier service type #428

Merged
merged 6 commits into from
Feb 5, 2019
Changes from 4 commits
19 changes: 14 additions & 5 deletions components/outlier-detection/README.md
@@ -2,8 +2,7 @@

## Description

[Anomaly or outlier detection](https://en.wikipedia.org/wiki/Anomaly_detection) has many applications, ranging from preventing credit card fraud to detecting computer network intrusions. Seldon Core provides a number of outlier detectors suitable for different use cases. The detectors can be run as a model which is one of the pre-defined types of [predictive units](../../docs/reference/seldon-deployment.md#proto-buffer-definition) in Seldon Core. It is a microservice that makes predictions and can receive feedback rewards. The REST and gRPC internal APIs that the model components must conform to are covered in the [internal API](../../docs/reference/internal-api.md#model) reference.

[Anomaly or outlier detection](https://en.wikipedia.org/wiki/Anomaly_detection) has many applications, ranging from preventing credit card fraud to detecting computer network intrusions. Seldon Core provides a number of outlier detectors suitable for different use cases. The detectors can be run as models or transformers, which are two of the pre-defined types of [predictive units](../../docs/reference/seldon-deployment.md#proto-buffer-definition) in Seldon Core. Models are microservices that make predictions and can receive feedback rewards, while input transformers add the anomaly predictions to the metadata of the underlying model. The REST and gRPC internal APIs that the model and transformer components must conform to are covered in the [internal API](../../docs/reference/internal-api.md) reference.
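As an illustration of the model contract, a minimal sketch of a custom model component (the class name and the simple z-score rule are hypothetical, not part of Seldon Core):

```python
import numpy as np

class MyOutlierModel(object):
    """Hypothetical minimal model component: flags rows whose absolute
    z-score on any feature exceeds a threshold as outliers."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def predict(self, X, feature_names):
        # Return 1 for outliers and 0 for inliers, one label per row.
        X = np.asarray(X, dtype=float)
        z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
        return (np.abs(z).max(axis=1) > self.threshold).astype(int)

np.random.seed(0)
X = np.vstack([np.random.randn(100, 2), [[10.0, 10.0]]])  # last row is anomalous
print(MyOutlierModel().predict(X, None))
```

When wrapped as a Seldon model, ```predict``` is called on each request; the transformer variant follows the same shape with ```transform_input``` and ```tags```.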

## Implementations

@@ -15,10 +14,20 @@ The following types of outlier detectors are implemented and showcased with demo

The Sequence-to-Sequence LSTM algorithm can be used to detect outliers in time series data, while the other algorithms spot anomalies in tabular data. The Mahalanobis detector works online and does not need to be trained first. The other algorithms are ideally trained on a batch of normal data or data with a low fraction of outliers.

## Implementing custom outlier detectors

An outlier detection component can be implemented either as a model or as an input transformer component. If the component is defined as a model, a ```predict``` method needs to be implemented to return the detected anomalies. Optionally, a ```send_feedback``` method can return additional information about the performance of the algorithm. When the component is used as a transformer, the anomaly predictions are made in the ```transform_input``` method, which returns the unchanged input features. The anomaly predictions are then added to the underlying model's metadata via the ```tags``` method. Both models and transformers can make use of custom metrics defined by the ```metrics``` method.

The required methods to use the outlier detection algorithms as models or transformers are implemented in the Python files with the ```Core``` prefix. The demos contain clear instructions on how to run your component as a model or transformer.
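The transformer flow described above can be sketched as follows; the class name and the z-score scoring rule are placeholders standing in for one of the ```Core``` detectors:

```python
import numpy as np

class MyOutlierTransformer(object):
    """Hypothetical input transformer: scores each row, stores the outlier
    predictions, and passes the features through to the underlying model."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def transform_input(self, X, feature_names):
        X = np.asarray(X, dtype=float)
        z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
        self.prediction = (np.abs(z).max(axis=1) > self.threshold).astype(int)
        return X  # input features are returned unchanged

    def tags(self):
        # Added to the underlying model's response metadata.
        return {"outlier-predictions": self.prediction.tolist()}

    def metrics(self):
        # Custom metrics averaged over the prediction batch.
        return [{"type": "GAUGE", "key": "is_outlier",
                 "value": float(np.mean(self.prediction))}]

t = MyOutlierTransformer()
t.transform_input([[0.0]] * 20 + [[100.0]], None)  # last row is anomalous
print(t.tags())
```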

## Language specific templates

A reference template for custom model components written in several languages are available:
* [Python](../../wrappers/s2i/python/test/model-template-app/MyModel.py)
* [R](../../wrappers/s2i/R/test/model-template-app/MyModel.R)
Reference templates for custom model and input transformer components written in several languages are available:
* Python
* [model](../../wrappers/s2i/python/test/model-template-app/MyModel.py)
* [transformer](../../wrappers/s2i/python/test/transformer-template-app/MyTransformer.py)
* R
* [model](../../wrappers/s2i/R/test/model-template-app/MyModel.R)
* [transformer](../../wrappers/s2i/R/test/transformer-template-app/MyTransformer.R)

Additionally, the [wrappers](../../wrappers/s2i) provide guidelines for implementing the model component in other languages.
117 changes: 117 additions & 0 deletions components/outlier-detection/isolation-forest/CoreIsolationForest.py
@@ -0,0 +1,117 @@
import logging
import numpy as np
import pickle
from sklearn.ensemble import IsolationForest

logger = logging.getLogger(__name__)

class CoreIsolationForest(object):
""" Outlier detection using Isolation Forests.

Parameters
----------
threshold (float) : anomaly score threshold; scores below threshold are outliers

Functions
----------
predict : detect and return outliers
transform_input : detect outliers and return input features
send_feedback : add target labels as part of the feedback loop
tags : add metadata for input transformer
metrics : return custom metrics
"""

def __init__(self,threshold=0.,model_name='if',load_path='./models/'):

logger.info("Initializing model")
self.threshold = threshold
self.N = 0 # total sample count up until now
self.nb_outliers = 0

# load pre-trained model
with open(load_path + model_name + '.pickle', 'rb') as f:
self.clf = pickle.load(f)


def predict(self, X, feature_names):
""" Return outlier predictions.

Parameters
----------
X : array-like
feature_names : array of feature names (optional)
"""
logger.info("Using component as a model")
return self._get_preds(X)


def transform_input(self, X, feature_names):
""" Transform the input.
Used when the outlier detector sits on top of another model.

Parameters
----------
X : array-like
feature_names : array of feature names (optional)
"""
logger.info("Using component as an outlier-detector transformer")
self.prediction_meta = self._get_preds(X)
return X


def _get_preds(self,X):
""" Detect outliers below the anomaly score threshold.

Parameters
----------
X : array-like
"""
self.decision_val = self.clf.decision_function(X) # anomaly scores

# make prediction
self.prediction = (self.decision_val < self.threshold).astype(int) # scores below threshold are outliers

self.N+=self.prediction.shape[0] # update counter

return self.prediction


def send_feedback(self,X,feature_names,reward,truth):
""" Return additional data as part of the feedback loop.

Parameters
----------
X : array of the features sent in the original predict request
feature_names : array of feature names. May be None if not available.
reward (float): the reward
truth : array with correct value (optional)
"""
logger.info("Send feedback called")
return []


def tags(self):
"""
Use predictions made within transform to add these as metadata
to the response. Tags will only be collected if the component is
used as an input-transformer.
"""
try:
return {"outlier-predictions": self.prediction_meta.tolist()}
except AttributeError:
logger.info("No metadata about outliers")


def metrics(self):
""" Return custom metrics averaged over the prediction batch.
"""
self.nb_outliers += np.sum(self.prediction)

is_outlier = {"type":"GAUGE","key":"is_outlier","value":np.mean(self.prediction)}
anomaly_score = {"type":"GAUGE","key":"anomaly_score","value":np.mean(self.decision_val)}
nb_outliers = {"type":"GAUGE","key":"nb_outliers","value":int(self.nb_outliers)}
fraction_outliers = {"type":"GAUGE","key":"fraction_outliers","value":int(self.nb_outliers)/self.N}
obs = {"type":"GAUGE","key":"observation","value":self.N}
threshold = {"type":"GAUGE","key":"threshold","value":self.threshold}

return [is_outlier,anomaly_score,nb_outliers,fraction_outliers,obs,threshold]
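For reference, the metrics entries returned above follow a simple dictionary contract (```type```, ```key```, ```value```). A standalone illustration with made-up batch values standing in for ```self.prediction``` and ```self.decision_val```:

```python
import numpy as np

# Made-up batch results: one outlier in a batch of four.
prediction = np.array([0, 1, 0, 0])
decision_val = np.array([0.12, -0.30, 0.05, 0.20])
N, nb_outliers = 4, 1

metrics = [
    {"type": "GAUGE", "key": "is_outlier", "value": float(np.mean(prediction))},
    {"type": "GAUGE", "key": "anomaly_score", "value": float(np.mean(decision_val))},
    {"type": "GAUGE", "key": "fraction_outliers", "value": nb_outliers / N},
]
print(metrics[0]["value"])  # 0.25
```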
@@ -1,68 +1,55 @@
import numpy as np
import pickle
from sklearn.ensemble import IsolationForest

from CoreIsolationForest import CoreIsolationForest
from utils import flatten, performance, outlier_stats

class OutlierIsolationForest(object):
class OutlierIsolationForest(CoreIsolationForest):
""" Outlier detection using Isolation Forests.

Arguments:
- threshold (float): anomaly score threshold; scores below threshold are outliers
Parameters
----------
threshold (float) : anomaly score threshold; scores below threshold are outliers

Functions:
- predict: detect and return outliers
- send_feedback: add target labels as part of the feedback loop
- metrics: return custom metrics
Functions
----------
send_feedback : add target labels as part of the feedback loop
metrics : return custom metrics
"""
def __init__(self,threshold=0.,load_path='./models/'):
def __init__(self,threshold=0.,model_name='if',load_path='./models/'):

self.threshold = threshold
self.N = 0 # total sample count up until now

# load pre-trained model
with open(load_path + 'model.pickle', 'rb') as f:
self.clf = pickle.load(f)
super().__init__(threshold=threshold, model_name=model_name, load_path=load_path)

self._predictions = []
self._labels = []
self._anomaly_score = []
self.roll_window = 100
self.metric = [float('nan') for i in range(18)]

def predict(self,X,feature_names):
""" Detect outliers from mse using the threshold.

def send_feedback(self,X,feature_names,reward,truth):
""" Return outlier labels as part of the feedback loop.

Arguments:
- X: input data
- feature_names
Parameters
----------
X : array of the features sent in the original predict request
feature_names : array of feature names. May be None if not available.
reward (float): the reward
truth : array with correct value (optional)
"""
self.decision_val = self.clf.decision_function(X) # anomaly scores
_ = super().send_feedback(X,feature_names,reward,truth)

# historical reconstruction errors and predictions
self._anomaly_score.append(self.decision_val)
self._anomaly_score = flatten(self._anomaly_score)

# make prediction
self.prediction = (self.decision_val < self.threshold).astype(int) # scores below threshold are outliers
self._predictions.append(self.prediction)
self._predictions = flatten(self._predictions)

self.N+=self.prediction.shape[0] # update counter

return self.prediction

def send_feedback(self,X,feature_names,reward,truth):
""" Return outlier labels as part of the feedback loop.

Arguments:
- X: input data
- feature_names
- reward
- truth: outlier labels
"""
# target labels
self.label = truth
self._labels.append(self.label)
self._labels = flatten(self._labels)

# performance metrics
scores = performance(self._labels,self._predictions,roll_window=self.roll_window)
stats = outlier_stats(self._labels,self._predictions,roll_window=self.roll_window)

@@ -71,8 +58,9 @@ def send_feedback(self,X,feature_names,reward,truth):
for c in convert: # convert from np to native python type to jsonify
metric.append(np.asscalar(np.asarray(c)))
self.metric = metric

return

return []


def metrics(self):
""" Return custom metrics.
@@ -87,8 +75,8 @@ def metrics(self):
dec_val = float('nan')
y_true = float('nan')
else:
pred = int(self._predictions[-2])
dec_val = self._anomaly_score[-2]
pred = int(self._predictions[-1])
dec_val = self._anomaly_score[-1]
y_true = int(self.label[0])

is_outlier = {"type":"GAUGE","key":"is_outlier","value":pred}
13 changes: 10 additions & 3 deletions components/outlier-detection/isolation-forest/README.md
@@ -6,10 +6,17 @@

## Implementation

The Isolation Forest is trained by running the ```train.py``` script. The ```OutlierIsolationForest``` class loads a pre-trained model and makes predictions on new data.
The Isolation Forest is trained by running the ```train.py``` script. The ```OutlierIsolationForest``` class inherits from ```CoreIsolationForest``` which loads a pre-trained model and can make predictions on new data.

A detailed explanation of the implementation and usage of Isolation Forests as outlier detectors can be found in the [isolation_forest_doc](./isolation_forest_doc.ipynb) notebook.
A detailed explanation of the implementation and usage of Isolation Forests as outlier detectors can be found in the [isolation forest doc](./doc.md).
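The training step can be sketched as a minimal stand-in for what a ```train.py```-style script does; the temporary directory and file name below are illustrative, mirroring the ```load_path + model_name + '.pickle'``` convention that ```CoreIsolationForest``` loads from:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import IsolationForest

# Train on a batch of (mostly) normal data.
rng = np.random.RandomState(0)
X_train = rng.randn(200, 2)
clf = IsolationForest(contamination=0.01, random_state=0).fit(X_train)

# Persist where the detector expects it: load_path + model_name + '.pickle'.
load_path = tempfile.mkdtemp()
with open(os.path.join(load_path, 'if.pickle'), 'wb') as f:
    pickle.dump(clf, f)

# Reload and score new data, mirroring the _get_preds logic.
with open(os.path.join(load_path, 'if.pickle'), 'rb') as f:
    clf2 = pickle.load(f)
scores = clf2.decision_function([[0.0, 0.0], [8.0, 8.0]])
preds = (scores < 0.0).astype(int)  # scores below the threshold are outliers
print(preds)
```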

## Running on Seldon

An end-to-end example running an Isolation Forest outlier detector on GCP or Minikube using Seldon to identify computer network intrusions is available [here](./isolation_forest.ipynb).
An end-to-end example running an Isolation Forest outlier detector on GCP or Minikube using Seldon to identify computer network intrusions is available [here](./isolation_forest.ipynb).

Docker images to use the generic Isolation Forest outlier detector as a model or transformer can be found on Docker Hub:
* [seldonio/outlier-if-model](https://hub.docker.com/r/seldonio/outlier-if-model)
* [seldonio/outlier-if-transformer](https://hub.docker.com/r/seldonio/outlier-if-transformer)

A model docker image specific for the demo is also available:
* [seldonio/outlier-if-model-demo](https://hub.docker.com/r/seldonio/outlier-if-model-demo)