Outlier service type #428

Merged
merged 6 commits into from
Feb 5, 2019
Changes from 4 commits
19 changes: 14 additions & 5 deletions components/outlier-detection/README.md
@@ -2,8 +2,7 @@

## Description

[Anomaly or outlier detection](https://en.wikipedia.org/wiki/Anomaly_detection) has many applications, ranging from preventing credit card fraud to detecting computer network intrusions. Seldon Core provides a number of outlier detectors suitable for different use cases. The detectors can be run as a model which is one of the pre-defined types of [predictive units](../../docs/reference/seldon-deployment.md#proto-buffer-definition) in Seldon Core. It is a microservice that makes predictions and can receive feedback rewards. The REST and gRPC internal APIs that the model components must conform to are covered in the [internal API](../../docs/reference/internal-api.md#model) reference.

[Anomaly or outlier detection](https://en.wikipedia.org/wiki/Anomaly_detection) has many applications, ranging from preventing credit card fraud to detecting computer network intrusions. Seldon Core provides a number of outlier detectors suitable for different use cases. The detectors can be run as models or transformers, which are two of the pre-defined types of [predictive units](../../docs/reference/seldon-deployment.md#proto-buffer-definition) in Seldon Core. Models are microservices that make predictions and can receive feedback rewards, while input transformers add the anomaly predictions to the metadata of the underlying model. The REST and gRPC internal APIs that the model and transformer components must conform to are covered in the [internal API](../../docs/reference/internal-api.md) reference.
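As an illustration of the model contract, a minimal sketch of a custom model component (the class name and the simple z-score rule are hypothetical, not part of Seldon Core):

```python
import numpy as np

class MyOutlierModel(object):
    """Hypothetical minimal model component: flags rows whose absolute
    z-score on any feature exceeds a threshold as outliers."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def predict(self, X, feature_names):
        # Return 1 for outliers and 0 for inliers, one label per row.
        X = np.asarray(X, dtype=float)
        z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
        return (np.abs(z).max(axis=1) > self.threshold).astype(int)

np.random.seed(0)
X = np.vstack([np.random.randn(100, 2), [[10.0, 10.0]]])  # last row is anomalous
print(MyOutlierModel().predict(X, None))
```

When wrapped as a Seldon model, ```predict``` is called on each request; the transformer variant follows the same shape with ```transform_input``` and ```tags```.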

## Implementations

@@ -15,10 +14,20 @@ The following types of outlier detectors are implemented and showcased with demo

The Sequence-to-Sequence LSTM algorithm can be used to detect outliers in time series data, while the other algorithms spot anomalies in tabular data. The Mahalanobis detector works online and does not need to be trained first. The other algorithms are ideally trained on a batch of normal data or data with a low fraction of outliers.

## Implementing custom outlier detectors

An outlier detection component can be implemented either as a model or as an input transformer component. If the component is defined as a model, a ```predict``` method needs to be implemented to return the detected anomalies. Optionally, a ```send_feedback``` method can return additional information about the performance of the algorithm. When the component is used as a transformer, the anomaly predictions are made in the ```transform_input``` method, which returns the unchanged input features. The anomaly predictions are then added to the underlying model's metadata via the ```tags``` method. Both models and transformers can make use of custom metrics defined by the ```metrics``` method.

The required methods to use the outlier detection algorithms as models or transformers are implemented in the Python files with the ```Core``` prefix. The demos contain clear instructions on how to run your component as a model or transformer.
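The transformer flow described above can be sketched as follows; the class name and the z-score scoring rule are placeholders standing in for one of the ```Core``` detectors:

```python
import numpy as np

class MyOutlierTransformer(object):
    """Hypothetical input transformer: scores each row, stores the outlier
    predictions, and passes the features through to the underlying model."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def transform_input(self, X, feature_names):
        X = np.asarray(X, dtype=float)
        z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
        self.prediction = (np.abs(z).max(axis=1) > self.threshold).astype(int)
        return X  # input features are returned unchanged

    def tags(self):
        # Added to the underlying model's response metadata.
        return {"outlier-predictions": self.prediction.tolist()}

    def metrics(self):
        # Custom metrics averaged over the prediction batch.
        return [{"type": "GAUGE", "key": "is_outlier",
                 "value": float(np.mean(self.prediction))}]

t = MyOutlierTransformer()
t.transform_input([[0.0]] * 20 + [[100.0]], None)  # last row is anomalous
print(t.tags())
```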

## Language specific templates

A reference template for custom model components written in several languages are available:
* [Python](../../wrappers/s2i/python/test/model-template-app/MyModel.py)
* [R](../../wrappers/s2i/R/test/model-template-app/MyModel.R)
Reference templates for custom model and input transformer components written in several languages are available:
* Python
* [model](../../wrappers/s2i/python/test/model-template-app/MyModel.py)
* [transformer](../../wrappers/s2i/python/test/transformer-template-app/MyTransformer.py)
* R
* [model](../../wrappers/s2i/R/test/model-template-app/MyModel.R)
* [transformer](../../wrappers/s2i/R/test/transformer-template-app/MyTransformer.R)

Additionally, the [wrappers](../../wrappers/s2i) provide guidelines for implementing the model component in other languages.
117 changes: 117 additions & 0 deletions components/outlier-detection/isolation-forest/CoreIsolationForest.py
@@ -0,0 +1,117 @@
import logging
import numpy as np
import pickle
from sklearn.ensemble import IsolationForest

logger = logging.getLogger(__name__)

class CoreIsolationForest(object):
""" Outlier detection using Isolation Forests.

Parameters
----------
threshold (float) : anomaly score threshold; scores below threshold are outliers

Functions
----------
predict : detect and return outliers
transform_input : detect outliers and return input features
send_feedback : add target labels as part of the feedback loop
tags : add metadata for input transformer
metrics : return custom metrics
"""

def __init__(self,threshold=0.,model_name='if',load_path='./models/'):

logger.info("Initializing model")
self.threshold = threshold
self.N = 0 # total sample count up until now
self.nb_outliers = 0

# load pre-trained model
with open(load_path + model_name + '.pickle', 'rb') as f:
self.clf = pickle.load(f)


def predict(self, X, feature_names):
""" Return outlier predictions.

Parameters
----------
X : array-like
feature_names : array of feature names (optional)
"""
logger.info("Using component as a model")
return self._get_preds(X)


def transform_input(self, X, feature_names):
""" Transform the input.
Used when the outlier detector sits on top of another model.

Parameters
----------
X : array-like
feature_names : array of feature names (optional)
"""
logger.info("Using component as an outlier-detector transformer")
self.prediction_meta = self._get_preds(X)
return X


def _get_preds(self,X):
""" Detect outliers below the anomaly score threshold.

Parameters
----------
X : array-like
"""
self.decision_val = self.clf.decision_function(X) # anomaly scores

# make prediction
self.prediction = (self.decision_val < self.threshold).astype(int) # scores below threshold are outliers

self.N+=self.prediction.shape[0] # update counter

return self.prediction


def send_feedback(self,X,feature_names,reward,truth):
""" Return additional data as part of the feedback loop.

Parameters
----------
X : array of the features sent in the original predict request
feature_names : array of feature names. May be None if not available.
reward (float): the reward
truth : array with correct value (optional)
"""
logger.info("Send feedback called")
return []


def tags(self):
"""
Use predictions made within transform to add these as metadata
to the response. Tags will only be collected if the component is
used as an input-transformer.
"""
try:
return {"outlier-predictions": self.prediction_meta.tolist()}
except AttributeError:
logger.info("No metadata about outliers")


def metrics(self):
""" Return custom metrics averaged over the prediction batch.
"""
self.nb_outliers += np.sum(self.prediction)

is_outlier = {"type":"GAUGE","key":"is_outlier","value":np.mean(self.prediction)}
anomaly_score = {"type":"GAUGE","key":"anomaly_score","value":np.mean(self.decision_val)}
nb_outliers = {"type":"GAUGE","key":"nb_outliers","value":int(self.nb_outliers)}
fraction_outliers = {"type":"GAUGE","key":"fraction_outliers","value":int(self.nb_outliers)/self.N}
obs = {"type":"GAUGE","key":"observation","value":self.N}
threshold = {"type":"GAUGE","key":"threshold","value":self.threshold}

return [is_outlier,anomaly_score,nb_outliers,fraction_outliers,obs,threshold]
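For reference, the metrics entries returned above follow a simple dictionary contract (```type```, ```key```, ```value```). A standalone illustration with made-up batch values standing in for ```self.prediction``` and ```self.decision_val```:

```python
import numpy as np

# Made-up batch results: one outlier in a batch of four.
prediction = np.array([0, 1, 0, 0])
decision_val = np.array([0.12, -0.30, 0.05, 0.20])
N, nb_outliers = 4, 1

metrics = [
    {"type": "GAUGE", "key": "is_outlier", "value": float(np.mean(prediction))},
    {"type": "GAUGE", "key": "anomaly_score", "value": float(np.mean(decision_val))},
    {"type": "GAUGE", "key": "fraction_outliers", "value": nb_outliers / N},
]
print(metrics[0]["value"])  # 0.25
```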
@@ -1,68 +1,55 @@
import numpy as np
import pickle
from sklearn.ensemble import IsolationForest

from CoreIsolationForest import CoreIsolationForest
from utils import flatten, performance, outlier_stats

class OutlierIsolationForest(object):
class OutlierIsolationForest(CoreIsolationForest):
""" Outlier detection using Isolation Forests.

Arguments:
- threshold (float): anomaly score threshold; scores below threshold are outliers
Parameters
----------
threshold (float) : anomaly score threshold; scores below threshold are outliers

Functions:
- predict: detect and return outliers
- send_feedback: add target labels as part of the feedback loop
- metrics: return custom metrics
Functions
----------
send_feedback : add target labels as part of the feedback loop
metrics : return custom metrics
"""
def __init__(self,threshold=0.,load_path='./models/'):
def __init__(self,threshold=0.,model_name='if',load_path='./models/'):

self.threshold = threshold
self.N = 0 # total sample count up until now

# load pre-trained model
with open(load_path + 'model.pickle', 'rb') as f:
self.clf = pickle.load(f)
super().__init__(threshold=threshold, model_name=model_name, load_path=load_path)

self._predictions = []
self._labels = []
self._anomaly_score = []
self.roll_window = 100
self.metric = [float('nan') for i in range(18)]

def predict(self,X,feature_names):
""" Detect outliers from mse using the threshold.

def send_feedback(self,X,feature_names,reward,truth):
""" Return outlier labels as part of the feedback loop.

Arguments:
- X: input data
- feature_names
Parameters
----------
X : array of the features sent in the original predict request
feature_names : array of feature names. May be None if not available.
reward (float): the reward
truth : array with correct value (optional)
"""
self.decision_val = self.clf.decision_function(X) # anomaly scores
_ = super().send_feedback(X,feature_names,reward,truth)

# historical reconstruction errors and predictions
self._anomaly_score.append(self.decision_val)
self._anomaly_score = flatten(self._anomaly_score)

# make prediction
self.prediction = (self.decision_val < self.threshold).astype(int) # scores below threshold are outliers
self._predictions.append(self.prediction)
self._predictions = flatten(self._predictions)

self.N+=self.prediction.shape[0] # update counter

return self.prediction

def send_feedback(self,X,feature_names,reward,truth):
""" Return outlier labels as part of the feedback loop.

Arguments:
- X: input data
- feature_names
- reward
- truth: outlier labels
"""
# target labels
self.label = truth
self._labels.append(self.label)
self._labels = flatten(self._labels)

# performance metrics
scores = performance(self._labels,self._predictions,roll_window=self.roll_window)
stats = outlier_stats(self._labels,self._predictions,roll_window=self.roll_window)

@@ -71,8 +58,9 @@ def send_feedback(self,X,feature_names,reward,truth):
for c in convert: # convert from np to native python type to jsonify
metric.append(np.asscalar(np.asarray(c)))
self.metric = metric

return

return []


def metrics(self):
""" Return custom metrics.
@@ -87,8 +75,8 @@ def metrics(self):
dec_val = float('nan')
y_true = float('nan')
else:
pred = int(self._predictions[-2])
dec_val = self._anomaly_score[-2]
pred = int(self._predictions[-1])
dec_val = self._anomaly_score[-1]
y_true = int(self.label[0])

is_outlier = {"type":"GAUGE","key":"is_outlier","value":pred}
13 changes: 10 additions & 3 deletions components/outlier-detection/isolation-forest/README.md
@@ -6,10 +6,17 @@

## Implementation

The Isolation Forest is trained by running the ```train.py``` script. The ```OutlierIsolationForest``` class loads a pre-trained model and makes predictions on new data.
The Isolation Forest is trained by running the ```train.py``` script. The ```OutlierIsolationForest``` class inherits from ```CoreIsolationForest``` which loads a pre-trained model and can make predictions on new data.

A detailed explanation of the implementation and usage of Isolation Forests as outlier detectors can be found in the [isolation_forest_doc](./isolation_forest_doc.ipynb) notebook.
A detailed explanation of the implementation and usage of Isolation Forests as outlier detectors can be found in the [isolation forest doc](./doc.md).
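The training step can be sketched as a minimal stand-in for what a ```train.py```-style script does; the temporary directory and file name below are illustrative, mirroring the ```load_path + model_name + '.pickle'``` convention that ```CoreIsolationForest``` loads from:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import IsolationForest

# Train on a batch of (mostly) normal data.
rng = np.random.RandomState(0)
X_train = rng.randn(200, 2)
clf = IsolationForest(contamination=0.01, random_state=0).fit(X_train)

# Persist where the detector expects it: load_path + model_name + '.pickle'.
load_path = tempfile.mkdtemp()
with open(os.path.join(load_path, 'if.pickle'), 'wb') as f:
    pickle.dump(clf, f)

# Reload and score new data, mirroring the _get_preds logic.
with open(os.path.join(load_path, 'if.pickle'), 'rb') as f:
    clf2 = pickle.load(f)
scores = clf2.decision_function([[0.0, 0.0], [8.0, 8.0]])
preds = (scores < 0.0).astype(int)  # scores below the threshold are outliers
print(preds)
```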

## Running on Seldon

An end-to-end example running an Isolation Forest outlier detector on GCP or Minikube using Seldon to identify computer network intrusions is available [here](./isolation_forest.ipynb).
An end-to-end example running an Isolation Forest outlier detector on GCP or Minikube using Seldon to identify computer network intrusions is available [here](./isolation_forest.ipynb).

Docker images to use the generic Isolation Forest outlier detector as a model or transformer can be found on Docker Hub:
* [seldonio/outlier-if-model](https://hub.docker.com/r/seldonio/outlier-if-model)
* [seldonio/outlier-if-transformer](https://hub.docker.com/r/seldonio/outlier-if-transformer)

A model docker image specific for the demo is also available:
* [seldonio/outlier-if-model-demo](https://hub.docker.com/r/seldonio/outlier-if-model-demo)