Concentration Free Outlier Factor for Anomaly Detection

ghilesmeddour/cfof

CFOF (Concentration Free Outlier Factor)

🚧 Work in progress.

Python implementation of Concentration Free Outlier Factor (CFOF) [1].

CFOF properties

  • Concentration free
  • Does not suffer from the hubness problem
  • Semi-locality
  • The fast-CFOF algorithm computes CFOF scores reliably, with cost linear in both the dataset size and the dimensionality

Installation

To install the latest release:

$ pip install cfof

Usage

Import CFOF and FastCFOF.

>>> from cfof import CFOF, FastCFOF
>>> import numpy as np

Load data.

>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

Instantiate CFOF or FastCFOF, then call .compute(X) to calculate the scores. .compute(X) returns an array sc, where sc[i, l] is the score of object i for ϱ_l (rhos[l]).

You can also calculate CFOF scores from a precomputed distance matrix using .compute_from_distance_matrix().
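As a rough illustration of how the returned array can be read, the sketch below takes the scores printed in the CFOF example further down (rhos=[0.5, 0.6]) and ranks the objects for one ϱ. It uses only NumPy; the variable names are illustrative and not part of the cfof API.

```python
import numpy as np

rhos = [0.5, 0.6]

# Scores copied from the CFOF example in this README:
# sc[i, l] is the score of object i for rhos[l].
sc = np.array([[0.5,        0.66666667],
               [0.33333333, 0.83333333],
               [0.5,        1.0],
               [0.5,        0.66666667],
               [0.33333333, 0.83333333],
               [0.5,        1.0]])

# Pick the column that corresponds to rho = 0.6.
l = rhos.index(0.6)

# Higher scores mean more outlying objects, so sort in descending order.
ranking = np.argsort(-sc[:, l], kind="stable")
top2 = ranking[:2]

print(top2)          # indices of the two highest-scoring objects
print(sc[top2, l])   # their scores for rho = 0.6
```

With these values, objects 2 and 5 (the two extreme points of the toy dataset) come out on top for ϱ = 0.6.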

CFOF (hard-CFOF)

Use compute to calculate CFOF scores directly from the data.

>>> cfof_clf = CFOF(metric='euclidean', rhos=[0.5, 0.6], n_jobs=1)
>>> cfof_clf.compute(X)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])

Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.

>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ],
       [0.5       , 0.66666667],
       [0.33333333, 0.83333333],
       [0.5       , 1.        ]])
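To make the numbers above easier to interpret, here is a minimal brute-force sketch of the hard-CFOF definition as described in [1]: the score of an object is the smallest k/n such that the object appears among the k nearest neighbours of at least n·ϱ objects, with every object counted as its own first neighbour. The function name cfof_brute_force is hypothetical and this is not the library's implementation; it is quadratic in memory and only meant for tiny datasets.

```python
import numpy as np

def cfof_brute_force(X, rhos):
    """Illustrative O(n^2) CFOF scores; not the cfof package's code."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # order[j] lists all objects sorted by distance from j (j itself first).
    order = np.argsort(dist, axis=1, kind="stable")

    # rank[j, i] is the 1-based position of object i in j's neighbour list.
    rank = np.empty((n, n), dtype=int)
    rows = np.arange(n)[:, None]
    rank[rows, order] = np.arange(1, n + 1)[None, :]

    # Column i of sorted_ranks holds the ranks object i receives, ascending.
    sorted_ranks = np.sort(rank, axis=0)

    scores = np.empty((n, len(rhos)))
    for l, rho in enumerate(rhos):
        # Number of objects that must have i among their k nearest neighbours
        # (small tolerance guards against floating-point noise in n * rho).
        needed = max(int(np.ceil(n * rho - 1e-9)), 1)
        # The needed-th smallest rank is the smallest k that achieves this.
        scores[:, l] = sorted_ranks[needed - 1, :] / n
    return scores

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
scores = cfof_brute_force(X, rhos=[0.5, 0.6])
print(scores)
```

On the six-point toy dataset this sketch reproduces the scores shown above for rhos=[0.5, 0.6].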

FastCFOF (soft-CFOF)

Use compute to calculate CFOF scores directly from the data.

>>> np.random.seed(10)
>>> X = np.random.randint(0, 100, size=(1000, 3))
>>>
>>> fast_cfof_clf = FastCFOF(metric='euclidean',
...                          rhos=[0.001, 0.005, 0.01, 0.05, 0.1],
...                          epsilon=0.1, delta=0.1, n_bins=50, n_jobs=1)
>>> fast_cfof_clf.compute(X)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])

Use compute_from_distance_matrix to compute CFOF scores from a precomputed distance matrix.

>>> from sklearn.metrics import pairwise_distances
>>> distance_matrix = pairwise_distances(X, metric='euclidean')
>>> fast_cfof_clf.compute_from_distance_matrix(distance_matrix)
array([[0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.01930698, 0.06866488, 0.10481131],
       [0.00954095, 0.00954095, 0.02559548, 0.06866488, 0.10481131],
       ...,
       [0.00954095, 0.00954095, 0.01930698, 0.05963623, 0.10481131],
       [0.00954095, 0.00954095, 0.03393222, 0.15998587, 0.24420531],
       [0.00954095, 0.00954095, 0.02559548, 0.0390694 , 0.09102982]])
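This README does not describe how epsilon and delta are used. If they follow the usual (ε, δ) convention for sampling-based estimators (absolute error at most ε with probability at least 1 − δ), the standard Hoeffding bound gives a feel for how many sampled objects such a guarantee needs. The sketch below is only that bound; the helper name hoeffding_sample_size is hypothetical, and whether FastCFOF sizes its sample this way is an assumption, not something stated here.

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Samples needed so that an empirical proportion is within `epsilon`
    of the true one with probability at least 1 - delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# Values used in the FastCFOF example above.
s = hoeffding_sample_size(epsilon=0.1, delta=0.1)
print(s)  # 150
```

Under this assumption the sample size depends only on ε and δ, not on the number of objects, which is consistent with the linear cost mentioned in the properties list.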

CFOFiSAX

This library provides a wrapper for pyCFOFiSAX [2].

>>> from cfof.cfof_isax import CFOFiSAXWrapper

Refer to pyCFOFiSAX documentation for more details.

TODOs

  • Add support for faiss (GPU).
  • Parallelize FastCFOF.
  • Add unit tests.
  • Add benchmarks.
  • Wrap pyCFOFiSAX.

References

[1] ANGIULLI, Fabrizio. CFOF: a concentration free measure for anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 2020, vol. 14, no. 1, p. 1-53.

[2] FOULON, Lucas, FENET, Serge, RIGOTTI, Christophe, et al. Scoring Message Stream Anomalies in Railway Communication Systems. In: 2019 International Conference on Data Mining Workshops (ICDMW). IEEE, 2019. p. 769-776.
