Merge pull request #552 from SeldonIO/release/0.10.0

Repeating #550 (which was reverted) to merge without squashing: "Merging 0.10.0rc1 back to master so that we can include the optional dependency work (#537) in the final 0.10.0 release."
SeldonIO · Jul 7, 2022 · 6e4f5e0
2 parents d992cac + 9e14bc3
commit 6e4f5e0

Showing 126 changed files with 7,876 additions and 2,231 deletions.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -80,14 +80,14 @@ jobs:
     runs-on: ubuntu-18.04
 
     container:
-      image: readthedocs/build:latest
+      image: readthedocs/build:7.0  # 7.0 to get Python 3.9
       options: --user root
 
     steps:
       - uses: actions/checkout@v2
       - name: Create a virtualenv to use for docs build
         run: |
-          python3.8 -m virtualenv $HOME/docs
+          python3.9 -m virtualenv $HOME/docs
       - name: Install dependencies
         run: |
           . $HOME/docs/bin/activate

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -8,5 +8,6 @@ repos:
     hooks:
       - id: mypy
         additional_dependencies: [
-          types-requests>=2.25.0,
+          types-requests~=2.25,
+          types-toml~=0.10
         ]
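The hook's stub pin moves from `types-requests>=2.25.0` to `~=2.25`, a PEP 440 compatible-release clause. `~=2.25` means `>=2.25, ==2.*`, so unlike the old pin it will not accept a future 3.x release. In the same way, `types-toml~=0.10` means `>=0.10, ==0.*`. The sketch below illustrates that rule with a hypothetical `matches_compatible_release` helper. It handles purely numeric versions only (no pre-, post- or dev-releases, epochs or local labels), so real tooling should rely on a full PEP 440 implementation such as the `packaging` library.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Split a purely numeric release such as '2.28.11' into (2, 28, 11)."""
    return tuple(int(part) for part in version.split("."))


def matches_compatible_release(spec: str, version: str) -> bool:
    """Return True if `version` satisfies the clause `~=spec` (simplified PEP 440)."""
    base = parse_release(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release segments, e.g. '~=2.25'")
    candidate = parse_release(version)

    # Lower bound: candidate >= spec, comparing with zero padding so '2.25' == '2.25.0'.
    width = max(len(base), len(candidate))

    def padded(release: Tuple[int, ...]) -> Tuple[int, ...]:
        return release + (0,) * (width - len(release))

    if padded(candidate) < padded(base):
        return False

    # Prefix match: every segment except the last one in `spec` must be equal,
    # e.g. '~=2.25' requires '2.*' and '~=2.25.1' requires '2.25.*'.
    prefix = base[:-1]
    return candidate[:len(prefix)] == prefix


# The old hook pin `>=2.25.0` accepted 3.0.0; the new `~=2.25` does not.
print(matches_compatible_release("2.25", "2.28.11"))  # True
print(matches_compatible_release("2.25", "3.0.0"))    # False
```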
diff --git a/.readthedocs.yml b/.readthedocs.yml
@@ -5,21 +5,23 @@
 # Required
 version: 2
 
+# Set the version of Python and other tools you might need
 build:
-  image: latest # Python 3.8 available on latest
+  os: ubuntu-20.04
+  tools:
+    python: "3.9"
   apt_packages:
     - pandoc
 
 # Build documentation in the docs/ directory with Sphinx
 sphinx:
-   configuration: doc/source/conf.py
+  configuration: doc/source/conf.py
 
 # Optionally build your docs in additional formats such as PDF
 formats:
-   - pdf
+  - pdf
 
 # Optionally set the version of Python and requirements required to build your docs
 python:
-   version: 3.8
-   install:
-   - requirements: requirements/docs.txt
+  install:
+    - requirements: requirements/docs.txt
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,28 @@
 # Change Log
 
+## [v0.10.0rc1](https://github.com/SeldonIO/alibi-detect/tree/v0.10.0rc1) (2022-06-01)
+[Full Changelog](https://github.com/SeldonIO/alibi-detect/compare/v0.9.1...v0.10.0rc1)
+
+### Added
+- **New feature** Drift detectors save/load functionality has been significantly reworked. All offline and online drift detectors (`tensorflow` backend only) can now be saved and loaded via `config.toml` files, allowing for more flexibility. Config files are also validated with `pydantic`. See [the documentation](https://docs.seldon.io/projects/alibi-detect/en/stable/overview/config_files.html) for more info. ([#516](https://github.com/SeldonIO/alibi-detect/pull/516)).
+- **New feature** Option to use out-of-bag predictions when using a `RandomForestClassifier` with `ClassifierDrift` ([#426](https://github.com/SeldonIO/alibi-detect/pull/426)).
+- Python 3.10 support. Note that PyTorch at the time of writing doesn't support Python 3.10 on Windows. ([#485](https://github.com/SeldonIO/alibi-detect/pull/485)).
+
+### Fixed
+- Fixed a bug in the TensorFlow trainer which occurred when the data was a minibatch of size 2 ([#492](https://github.com/SeldonIO/alibi-detect/pull/492)).
+
+### Changed
+- The maximum `tensorflow` version has been bumped from 2.8 to 2.9 ([#508](https://github.com/SeldonIO/alibi-detect/pull/508)).
+
+### Development
+- Added missing CI test for `ClassifierDrift` with `sklearn` backend ([#523](https://github.com/SeldonIO/alibi-detect/pull/523)).
+- Fixed typing for `ContextMMDDrift` `pytorch` backend with `numpy`>=1.22 ([#520](https://github.com/SeldonIO/alibi-detect/pull/520)).
+- Drift detectors with backends refactored to perform distance threshold computation in `score` instead of `predict` ([#489](https://github.com/SeldonIO/alibi-detect/pull/489)).
+- Factored out PyTorch device setting to `utils.pytorch.misc.get_device()` ([#503](https://github.com/SeldonIO/alibi-detect/pull/503)).
+- Added `utils._random` submodule and `pytest-randomly` to manage determinism in CI build tests ([#496](https://github.com/SeldonIO/alibi-detect/pull/496)).
+- From this release onwards we exclude the directories `doc/` and `examples/` from the source distribution (by adding `prune` directives in `MANIFEST.in`). This results in considerably smaller file sizes for the source distribution.
+- `mypy` has been updated to `~=0.900`, which requires additional development dependencies for type stubs; currently only `types-requests` and `types-toml` have needed to be added to `requirements/dev.txt`.
+
 ## [v0.9.1](https://github.com/SeldonIO/alibi-detect/tree/v0.9.1) (2022-04-01)
 [Full Changelog](https://github.com/SeldonIO/alibi-detect/compare/v0.9.0...v0.9.1)
 

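The reworked save/load described above (#516) relies on each drift detector recording its constructor arguments in a config dictionary. The `DriftConfigMixin` added to `alibi_detect/base.py` in this commit does this in `_set_config()`, replaces the `LARGE_ARTEFACTS` with `None` to save memory, re-attaches them in `get_config()`, and rebuilds a detector with `from_config()`. The following is a simplified, standalone re-creation of that pattern for illustration only, not alibi-detect's code: `ConfigMixin`, `ToyDrift` and the `'0.0.0'` version string are hypothetical, the detector is assumed to pass its `__init__` `locals()` (which the popped `'self'`/`'__class__'` keys suggest), and the `x_ref_preprocessed` flag and nested `_detector` lookup are omitted. For self-containment, `from_config()` also drops the `'name'` key itself.

```python
import copy
from typing import Any, Callable, Dict, Optional

# Same names as LARGE_ARTEFACTS in alibi_detect/base.py.
LARGE_ARTEFACTS = ['x_ref', 'c_ref', 'preprocess_fn']


class ConfigMixin:
    """Hypothetical, simplified stand-in for DriftConfigMixin (illustration only)."""
    config: Optional[Dict[str, Any]] = None

    def _set_config(self, inputs: Dict[str, Any]) -> None:
        self.config = {'name': self.__class__.__name__,
                       'meta': {'version': '0.0.0'}}  # placeholder version string
        for key in ('self', '__class__'):
            inputs.pop(key, None)
        # Store None instead of large artefacts; get_config() re-attaches them.
        for key in LARGE_ARTEFACTS:
            if key in inputs:
                inputs[key] = None
        self.config.update(inputs)

    def get_config(self) -> Dict[str, Any]:
        if self.config is None:
            raise NotImplementedError('Config not available for this detector.')
        # Shallow copy so self.config keeps its None placeholders
        # (the diff writes the artefacts back into self.config instead).
        cfg = copy.copy(self.config)
        for key in LARGE_ARTEFACTS:
            if key in cfg:
                cfg[key] = getattr(self, key)
        return cfg

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> Any:
        config = dict(config)       # don't mutate the caller's dict
        config.pop('name', None)    # sketch only: 'name' is not an __init__ argument
        meta = config.pop('meta', {})
        detector = cls(**config)
        assert detector.config is not None
        detector.config['meta']['version_warning'] = meta.get('version_warning', False)
        return detector


class ToyDrift(ConfigMixin):
    """Hypothetical detector used only to exercise the mixin."""

    def __init__(self, x_ref: list, p_val: float = 0.05,
                 preprocess_fn: Optional[Callable] = None) -> None:
        # Record constructor inputs first; copy locals() so the mixin never edits the frame's namespace.
        self._set_config(dict(locals()))
        self.x_ref = x_ref
        self.p_val = p_val
        self.preprocess_fn = preprocess_fn


det = ToyDrift(x_ref=[1.0, 2.0, 3.0], p_val=0.01)
cfg = det.get_config()              # x_ref re-attached from det.x_ref
restored = ToyDrift.from_config(cfg)
print(det.config['x_ref'], cfg['x_ref'], restored.p_val)  # None [1.0, 2.0, 3.0] 0.01
```

Because `_set_config()` runs before any attribute is assigned, the stored config reflects exactly what the user passed in, which is what makes the `from_config(get_config())` round trip possible.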
diff --git a/CITATION.cff b/CITATION.cff
@@ -17,6 +17,6 @@ authors:
 - family-names: "Samoilescu"
   given-names: "Robert"
 title: "Alibi Detect: Algorithms for outlier, adversarial and drift detection"
-version: 0.9.1
-date-released: 2022-04-01
+version: 0.10.0rc1
+date-released: 2022-06-01
 url: "https://github.com/SeldonIO/alibi-detect"
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -0,0 +1,2 @@
+prune doc/
+prune examples/
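One way to confirm the `prune` directives take effect is to list the built source distribution and look for anything under `doc/` or `examples/`. The helper below is a hypothetical sketch using only the standard library; it assumes a gzipped sdist whose members share a single `<name>-<version>/` top-level folder, which is how setuptools lays them out.

```python
import tarfile
from typing import Iterable, List

# Mirrors the two `prune` directives added to MANIFEST.in in this commit.
PRUNED_DIRS = ("doc/", "examples/")


def leaked_members(sdist_path: str, pruned: Iterable[str] = PRUNED_DIRS) -> List[str]:
    """List members of a .tar.gz sdist that sit under a pruned top-level directory."""
    pruned = tuple(pruned)
    leaked = []
    with tarfile.open(sdist_path, "r:gz") as tar:
        for name in tar.getnames():
            # Members are assumed to be prefixed by one '<name>-<version>/' folder; drop it.
            _, _, rel = name.partition("/")
            if any(rel == d.rstrip("/") or rel.startswith(d) for d in pruned):
                leaked.append(name)
    return leaked


# Hypothetical usage after building the sdist (the path is illustrative):
# assert leaked_members("dist/alibi-detect-0.10.0rc1.tar.gz") == []
```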
diff --git a/README.md b/README.md
@@ -395,8 +395,8 @@ BibTeX entry:
   title = {Alibi Detect: Algorithms for outlier, adversarial and drift detection},
   author = {Van Looveren, Arnaud and Klaise, Janis and Vacanti, Giovanni and Cobb, Oliver and Scillitoe, Ashley and Samoilescu, Robert},
   url = {https://github.com/SeldonIO/alibi-detect},
-  version = {0.9.1},
-  date = {2022-04-01},
+  version = {0.10.0rc1},
+  date = {2022-06-01},
   year = {2019}
 }
 ```
diff --git a/alibi_detect/__init__.py b/alibi_detect/__init__.py
@@ -1,4 +1,4 @@
-from . import ad, cd, models, od, utils
+from . import ad, cd, models, od, utils, saving
 from .version import __version__  # noqa F401
 
-__all__ = ["ad", "cd", "models", "od", "utils"]
+__all__ = ["ad", "cd", "models", "od", "utils", "saving"]
diff --git a/alibi_detect/base.py b/alibi_detect/base.py
@@ -2,15 +2,14 @@
 import copy
 import json
 import numpy as np
-from typing import Dict
-
-from alibi_detect.version import __version__
+from typing import Dict, Any, Optional
+from alibi_detect.version import __version__, __config_spec__
 
 DEFAULT_META = {
     "name": None,
     "detector_type": None,  # online or offline
     "data_type": None,  # tabular, image or time-series
-    "version": None,
+    "version": None
 }  # type: Dict
 
 
@@ -53,7 +52,7 @@ def concept_drift_dict():
 
 
 class BaseDetector(ABC):
-    """ Base class for outlier detection algorithms. """
+    """ Base class for outlier, adversarial and drift detection algorithms. """
 
     def __init__(self):
         self.meta = copy.deepcopy(DEFAULT_META)
@@ -94,6 +93,88 @@ def infer_threshold(self, X: np.ndarray) -> None:
         pass
 
 
+# "Large artefacts" - to save memory these are skipped in _set_config(), but added back in get_config()
+# Note: The current implementation assumes the artefact is stored as a class attribute, and as a config field under
+# the same name. Refactoring will be required if this assumption is to be broken.
+LARGE_ARTEFACTS = ['x_ref', 'c_ref', 'preprocess_fn']
+
+
+class DriftConfigMixin:
+    """
+    A mixin class containing methods related to a drift detector's configuration dictionary.
+    """
+    config: Optional[dict] = None
+
+    def get_config(self) -> dict:  # TODO - move to BaseDetector once config save/load implemented for non-drift
+        """
+        Get the detector's configuration dictionary.
+
+        Returns
+        -------
+        The detector's configuration dictionary.
+        """
+        if self.config is not None:
+            # Get config (stored in top-level self)
+            cfg = self.config
+            # Get low-level nested detector (if needed)
+            detector = self._detector if hasattr(self, '_detector') else self  # type: ignore[attr-defined]
+            detector = detector._detector if hasattr(detector, '_detector') else detector  # type: ignore[attr-defined]
+            # Add large artefacts back to config
+            for key in LARGE_ARTEFACTS:
+                if key in cfg:  # self.config is validated, therefore if a key is not in cfg, it isn't valid to insert
+                    cfg[key] = getattr(detector, key)
+            # Set x_ref_preprocessed flag
+            preprocess_at_init = getattr(detector, 'preprocess_at_init', True)  # If no preprocess_at_init, always true!
+            cfg['x_ref_preprocessed'] = preprocess_at_init and detector.preprocess_fn is not None
+            return cfg
+        else:
+            raise NotImplementedError('Getting a config (or saving via a config file) is not yet implemented for this '
+                                      'detector')
+
+    @classmethod
+    def from_config(cls, config: dict):
+        """
+        Instantiate a drift detector from a fully resolved (and validated) config dictionary.
+
+        Parameters
+        ----------
+        config
+            A config dictionary matching the schemas in :class:`~alibi_detect.saving.schemas`.
+        """
+        # Check for existing version_warning. meta is popped since we don't want to pass it as an arg/kwarg
+        version_warning = config.pop('meta', {}).pop('version_warning', False)
+        # Init detector
+        detector = cls(**config)
+        # Add version_warning
+        detector.meta['version_warning'] = version_warning  # type: ignore[attr-defined]
+        detector.config['meta']['version_warning'] = version_warning
+        return detector
+
+    def _set_config(self, inputs):  # TODO - move to BaseDetector once config save/load implemented for non-drift
+        # Set config metadata
+        name = self.__class__.__name__
+
+        # Init config dict
+        self.config: Dict[str, Any] = {
+            'name': name,
+            'meta': {
+                'version': __version__,
+                'config_spec': __config_spec__,
+            }
+        }
+
+        # args and kwargs
+        pop_inputs = ['self', '__class__', '__len__', 'name', 'meta']
+        for k in pop_inputs:
+            inputs.pop(k, None)
+
+        # Overwrite any large artefacts with None to save memory. They'll be added back by get_config()
+        for key in LARGE_ARTEFACTS:
+            if key in inputs:
+                inputs[key] = None
+
+        self.config.update(inputs)
+
+
 class NumpyEncoder(json.JSONEncoder):
     def default(self, obj):
         if isinstance(