-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set x_ref_preprocessed=True
during legacy loading (v0.10.5 patch)
#732
Set x_ref_preprocessed=True
during legacy loading (v0.10.5 patch)
#732
Conversation
The test script has been updated from #732 to actually run some sample predictions, because simply checking that loading passed missed this bug: import numpy as np
import alibi_detect
import transformers
from alibi_detect.saving import load_detector
from alibi_detect.utils.fetching.fetching import fetch_detector, fetch_state_dict, fetch_tf_model
from alibi_detect.datasets import fetch_kdd
from alibi_detect.utils.data import create_outlier_batch
from alibi_detect.utils.url import _join_url
import tensorflow as tf
import pytest
import itertools
from cloudpathlib import CloudPath
TESTS = [
# Outlier detectors
{'detector_type': 'outlier', 'detector_name': 'IForest', 'dataset': ['kddcup']},
{'detector_type': 'outlier', 'detector_name': 'LLR', 'dataset': ['fashion_mnist', 'genome']},
{'detector_type': 'outlier', 'detector_name': 'Mahalanobis', 'dataset': ['kddcup']},
{'detector_type': 'outlier', 'detector_name': 'OutlierAE', 'dataset': ['cifar10']},
{'detector_type': 'outlier', 'detector_name': 'OutlierAEGMM', 'dataset': ['kddcup']},
{'detector_type': 'outlier', 'detector_name': 'OutlierProphet', 'dataset': ['weather']},
{'detector_type': 'outlier', 'detector_name': 'OutlierSeq2Seq', 'dataset': ['ecg']},
{'detector_type': 'outlier', 'detector_name': 'OutlierVAE', 'dataset': ['adult', 'cifar10', 'kddcup']},
{'detector_type': 'outlier', 'detector_name': 'OutlierVAEGMM', 'dataset': ['kddcup']},
# Adversarial detectors
{'detector_type': 'adversarial', 'detector_name': 'model_distillation', 'dataset': ['cifar10'], 'model': ['resnet32']},
# Drift detectors (not supported by `fetch_detector`...)
{'detector_type': 'drift', 'detector_name': 'ks', 'dataset': ['cifar10', 'imdb'], 'version': ['0.6.2']},
{'detector_type': 'drift', 'detector_name': 'mmd', 'dataset': ['cifar10'], 'version': ['0.8.1']},
{'detector_type': 'drift', 'detector_name': 'tabular', 'dataset': ['income'], 'version': ['0.7.0', '0.8.1']},
]
def dict_product(dicts):
return (dict(zip(dicts, x)) for x in itertools.product(*dicts.values()))
trials = []
for test in TESTS:
for k, v in test.items():
if not isinstance(v, list):
test[k] = [v]
trials += list(dict_product(test))
n_tests = len(trials)
@pytest.fixture
def unpack_trials(request):
return trials[request.param]
@pytest.mark.parametrize("unpack_trials", list(range(n_tests)), indirect=True)
def test_fetch_detector(unpack_trials, tmp_path):
kwargs = unpack_trials
print(kwargs)
if kwargs['detector_type'] in ('outlier', 'adversarial'):
det = fetch_detector(tmp_path, **kwargs)
else:
# create url of detector
version = kwargs.get('version', '')
version = version.replace('.', '_')
url = 'gs://seldon-models/alibi-detect/'
url += 'cd/' + kwargs['detector_name'] + '/' + kwargs['dataset'] + '-' + version + '/'
# Download bucket directory
cloudpath = CloudPath(url)
cloudpath.copytree(tmp_path)
# Load detector
det = load_detector(tmp_path)
# Check loaded detector
dataset = kwargs['dataset']
detector_type = kwargs['detector_type']
det_backend = det._detector if hasattr(det, '_detector') else det
det_backend = det_backend._detector if hasattr(det_backend, '_detector') else det_backend
X = None
if dataset == 'imdb':
if detector_type == 'drift':
assert isinstance(det_backend.preprocess_fn.keywords['model'], alibi_detect.cd.tensorflow.preprocess.UAE)
assert isinstance(det_backend.preprocess_fn.keywords['tokenizer'], transformers.PreTrainedTokenizerBase)
X = ["This is one of the dumbest films, I've ever seen. It rips off nearly ever type of thriller and "
"manages to make a mess of them all.<br /><br />There's not a single good line or character in the "
"whole mess. If there was a plot, it was an afterthought and as far as acting goes, there's nothing "
"good to say so Ill say nothing. I honestly cant understand how this type of nonsense gets produced "
"and actually released, does somebody somewhere not at some stage think, 'Oh my god this really is a "
"load of shite' and call it a day. Its crap like this that has people downloading illegally, the trailer "
"looks like a completely different film, at least if you have download it, you haven't wasted your time "
"or money Don't waste your time, this is painful."]
elif dataset == 'cifar10':
if detector_type == 'drift':
assert isinstance(det_backend.preprocess_fn.keywords['model'], tf.keras.Model)
X = np.random.uniform(size=(5, 32, 32, 3)) # dummy batch of (32,32,3) images
if dataset == 'kddcup':
kddcup = fetch_kdd(percent10=True)
normal_batch = create_outlier_batch(kddcup.data, kddcup.target, n_samples=5, perc_outlier=0)
X, y = normal_batch.data.astype('float'), normal_batch.target
if X is not None:
if hasattr(det, 'infer_threshold'):
det.infer_threshold(X, threshold_perc=95) # inferring on test data just for basic test...
_ = det.predict(X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
x_ref_preprocessed=True
during legacy loadingx_ref_preprocessed=True
during legacy loading (v0.10.5 patch)
This PR fixes a second bug in the legacy loading behavior (first one here), used when loading detectors saved in
alibi-detect < v0.10
or withsave_detector(..., legacy=True)
in newer versions.In short, the bug was due to the change in kwarg's related to preprocessing of
x_ref
being changed inv0.10
. This was reflected in the functions that set thestate_dict
used for legacy saving (e.g.state_ksdrift
), so legacy save/load withinv0.10
works fine. However, thestate_dict
of detectors saved in older versions would of course contain old kwarg's with different meanings. The result wasx_ref
being incorrect preprocessed for a second time during the loading of detectors saved in<v0.10
.Background
In the old legacy save/load code, a
state_dict
is defined instate_ksdrift
etc:alibi-detect/alibi_detect/utils/saving.py
Lines 401 to 424 in 687cddf
At load time, in
init_cd_ksdrift
etc the nestedkwargs
dict is used to instantiate the detector, whilst selected values from theothers
dict are used to update attributes after instantiation.alibi-detect/alibi_detect/utils/saving.py
Lines 1654 to 1661 in 687cddf
In
<v0.10
thepreprocess_x_ref
kwarg (which controls whetherx_ref
is preprocessed at init) was handled in this way, to ensure that the already preprocessedx_ref
was not preprocessed again when a detector was loaded (it is set toFalse
at instantiation and then updated to its real value afterward).In
>=v0.10
this behavior is updated.preprocess_x_ref
is renamedpreprocess_at_init
, andx_ref_preprocessed
is introduced. Therefore, whetherx_ref
has been preprocessed (and thus it shouldn't be preprocessed again) is controlled byx_ref_preprocessed
, andpreprocess_at_init
only controls whether to preprocess at init or predict time. Since the preprocessing behavior is now fully controlled by the detector's kwargs (with no adhoc editing of attributes), the setting of thestate_dict
is altered instate_ksdrift
etc:alibi-detect/alibi_detect/saving/tensorflow/_saving.py
Lines 449 to 472 in ace0f51
Then, at load time, the
preprocess_x_ref
attribute no longer needs to be updated:alibi-detect/alibi_detect/saving/tensorflow/_loading.py
Lines 862 to 868 in ace0f51
Problem
The
state_ksdrift
andinit_cd_ksdrift
functions were updated in v0.10 to reflect the new preprocessing behavior, meaning legacy save/load within>v0.10
versions works fine (hence the CI passes). However, oldstate_dict
's generated with<v0.10
do not contain ax_ref_preprocessed
kwarg, and thepreprocess_at_init
(still calledpreprocess_x_ref
) kwarg is inother
instead ofkwargs
(because it was to be updated after instantiation). The result is that when loading detectors generated in old versions, they are loaded with the defaultx_ref_preprocessed=False
andpreprocess_at_init=True
, meaning the already preprocessedx_ref
is preprocessed again.Fix
The proposed fix is to first identify if the
state_dict
is created in an old version by checking thatx_ref_preprocessed
(doesn't) exist:If the
state_dict
is "old", then:x_ref_preprocessed
is set toTrue
(as is done in the updatedstate_ksdrift
etc).preprocess_x_ref
is moved fromothers
tokwargs
so that it is passed as a kwarg at instantiation instead of the attribute being updated afterward. (The name change topreprocess_at_init
doesn't matter since we use the decorator@deprecated_alias(preprocess_x_ref='preprocess_at_init')
for all relevant detectors).