KNNCAD with low probationary_period fails #3

andrewm4894 · 2020-10-22T09:50:27Z

I think I found an issue if you set the probationary_period for KNNCAD to be too low.

This tripped me up a little, so I thought it was worth raising here. I'm not quite sure what the solution would be. Maybe a reasonable default for probationary_period in KNNCAD would help others avoid this in future.

Or maybe it's just fine and people should not set such a low probationary_period, but it was one of the first things I did, so others might too :)

Reproducible example:

# Import modules.
from sklearn.utils import shuffle
from pysad.evaluation import AUROCMetric
from pysad.models import xStream, RobustRandomCutForest, KNNCAD
from pysad.utils import ArrayStreamer
from pysad.transform.postprocessing import RunningAveragePostprocessor
from pysad.transform.preprocessing import InstanceUnitNormScaler
from pysad.utils import Data
from tqdm import tqdm
import numpy as np

# This example demonstrates the usage of most modules in the PySAD framework.
if __name__ == "__main__":
    np.random.seed(61)  # Fix random seed.

    # Get data to stream.
    data = Data("data")
    X_all, y_all = data.get_data("arrhythmia.mat")
    X_all, y_all = shuffle(X_all, y_all)

    iterator = ArrayStreamer(shuffle=False)  # Init streamer to simulate streaming data.

    model = KNNCAD(probationary_period=10)
    #model = RobustRandomCutForest()
    #model = xStream()  # Init xStream anomaly detection model.
    preprocessor = InstanceUnitNormScaler()  # Init normalizer.
    postprocessor = RunningAveragePostprocessor(window_size=5)  # Init running average postprocessor.
    auroc = AUROCMetric()  # Init area under the receiver operating characteristic curve metric.

    for X, y in tqdm(iterator.iter(X_all[100:], y_all[100:])):  # Stream data.
        X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.

        score = model.fit_score_partial(X)  # Fit model to and score the instance.
        score = postprocessor.fit_transform_partial(score)  # Apply running averaging to the score.

        auroc.update(y, score)  # Update AUROC metric.

    # Output the resulting AUROC metric.
    print("\nAUROC: ", auroc.get())

Gives error:

/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.utils.testing module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.utils. Anything that cannot be imported from sklearn.utils is now part of the private API.
  warnings.warn(message, FutureWarning)
0it [00:00, ?it/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-c8fd98afee64> in <module>()
     31         X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.
     32 
---> 33         score = model.fit_score_partial(X)  # Fit model to and score the instance.
     34         score = postprocessor.fit_transform_partial(score)  # Apply running averaging to the score.
     35 

1 frames
/usr/local/lib/python3.6/dist-packages/pysad/models/knn_cad.py in fit_partial(self, X, y)
     73                 self.training.append(self.calibration.pop(0))
     74 
---> 75             self.scores.pop(0)
     76             self.calibration.append(new_item)
     77             self.scores.append(new_score)

IndexError: pop from empty list

If I set the probationary_period to 25, I see a slightly different error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-fb6b7ffc5fde> in <module>()
     31         X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.
     32 
---> 33         score = model.fit_score_partial(X)  # Fit model to and score the instance.
     34         score = postprocessor.fit_transform_partial(score)  # Apply running averaging to the score.
     35 

4 frames
<__array_function__ internals> in partition(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in partition(a, kth, axis, kind, order)
    744     else:
    745         a = asanyarray(a).copy(order="K")
--> 746     a.partition(kth, axis=axis, kind=kind, order=order)
    747     return a
    748 

ValueError: kth(=28) out of bounds (6)
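
For what it's worth, this second error can be reproduced with NumPy alone: `np.partition` needs `kth` to be a valid index into the array, so asking for position 28 of a 6-element array raises the same kind of ValueError. Below is a minimal sketch that does not touch PySAD internals; `kth_smallest` is a hypothetical helper and the array contents are made up, only the sizes (28 and 6) come from the traceback above.

```python
import numpy as np


def kth_smallest(values, kth):
    """Return the kth smallest value (0-based) via np.partition.

    Hypothetical helper for illustration only; not part of PySAD.
    """
    arr = np.asarray(values, dtype=float)
    # np.partition requires 0 <= kth < arr.size, so check first
    # and fail with a message that says how many values are needed.
    if not 0 <= kth < arr.size:
        raise ValueError(
            f"kth={kth} needs at least {kth + 1} values, got {arr.size}"
        )
    return np.partition(arr, kth)[kth]


# Made-up data: 6 values, mirroring the "out of bounds (6)" in the traceback.
distances = np.arange(6, dtype=float)

try:
    np.partition(distances, 28)  # same call shape as in the traceback
except ValueError as exc:
    print("numpy raised:", exc)

try:
    kth_smallest(distances, 28)
except ValueError as exc:
    print("guard raised:", exc)
```

So it looks like too few instances have been collected by the time the partition is attempted, although I have not confirmed that against the KNNCAD source.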

Then if I set probationary_period=50 it works.

So it feels like there is some sort of edge case I am hitting when probationary_period is low.

I'm happy to work on a PR if there is some sort of easy fix we can make, or even if we just want to set a default that might stop people from doing what I did :)
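
One shape the fix could take is an early check, so a value that is too small fails immediately with a clear message instead of partway through the stream. This is only a sketch under assumptions: `validate_probationary_period` and its `min_required` argument are hypothetical and not part of PySAD, and the 50 used below is just the smallest value I saw working in this report, not a verified lower bound.

```python
def validate_probationary_period(probationary_period, min_required):
    """Return probationary_period if it is usable, else raise.

    Hypothetical guard for illustration; not part of PySAD.
    min_required is an assumed threshold supplied by the caller.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(probationary_period, bool) or not isinstance(probationary_period, int):
        raise TypeError("probationary_period must be an int")
    if probationary_period < min_required:
        raise ValueError(
            f"probationary_period={probationary_period} is too low; "
            f"expected at least {min_required}"
        )
    return probationary_period


# 50 is the value that worked above; treat it as an assumption, not a bound.
ASSUMED_MINIMUM = 50

for candidate in (10, 25, 50):
    try:
        validate_probationary_period(candidate, ASSUMED_MINIMUM)
        print(candidate, "accepted")
    except ValueError as exc:
        print(candidate, "rejected:", exc)
```

Something like this could sit at the top of the constructor, with the real minimum worked out from whatever KNNCAD needs internally.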

selimfirat self-assigned this Oct 22, 2020
