KNNCAD with low probationary_period fails #3

Open
andrewm4894 opened this issue Oct 22, 2020 · 0 comments

I think I found an issue: KNNCAD fails if probationary_period is set too low.

This tripped me up a little, so I thought it was worth raising here. I'm not quite sure what the solution would be. Maybe a reasonable default for probationary_period in KNNCAD would help others avoid this in future.

Or maybe it's just fine and people should not set such a low probationary_period, but it was one of the first things I did, so others might too :)

Reproducible example:

# Import modules.
from sklearn.utils import shuffle
from pysad.evaluation import AUROCMetric
from pysad.models import xStream, RobustRandomCutForest, KNNCAD
from pysad.utils import ArrayStreamer
from pysad.transform.postprocessing import RunningAveragePostprocessor
from pysad.transform.preprocessing import InstanceUnitNormScaler
from pysad.utils import Data
from tqdm import tqdm
import numpy as np

# This example demonstrates the usage of most modules in the PySAD framework.
if __name__ == "__main__":
    np.random.seed(61)  # Fix random seed.

    # Get data to stream.
    data = Data("data")
    X_all, y_all = data.get_data("arrhythmia.mat")
    X_all, y_all = shuffle(X_all, y_all)

    iterator = ArrayStreamer(shuffle=False)  # Init streamer to simulate streaming data.

    model = KNNCAD(probationary_period=10)
    #model = RobustRandomCutForest()
    #model = xStream()  # Init xStream anomaly detection model.
    preprocessor = InstanceUnitNormScaler()  # Init normalizer.
    postprocessor = RunningAveragePostprocessor(window_size=5)  # Init running average postprocessor.
    auroc = AUROCMetric()  # Init area under receiver operating characteristic curve metric.

    for X, y in tqdm(iterator.iter(X_all[100:], y_all[100:])):  # Stream data.
        X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.

        score = model.fit_score_partial(X)  # Fit model to and score the instance.
        score = postprocessor.fit_transform_partial(score)  # Apply running averaging to the score.

        auroc.update(y, score)  # Update AUROC metric.

    # Output the resulting AUROC metric.
    print("\nAUROC: ", auroc.get())

This gives the error:

/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:143: FutureWarning: The sklearn.utils.testing module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.utils. Anything that cannot be imported from sklearn.utils is now part of the private API.
  warnings.warn(message, FutureWarning)
0it [00:00, ?it/s]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-c8fd98afee64> in <module>()
     31         X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.
     32 
---> 33         score = model.fit_score_partial(X)  # Fit model to and score the instance.
     34         score = postprocessor.fit_transform_partial(score)  # Apply running averaging to the score.
     35 

1 frames
/usr/local/lib/python3.6/dist-packages/pysad/models/knn_cad.py in fit_partial(self, X, y)
     73                 self.training.append(self.calibration.pop(0))
     74 
---> 75             self.scores.pop(0)
     76             self.calibration.append(new_item)
     77             self.scores.append(new_score)

IndexError: pop from empty list
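
The last frame is plain Python list behaviour: calling `pop(0)` on an empty list raises exactly this error. The snippet below is only a minimal sketch of that failure and of one possible guard. `slide_scores` is a hypothetical helper, not pysad code, and it assumes the scores list can still be empty at that point when the probationary period is short.

```python
def slide_scores(scores, new_score):
    """Drop the oldest score (if any) and append the new one.

    Hypothetical helper for illustration only; it is not part of pysad.
    """
    if scores:  # Guard: only pop when there is something to drop.
        scores.pop(0)
    scores.append(new_score)
    return scores


# An unguarded pop on an empty list fails like the traceback above.
try:
    [].pop(0)
except IndexError as err:
    message = str(err)

# The guarded version copes with an empty window.
window = slide_scores([], 0.5)
```

Whether silently skipping the pop is the right behaviour for KNNCAD is a separate question; raising a clearer error earlier may be preferable.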

If I set probationary_period to 25, I see a different error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-fb6b7ffc5fde> in <module>()
     31         X = preprocessor.fit_transform_partial(X)  # Fit preprocessor to and transform the instance.
     32 
---> 33         score = model.fit_score_partial(X)  # Fit model to and score the instance.
     34         score = postprocessor.fit_transform_partial(score)  # Apply running averaging to the score.
     35 

4 frames
<__array_function__ internals> in partition(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in partition(a, kth, axis, kind, order)
    744     else:
    745         a = asanyarray(a).copy(order="K")
--> 746     a.partition(kth, axis=axis, kind=kind, order=order)
    747     return a
    748 

ValueError: kth(=28) out of bounds (6)
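
The message says that `numpy.partition` was asked for the element at sorted position 28 of an array holding only 6 entries. A minimal sketch that reproduces just that NumPy behaviour is below. The array contents are made up for illustration; only the sizes 28 and 6 come from the traceback.

```python
import numpy as np

values = np.arange(6)  # Only 6 elements, as in the error message.

# Asking for sorted position 28 of a 6-element array is out of bounds.
try:
    np.partition(values, 28)
except ValueError as err:
    partition_error = str(err)

# A valid kth must lie within the array bounds (0 .. size - 1).
kth = min(28, values.size - 1)
smallest = np.partition(values, kth)[: kth + 1]
```

This suggests the array being partitioned is shorter than expected when probationary_period is 25, though the snippet does not show where inside KNNCAD that happens.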

Then if I set probationary_period=50 it works.

So it feels like I'm hitting some sort of edge case when probationary_period is low.

I'm happy to work on a PR if there is an easy fix we can make, or even if we just want to set a default that stops people from doing what I did :)
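
One possible shape for such a fix is an early check that rejects values that are too small, sketched below. `check_probationary_period` is a hypothetical helper and not part of pysad. The `minimum=50` default is an assumption based on the one value that worked above; the real lower bound for KNNCAD may differ.

```python
def check_probationary_period(probationary_period, minimum=50):
    """Raise early if the probationary period is below an assumed minimum.

    `minimum=50` is an assumption taken from the one value that worked in
    this issue; the true lower bound for KNNCAD may differ.
    """
    if probationary_period < minimum:
        raise ValueError(
            f"probationary_period={probationary_period} is below the "
            f"assumed minimum of {minimum}"
        )
    return probationary_period
```

With this, `KNNCAD(probationary_period=check_probationary_period(10))` would raise a descriptive ValueError before any data is streamed, instead of the IndexError above.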

@selimfirat selimfirat self-assigned this Oct 22, 2020