Feature LOF outlier detector #746
Conversation
values then the score method uses the distance/kernel similarity to each of the specified `k` neighbors.
In the latter case, an `aggregator` must be specified to aggregate the scores.

Note that, in the multiple k case, a normalizer can be provided. If a normalizer is passed then it is fit in
I think we should be a little clearer about what the normalizer and aggregator refer to, as it's not clear here or from the kwarg descriptions. I realise this applies to KNN too.
Agreed, I've opened an issue here. I'll include it in a final clean-up PR, I think.
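(Side note for readers of this thread: a minimal sketch of what the multiple-`k` path does conceptually, i.e. one score per `k`, optionally normalised against reference scores, then aggregated into a single score per instance. The helper names and the shift-and-scale / averaging choices below are illustrative assumptions, not the alibi-detect API.)

```python
import torch

def multi_k_knn_scores(x: torch.Tensor, x_ref: torch.Tensor, ks: list) -> torch.Tensor:
    """Distance to the k-th nearest reference point, for each k in `ks`."""
    dists = torch.cdist(x, x_ref)                                      # (n, n_ref) pairwise distances
    sorted_dists = dists.sort(dim=1).values
    return torch.stack([sorted_dists[:, k - 1] for k in ks], dim=1)    # (n, len(ks))

def normalize(scores: torch.Tensor, ref_scores: torch.Tensor) -> torch.Tensor:
    # illustrative normalizer: shift and scale using reference score statistics
    return (scores - ref_scores.mean(0)) / ref_scores.std(0)

def aggregate(scores: torch.Tensor) -> torch.Tensor:
    # illustrative aggregator: average the per-k scores into one score per instance
    return scores.mean(dim=1)

x_ref = torch.randn(128, 2)
x = torch.randn(8, 2)
ks = [4, 8, 16]
ref_scores = multi_k_knn_scores(x_ref, x_ref, ks)
scores = aggregate(normalize(multi_k_knn_scores(x, x_ref, ks), ref_scores))  # shape (8,)
```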
alibi_detect/od/_lof.py
Outdated
def score(self, x: np.ndarray) -> np.ndarray:
    """Score `x` instances using the detector.

    Computes the local outlier factor for each instance in `x`. If `k` is an array of values then the score for
Maybe worth just noting here that the outlier factor is the density of each instance in `x` relative to those of its neighbours in `x_ref`.
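(To make the "relative density" point concrete, a rough numpy sketch of the classical LOF computation: reachability distances, local reachability density, then the ratio of the neighbours' densities to the instance's own density. This illustrates the standard algorithm, not the code in this PR.)

```python
import numpy as np

def lof_scores(x: np.ndarray, x_ref: np.ndarray, k: int) -> np.ndarray:
    """Classical LOF of each row of `x` w.r.t. reference data `x_ref`."""
    d_ref = np.linalg.norm(x_ref[:, None] - x_ref[None, :], axis=-1)   # (n_ref, n_ref) distances
    np.fill_diagonal(d_ref, np.inf)                                    # exclude self-distances
    knn_ref = np.argsort(d_ref, axis=1)[:, :k]                         # neighbours of each reference point
    k_dist = np.sort(d_ref, axis=1)[:, k - 1]                          # k-distance of each reference point

    def lrd(dists: np.ndarray, nbrs: np.ndarray) -> np.ndarray:
        # local reachability density: inverse mean reachability distance to the k neighbours
        reach = np.maximum(dists[np.arange(len(nbrs))[:, None], nbrs], k_dist[nbrs])
        return 1.0 / reach.mean(axis=1)

    lrd_ref = lrd(d_ref, knn_ref)

    d_x = np.linalg.norm(x[:, None] - x_ref[None, :], axis=-1)         # (n, n_ref)
    knn_x = np.argsort(d_x, axis=1)[:, :k]
    reach_x = np.maximum(d_x[np.arange(len(x))[:, None], knn_x], k_dist[knn_x])
    lrd_x = 1.0 / reach_x.mean(axis=1)

    # LOF: average density of the neighbours divided by the instance's own density
    return lrd_ref[knn_x].mean(axis=1) / lrd_x

x_ref = np.random.randn(200, 2)
x = np.vstack([np.random.randn(5, 2), [[6.0, 6.0]]])                   # last point is an obvious outlier
print(lof_scores(x, x_ref, k=10))                                      # last score should be well above 1
```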
alibi_detect/od/pytorch/lof.py
Outdated
return mask


def _compute_K(self, x, y):
    """Compute the distance/similarity matrix between `x` and `y`."""
Could we remove "/similarity" here? A similarity matrix would have entries that increase with similarity, whereas this is the opposite.
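(To illustrate the distinction: a distance matrix grows as points move apart, while a kernel similarity such as a Gaussian RBF shrinks. A small torch sketch, independent of the `_compute_K` implementation in this PR:)

```python
import torch

x = torch.randn(4, 3)
y = torch.randn(5, 3)

dist = torch.cdist(x, y)              # (4, 5): larger entries = points further apart
sim = torch.exp(-dist ** 2 / 2.0)     # Gaussian RBF similarity: larger entries = points closer together

# the two matrices order each row's pairs in opposite directions
print(torch.equal(dist.argsort(dim=1), sim.argsort(dim=1, descending=True)))  # True
```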
Ran through all the code and couldn't find anything worth commenting on! Just the nitpicks on docstrings above. Nice!
Challenge accepted 😛
# set metadata
self.meta['detector_type'] = 'outlier'
self.meta['data_type'] = 'numeric'
Not isolated to this PR, but noting that we seem to be a little inconsistent across the new and old outlier detectors wrt when `data_type` is hard-coded and when it is optionally set via a kwarg. For some it is hard-coded to `time-series` (which makes sense), for some (e.g. the old `Mahalanobis`) it is set via kwarg, and for some it is hard-coded to `numeric`. Maybe worth opening an issue to review this more generally?
Already mentioned in #567 (comment), but highlighting here since we are setting `data_type` in new detectors too...
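(For illustration only, one way the kwarg approach could look; this is a hypothetical base-class sketch, not code from this PR or the alibi-detect API.)

```python
class OutlierDetectorSketch:
    """Hypothetical sketch: let callers override data_type while keeping a default."""

    def __init__(self, data_type: str = 'numeric') -> None:
        # set metadata; data_type defaults to 'numeric' but is no longer hard-coded
        self.meta = {'detector_type': 'outlier', 'data_type': data_type}

det = OutlierDetectorSketch(data_type='time-series')
print(det.meta)  # {'detector_type': 'outlier', 'data_type': 'time-series'}
```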
Returns
-------
Outlier scores. The shape of the scores is `(n_instances,)`. The higher the score, the more anomalous the \
Nitpick: In a few places `outlying` is used, e.g. "the l2-norm of the projected data. The higher the score, the more outlying the instance.", whereas in the `score` docstring (and for `_pca`, `_gmm`, `_knn`, `mahalanobis`) `anomalous` is used. Worth picking one or the other?
assert torch.all(lof_torch(x) == torch.tensor([0, 0, 1]))


@pytest.mark.skip(reason="Can't convert GaussianRBF to torch script due to torch script type constraints")
What is the intention behind including this test if it is skipped in all cases?
I wrote it, then decided not to include this functionality in the first set of outliers. However, it should be implemented at some point, which is why I've left it in but skipped it. I can take it out if you prefer?
see #810
Few minor comments, otherwise LGTM!
What is this
Implementation of the LOF outlier detector. See this notebook for an example. Branches off gmm-od.
TODO:
`fit`/`infer_threshold` changes
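(For orientation, a rough sketch of the fit / infer_threshold / predict flow the new detectors follow; the class, kwargs and returned keys below are illustrative assumptions, and a simple kNN distance stands in for the LOF score.)

```python
import numpy as np

class OutlierDetectorFlowSketch:
    """Hypothetical sketch of the fit / infer_threshold / predict flow."""

    def __init__(self, k: int = 10) -> None:
        self.k = k
        self.x_ref = None
        self.threshold = None

    def fit(self, x_ref: np.ndarray) -> None:
        self.x_ref = x_ref                                    # store reference data

    def score(self, x: np.ndarray) -> np.ndarray:
        d = np.linalg.norm(x[:, None] - self.x_ref[None, :], axis=-1)
        return np.sort(d, axis=1)[:, self.k - 1]              # distance to k-th nearest reference point

    def infer_threshold(self, x: np.ndarray, fpr: float) -> None:
        # choose the threshold so that a fraction `fpr` of x scores above it
        self.threshold = np.quantile(self.score(x), 1 - fpr)

    def predict(self, x: np.ndarray) -> dict:
        scores = self.score(x)
        return {'instance_score': scores, 'is_outlier': scores > self.threshold}

x_ref = np.random.randn(500, 2)
det = OutlierDetectorFlowSketch(k=10)
det.fit(x_ref)
det.infer_threshold(x_ref, fpr=0.05)
print(det.predict(np.array([[0.0, 0.0], [7.0, 7.0]])))        # second instance flagged as an outlier
```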