Error in documentation for SpotTheDiff detector on wine quality dataset? #780

vinyasHarish95 · 2023-04-27T21:41:33Z

Hi Seldon team, thanks for your great work on this package! I'm using it in my PhD research to understand the impact of different dataset shifts during COVID-19 on a precision public health model.

I was taking a look at the SpotTheDiff detector and the background docs say that "[like pre-processing steps] learned detectors are trained on training data which is held-out from the reference data set".

In the example on the same page, the PCA is trained on X_train and the MMDDrift detector is instantiated on X_ref.

However, in the wine quality example, the detector is instantiated on X_ref.
So I'm unsure whether part of the whites dataset (an X_train) should have been set aside to train the detector.

Thank you for clarifying.


ojcobb · 2023-04-28T08:23:09Z

Hi @vinyasHarish95,

Thanks for pointing out this potential source of confusion.

The sentence "it is important that the learned detectors are trained on training data which is held-out from the reference data set" is intended to give intuition for how the learned detectors work, not to instruct users to split their data before passing it to these detectors. For the learned detectors, the splitting is inherent to the drift detection procedure and is therefore performed automatically inside the detectors. By contrast, for the non-learned detectors data splitting only matters in the special case where a preprocessing function is specified and that function has been fit/trained on the same source of reference data. In that special case the practitioner should handle the data splitting themselves.
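To make the special case concrete, here is a minimal sketch in plain Python. It is not alibi-detect code: `split_held_out`, the toy data, and the mean-centring "preprocessing" are all hypothetical stand-ins, chosen only to show the preprocessing step being fit on data that is kept out of the reference set.

```python
import random


def split_held_out(data, train_fraction=0.5, seed=0):
    """Shuffle `data` and split it into disjoint (train, held_out) parts.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    n_train = int(len(idx) * train_fraction)
    train = [data[i] for i in idx[:n_train]]
    held_out = [data[i] for i in idx[n_train:]]
    return train, held_out


# Toy stand-in for the available (pre-drift) data: 100 rows, 2 features.
x_all = [[float(i), float(i % 7)] for i in range(100)]

# Split once: x_train is used to fit the preprocessing step,
# x_ref is what would be handed to a non-learned detector.
x_train, x_ref = split_held_out(x_all, train_fraction=0.5)

# "Fit" the preprocessing (per-feature means) on x_train only.
n_features = len(x_train[0])
means = [sum(row[j] for row in x_train) / len(x_train) for j in range(n_features)]


def preprocess(rows):
    """Centre each feature using the means learned from x_train."""
    return [[v - m for v, m in zip(row, means)] for row in rows]


# The reference data never influenced the fitted preprocessing.
x_ref_processed = preprocess(x_ref)
```

With a learned detector this manual split would not be needed, since the detector does the equivalent split internally.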

Hope that clears things up. We'll consider whether we can make this clearer in the docs.

jklaise added the labels "Type: Question" (User questions) and "Type: Docs" (Anything related to documentation) on Apr 28, 2023
