[BUG] Re-training DetectorEnsemble with TSADEvaluator does not ensure 'train_config' to be DetectorEnsembleTrainConfig: 'dict' object has no attribute 'valid_frac' #175

AtanasGruev · 2024-11-14T10:01:17Z

Describe the bug
Simulating live model deployment of the standard multivariate model DefaultDetector (i.e. DetectorEnsemble of VAE and RRCF) by means of the TSADEvaluator leads to periodic re-training. Initially, TSADEvaluator's default_retrain_kwargs() method ensures that train_config for training the DetectorEnsemble is an instance of DetectorEnsembleTrainConfig. However, after passing down re-training from DetectorEnsemble to the individual models, no care is taken to ensure that the train_config for the DefaultDetector will be an instance of the same class. Instead, train_config for training the DetectorEnsemble that is the DefaultDetector is of type dict which leads to the bug reported.

Most likely this is related to TSADEvaluator mismatch between full_train_kwargs and full_retrain_kwargs, as the lines

Merlion/merlion/evaluate/base.py

Lines 191 to 192 in 085ef8a

    
           full_train_kwargs = self.default_train_kwargs() 
        
           full_train_kwargs.update(train_kwargs)

do not ensure that

Merlion/merlion/evaluate/base.py

Line 202 in 085ef8a

train_result = self._train_model(train_vals, **full_train_kwargs)

utilizes the correct train_config.

To Reproduce
Bug has been identified by going over the tutorial on "Multivariate Time Series Anomaly Detection" for Merlion v2.0.2, section "Model Inference and Quantitative Evaluation" (see https://opensource.salesforce.com/Merlion/v2.0.2/tutorials/anomaly/2_AnomalyMultivariate.html#Model-Inference-and-Quantitative-Evaluation). When performing "Sliding Window Evaluation" with TSADEvaluator, the ensemble fails at re-training the DefaultDetector model due to the bug reported.

Expected behavior
Successful re-training of the DefaultDetector model as part of DetectorEnsemble models when using TSADEvaluator.

Screenshots
A screenshot of the resulting error stack trace is attached.

Desktop

OS: Ubuntu 24.04 LTS
Merlion Version: 2.0.2
Python Version: 3.9.18
openjdk-11-jdk installed as per docs.

Additional context
At the re-train trigger

Merlion/merlion/evaluate/base.py

Line 230 in 085ef8a

if t >= t_next and not cur_train.is_empty() and not cur_test.is_empty():

the following call sequence occurs:

merlion.evaluate.anomaly.TSADEvaluator's get_predict() invokes merlion.evaluate.base.EvaluatorBase's get_predict(). The latter contains the re-training logic.
When re-training is initiated, self.model is an instance of merlion.models.ensemble.anomaly.DetectorEnsemble. Consequently, EvaluatorBase's _train_model() invokes DetectorEnsemble's train().
merlion.models.ensemble.anomaly.DetectorEnsemble inherits from merlion.models.ensemble.base.EnsembleBase and merlion.models.anomaly.base.DetectorBase. Only the latter has a train() method. Therefore, DetectorEnsemble's train() actually calls DetectorBase's train().
Using call_with_accepted_kwargs, DetectorBase's train() invokes DetectorEnsemble's _train().
After executing

Merlion/merlion/models/ensemble/anomaly.py

Line 139 in 085ef8a

train_cfgs = train_config.per_model_train_configs

train_cfgs becomes List[dict].

TSADEvaluator's get_predict() is invoked at the first iteration of

Merlion/merlion/models/ensemble/anomaly.py

Lines 159 to 164 in 085ef8a

    
           for i, (model, cfg, pr_cfg) in enumerate(zip(self.models, train_cfgs, pr_cfgs)): 
        
               try: 
        
                   train_kwargs = dict(train_config=cfg, anomaly_labels=anomaly_labels, post_rule_train_config=pr_cfg) 
        
                   train_scores, valid_scores = TSADEvaluator(model=model, config=eval_cfg).get_predict( 
        
                       train_vals=train, test_vals=valid, train_kwargs=train_kwargs, post_process=True 
        
                   )

which is responsible for re-training the first ensemble model which is an instance of merlion.models.defaults.DefaultDetector. At this moment, train_kwargs['train_config'] is of type dict. Effectively, TSADEvaluator's get_predict() invokes EvaluatorBase's get_predict().

EvaluatorBases get_predict() invokes EvaluatorBase's _train_model(). The latter invokes merlion.models.defaults.DefaultDetector's train(). At this moment, train_config is of type dict.
self.model is set to be a DetectorEnsemble of VAE and RRCF, and DefaultDetector's train() invokes LayeredDetector's train(). merlion.models.layers.LayeredDetector inherits from merlion.models.layers.LayeredModel and merlion.models.anomaly.base.DetectorBase. Only the latter has a train() method. Therefore, LayeredDetector's train() invokes DetectorBase's train(). At this moment, train_config is of type dict.
Using call_with_accepted_kwargs, DetectorBase's train() invokes DetectorEnsemble's _train().
train_config is required to be an instance of DetectorEnsembleTrainConfig. We see that this is not the case. The error occurs.

The text was updated successfully, but these errors were encountered:

AtanasGruev · 2024-11-21T10:08:17Z

UPDATE: On a fresh repo clone, running all cells in /Merlion/examples/anomaly/2_AnomalyMultivariate.ipynb:

Using pip install Merlion/ followed by pip install Merlion/[all]: Problem occurs. (Merlion Version: 2.0.2)
Using pip install salesforce-merlion followed by pip install salesforce-merlion/[all]: Problem occurs. (Merlion Version: 2.0.4)

This also highlights a discrepancy between the PyPi latest version and this repo's current latest version.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Re-training DetectorEnsemble with TSADEvaluator does not ensure 'train_config' to be DetectorEnsembleTrainConfig: 'dict' object has no attribute 'valid_frac' #175

[BUG] Re-training DetectorEnsemble with TSADEvaluator does not ensure 'train_config' to be DetectorEnsembleTrainConfig: 'dict' object has no attribute 'valid_frac' #175

AtanasGruev commented Nov 14, 2024

AtanasGruev commented Nov 21, 2024

[BUG] Re-training DetectorEnsemble with TSADEvaluator does not ensure 'train_config' to be DetectorEnsembleTrainConfig: 'dict' object has no attribute 'valid_frac' #175

[BUG] Re-training DetectorEnsemble with TSADEvaluator does not ensure 'train_config' to be DetectorEnsembleTrainConfig: 'dict' object has no attribute 'valid_frac' #175

Comments

AtanasGruev commented Nov 14, 2024

AtanasGruev commented Nov 21, 2024