Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Re-training DetectorEnsemble with TSADEvaluator does not ensure 'train_config' to be DetectorEnsembleTrainConfig: 'dict' object has no attribute 'valid_frac' #175

Open
AtanasGruev opened this issue Nov 14, 2024 · 1 comment

Comments

@AtanasGruev
Copy link

Describe the bug
Simulating live model deployment of the standard multivariate model DefaultDetector (i.e. DetectorEnsemble of VAE and RRCF) by means of the TSADEvaluator leads to periodic re-training. Initially, TSADEvaluator's default_retrain_kwargs() method ensures that train_config for training the DetectorEnsemble is an instance of DetectorEnsembleTrainConfig. However, after passing down re-training from DetectorEnsemble to the individual models, no care is taken to ensure that the train_config for the DefaultDetector will be an instance of the same class. Instead, train_config for training the DetectorEnsemble that is the DefaultDetector is of type dict which leads to the bug reported.

Most likely this is related to TSADEvaluator mismatch between full_train_kwargs and full_retrain_kwargs, as the lines

full_train_kwargs = self.default_train_kwargs()
full_train_kwargs.update(train_kwargs)

do not ensure that
train_result = self._train_model(train_vals, **full_train_kwargs)
utilizes the correct train_config.

To Reproduce
Bug has been identified by going over the tutorial on "Multivariate Time Series Anomaly Detection" for Merlion v2.0.2, section "Model Inference and Quantitative Evaluation" (see https://opensource.salesforce.com/Merlion/v2.0.2/tutorials/anomaly/2_AnomalyMultivariate.html#Model-Inference-and-Quantitative-Evaluation). When performing "Sliding Window Evaluation" with TSADEvaluator, the ensemble fails at re-training the DefaultDetector model due to the bug reported.

Expected behavior
Successful re-training of the DefaultDetector model as part of DetectorEnsemble models when using TSADEvaluator.

Screenshots
A screenshot of the resulting error stack trace is attached.
DetectorEnsemble Sliding Window Evaluation

Desktop

  • OS: Ubuntu 24.04 LTS
  • Merlion Version: 2.0.2
  • Python Version: 3.9.18
  • openjdk-11-jdk installed as per docs.

Additional context
At the re-train trigger

if t >= t_next and not cur_train.is_empty() and not cur_test.is_empty():
the following call sequence occurs:

  1. merlion.evaluate.anomaly.TSADEvaluator's get_predict() invokes merlion.evaluate.base.EvaluatorBase's get_predict(). The latter contains the re-training logic.
  2. When re-training is initiated, self.model is an instance of merlion.models.ensemble.anomaly.DetectorEnsemble. Consequently, EvaluatorBase's _train_model() invokes DetectorEnsemble's train().
  3. merlion.models.ensemble.anomaly.DetectorEnsemble inherits from merlion.models.ensemble.base.EnsembleBase and merlion.models.anomaly.base.DetectorBase. Only the latter has a train() method. Therefore, DetectorEnsemble's train() actually calls DetectorBase's train().
  4. Using call_with_accepted_kwargs, DetectorBase's train() invokes DetectorEnsemble's _train().
  5. After executing
    train_cfgs = train_config.per_model_train_configs
    train_cfgs becomes List[dict].
  6. TSADEvaluator's get_predict() is invoked at the first iteration of
    for i, (model, cfg, pr_cfg) in enumerate(zip(self.models, train_cfgs, pr_cfgs)):
    try:
    train_kwargs = dict(train_config=cfg, anomaly_labels=anomaly_labels, post_rule_train_config=pr_cfg)
    train_scores, valid_scores = TSADEvaluator(model=model, config=eval_cfg).get_predict(
    train_vals=train, test_vals=valid, train_kwargs=train_kwargs, post_process=True
    )
    which is responsible for re-training the first ensemble model which is an instance of merlion.models.defaults.DefaultDetector. At this moment, train_kwargs['train_config'] is of type dict. Effectively, TSADEvaluator's get_predict() invokes EvaluatorBase's get_predict().
  7. EvaluatorBases get_predict() invokes EvaluatorBase's _train_model(). The latter invokes merlion.models.defaults.DefaultDetector's train(). At this moment, train_config is of type dict.
  8. self.model is set to be a DetectorEnsemble of VAE and RRCF, and DefaultDetector's train() invokes LayeredDetector's train(). merlion.models.layers.LayeredDetector inherits from merlion.models.layers.LayeredModel and merlion.models.anomaly.base.DetectorBase. Only the latter has a train() method. Therefore, LayeredDetector's train() invokes DetectorBase's train(). At this moment, train_config is of type dict.
  9. Using call_with_accepted_kwargs, DetectorBase's train() invokes DetectorEnsemble's _train().
  10. train_config is required to be an instance of DetectorEnsembleTrainConfig. We see that this is not the case. The error occurs.
@AtanasGruev
Copy link
Author

UPDATE: On a fresh repo clone, running all cells in /Merlion/examples/anomaly/2_AnomalyMultivariate.ipynb:

  1. Using pip install Merlion/ followed by pip install Merlion/[all]: Problem occurs. (Merlion Version: 2.0.2)
  2. Using pip install salesforce-merlion followed by pip install salesforce-merlion/[all]: Problem occurs. (Merlion Version: 2.0.4)

This also highlights a discrepancy between the PyPi latest version and this repo's current latest version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant