[BUG] Unable to save a trained Isolation Forest model in SynapseML #2094

shibuya-phys · 2023-10-11T07:12:04Z

SynapseML version

0.10.1

System information

Language version (e.g. python 3.8, scala 2.12): Python 3.10.8
Spark Version (e.g. 3.2.3): Spark 3.2.2
Spark Platform (e.g. Synapse, Databricks): Synapse

Describe the problem

I'm trying to save an Isolation Forest model after training it in SynapseML. However, the save method fails with the errors shown under "Other info / logs".

Code to reproduce issue

# Building a model
from synapse.ml.isolationforest import IsolationForest
from pyspark.ml.feature import VectorAssembler

# Isolation Forest parameters
contamination = 0.021
num_estimators = 100
max_samples = 100
max_features = 1.0

# Model Setup
isolationForest = (
    IsolationForest()
    .setNumEstimators(num_estimators)
    .setBootstrap(False)
    .setMaxSamples(max_samples)
    .setMaxFeatures(max_features)
    .setFeaturesCol("features")
    .setPredictionCol("predictedLabel")
    .setScoreCol("outlierScore")
    .setContamination(contamination)
    .setContaminationError(0.01 * contamination)
    .setRandomSeed(1)
)

# Training
# NOTE: inputCols, sdf_train, and sdf_test are defined elsewhere (not shown in this report)
va = VectorAssembler(inputCols=inputCols, outputCol="features")
train_data = va.transform(sdf_train)
model_isolationforest_trained = isolationForest.fit(train_data)

# Predictions
test_data = va.transform(sdf_test)
pred = model_isolationforest_trained.transform(test_data)

# Saving
model_isolationforest_trained.write().overwrite().save("path")

Other info / logs

The relevant part of the error raised by model_isolationforest_trained.write().overwrite().save("path") is:

Output error:
Py4JJavaError: An error occurred while calling o3919.save: org.apache.spark.SparkException: Job aborted. ~~

Caused by: java.lang.NoSuchMethodError: 'scala.Function1 org.apache.spark.sql.execution.datasources.DataSourceUtils$.createDateRebaseFuncInWrite(scala.Enumeration$Value, java.lang.String)' ~~

Does this error mean that a trained Isolation Forest model cannot be saved in SynapseML? As a side note, I confirmed that the save method works with LightGBMClassifier in SynapseML.
I would appreciate it if someone could suggest a solution.
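
In case it helps triage: a `java.lang.NoSuchMethodError` generally means that some class was compiled against a different version of a dependency than the one loaded at runtime, so it may be worth checking which class and method the JVM could not find. Below is a minimal sketch that extracts those from a "Caused by" line like the one above. The helper name `parse_missing_method` is hypothetical (it is not part of SynapseML or PySpark), and the regular expression assumes the message format shown in this report.

```python
import re

# Hypothetical helper for triage; not part of SynapseML or PySpark.
# Assumes the JVM message format: NoSuchMethodError: '<return type> <owner>.<method>(<args>)'
_NO_SUCH_METHOD = re.compile(
    r"NoSuchMethodError: '?(?P<ret>\S+) "
    r"(?P<owner>[\w.$]+)\.(?P<method>\w+)\((?P<args>[^)]*)\)"
)


def parse_missing_method(message):
    """Return the pieces of the missing method signature, or None if no match."""
    match = _NO_SUCH_METHOD.search(message)
    if match is None:
        return None
    return {
        "return_type": match.group("ret"),
        "owner": match.group("owner"),
        "method": match.group("method"),
        # Split the comma-separated argument types, dropping empty entries
        "arg_types": [a.strip() for a in match.group("args").split(",") if a.strip()],
    }


# Sample message, using the signature reported in this issue
sample = (
    "Caused by: java.lang.NoSuchMethodError: 'scala.Function1 "
    "org.apache.spark.sql.execution.datasources.DataSourceUtils$."
    "createDateRebaseFuncInWrite(scala.Enumeration$Value, java.lang.String)'"
)
info = parse_missing_method(sample)
print(info["owner"], info["method"], info["arg_types"])
```

Here the owner is a Spark SQL internal class, which suggests (but does not prove) that a library on the classpath expects a different Spark build than the one the cluster provides.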

What component(s) does this bug affect?

What language(s) does this bug affect?

language/scala: Scala source code
language/python: Pyspark APIs
language/r: R APIs
language/csharp: .NET APIs
language/new: Proposals for new client languages

What integration(s) does this bug affect?

integrations/synapse: Azure Synapse integrations
integrations/azureml: Azure ML integrations
integrations/databricks: Databricks integrations


github-actions · 2023-10-11T07:12:17Z

Hey @shibuya-phys 👋!
Thank you so much for reporting the issue/feature request 🚨.
Someone from SynapseML Team will be looking to triage this issue soon.
We appreciate your patience.

shibuya-phys added the bug label Oct 11, 2023

github-actions bot added the triage label Oct 11, 2023
