[BUG] Unable to save a trained Isolation Forest model in SynapseML #2094

Open
4 of 19 tasks
shibuya-phys opened this issue Oct 11, 2023 · 1 comment

Comments

@shibuya-phys

SynapseML version

0.10.1

System information

  • Language version (e.g. python 3.8, scala 2.12): Python 3.10.8
  • Spark Version (e.g. 3.2.3): Spark 3.2.2
  • Spark Platform (e.g. Synapse, Databricks): Synapse

Describe the problem

I'm trying to save an Isolation Forest model after training it in SynapseML. However, the save call fails with the error shown below, and the model is not written.

Code to reproduce issue

# building a model
from synapse.ml.isolationforest import *
from pyspark.ml.feature import VectorAssembler

# Isolation Forest parameters
contamination = 0.021
num_estimators = 100
max_samples = 100
max_features = 1.0

# Model Setup
isolationForest = (
    IsolationForest()
    .setNumEstimators(num_estimators)
    .setBootstrap(False)
    .setMaxSamples(max_samples)
    .setMaxFeatures(max_features)
    .setFeaturesCol("features")
    .setPredictionCol("predictedLabel")
    .setScoreCol("outlierScore")
    .setContamination(contamination)
    .setContaminationError(0.01 * contamination)
    .setRandomSeed(1)
)

# Training (inputCols, sdf_train, and sdf_test are defined earlier and not shown here)
va = VectorAssembler(inputCols=inputCols, outputCol="features")
train_data = va.transform(sdf_train)
model_isolationforest_trained = isolationForest.fit(train_data)

# Predictions
test_data = va.transform(sdf_test)
pred = model_isolationforest_trained.transform(test_data)

# Saving
model_isolationforest_trained.write().overwrite().save("path")

Other info / logs

The relevant part of the error raised by model_isolationforest_trained.write().overwrite().save("path") is:

Output error:
Py4JJavaError: An error occurred while calling o3919.save: org.apache.spark.SparkException: Job aborted. ~~

Caused by: java.lang.NoSuchMethodError: 'scala.Function1 org.apache.spark.sql.execution.datasources.DataSourceUtils$.createDateRebaseFuncInWrite(scala.Enumeration$Value, java.lang.String)' ~~

Does this error mean that a trained Isolation Forest model cannot be saved in SynapseML? As a side note, I confirmed that the save method works with the LightGBMClassifier in SynapseML.
I would appreciate it if someone could suggest a solution.
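
For triage, it may help to pull the missing method out of the "Caused by" line. A java.lang.NoSuchMethodError generally means that some code was compiled against a different version of a class than the one found on the runtime classpath, so knowing which class owns the missing method narrows down which library to check. The following is a minimal sketch using only the Python standard library. The helper name parse_no_such_method and the SAMPLE string are hypothetical, and the sample assumes the package and method names are spelled as they appear in Spark (org.apache.spark ... createDateRebaseFuncInWrite).

```python
import re

# Matches a JVM line of the form:
#   java.lang.NoSuchMethodError: '<return type> <owner class>.<method>(<arg types>)'
# The surrounding quotes are optional because some JVM versions omit them.
_PATTERN = re.compile(
    r"java\.lang\.NoSuchMethodError: '?"
    r"(?P<ret>\S+) (?P<owner>[\w.$]+)\.(?P<method>\w+)\((?P<args>[^)]*)\)"
)


def parse_no_such_method(line):
    """Return the missing method's return type, owner class, name, and
    argument types as a dict, or None if the line does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return {
        "return_type": match.group("ret"),
        "owner": match.group("owner"),
        "method": match.group("method"),
        "args": [a.strip() for a in match.group("args").split(",") if a.strip()],
    }


# Hypothetical sample: the "Caused by" line from this report, with the
# package and method names spelled as assumed above.
SAMPLE = (
    "Caused by: java.lang.NoSuchMethodError: 'scala.Function1 "
    "org.apache.spark.sql.execution.datasources.DataSourceUtils$."
    "createDateRebaseFuncInWrite(scala.Enumeration$Value, java.lang.String)'"
)

info = parse_no_such_method(SAMPLE)
print(info["owner"])
print(info["method"])
print(info["args"])
```

In this sample the owner class lives in a Spark SQL package rather than in SynapseML itself, which would be consistent with a Spark version mismatch between a dependency and the cluster runtime. That is only a hypothesis and is not confirmed anywhere in this thread.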

What component(s) does this bug affect?

  • area/cognitive: Cognitive project
  • area/core: Core project
  • area/deep-learning: DeepLearning project
  • area/lightgbm: Lightgbm project
  • area/opencv: Opencv project
  • area/vw: VW project
  • area/website: Website
  • area/build: Project build system
  • area/notebooks: Samples under notebooks folder
  • area/docker: Docker usage
  • area/models: models related issue

What language(s) does this bug affect?

  • language/scala: Scala source code
  • language/python: Pyspark APIs
  • language/r: R APIs
  • language/csharp: .NET APIs
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/synapse: Azure Synapse integrations
  • integrations/azureml: Azure ML integrations
  • integrations/databricks: Databricks integrations
@github-actions

Hey @shibuya-phys 👋!
Thank you so much for reporting the issue/feature request 🚨.
Someone from SynapseML Team will be looking to triage this issue soon.
We appreciate your patience.
