Add Option to Disable Persistence in Spark Expectations for Serverless Compatibility #10705

Open
vlad-luca-colibri opened this issue Nov 26, 2024 · 1 comment
Labels
feature-request feature request

Comments

@vlad-luca-colibri

Describe the bug
I am running into a failure on serverless compute in Databricks, which does not support any form of DataFrame persistence. It occurs with the following call:

result: ExpectationValidationResult = df.expect_column_values_to_not_be_null(
    column=column, meta=self.meta, catch_exceptions=False
)

The method expect_column_values_to_not_be_null internally calls:

col_df = self.spark_df.select(F.col(eval_col))  # pyspark.sql.DataFrame

# A couple of tests indicate that caching here helps performance
col_df.persist()

This code is located in .../great_expectations/dataset/sparkdf_dataset.py.
Because col_df.persist() is not supported on serverless compute, the expectation fails.

To Reproduce

  1. Start a serverless compute environment in Databricks.
  2. Run the expect_column_values_to_not_be_null method on a Spark DataFrame.

Expected behavior
I would expect there to be an option (e.g., a parameter) to enable or disable persistence, allowing compatibility with environments that do not support persistence.
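
As a rough illustration of the requested option, here is a minimal sketch of a guarded persist step. The helper name `maybe_persist` and its `enabled` parameter are hypothetical and are not part of Great Expectations; the sketch only assumes that the object passed in exposes a `persist()` method, as a pyspark.sql.DataFrame does, and that an environment without caching support raises when it is called.

```python
from typing import Any


def maybe_persist(df: Any, enabled: bool = True) -> bool:
    """Call df.persist() only when enabled.

    Hypothetical helper, not an existing Great Expectations API.
    Returns True if the frame was persisted, so the caller knows
    whether a matching unpersist() is needed later.
    """
    if not enabled:
        # Persistence explicitly disabled (e.g. on serverless compute).
        return False
    try:
        df.persist()
    except Exception:
        # Assumption: environments without caching support raise here.
        # Caching is only an optimization, so continue uncached.
        return False
    return True
```

With something like this, the unconditional col_df.persist() call could become maybe_persist(col_df, enabled=...), with the flag taken from a parameter or configuration value. Whether to also swallow the exception, as the sketch does, is a design choice; failing loudly when the flag is left on may be preferable.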

Environment (please complete the following information):

  • Operating System: macOS
  • Great Expectations Version: 0.18.18
  • Data Source: Spark
  • Cloud environment: Databricks
@github-project-automation github-project-automation bot moved this from To Do to Completed in GX Core Issues Board Nov 26, 2024
@vlad-luca-colibri vlad-luca-colibri closed this as not planned Nov 26, 2024
@github-project-automation github-project-automation bot moved this from Completed to Fixing in GX Core Issues Board Nov 26, 2024
@adeola-ak adeola-ak moved this from Fixing to To Do in GX Core Issues Board Nov 27, 2024
@adeola-ak adeola-ak added the feature-request feature request label Dec 2, 2024
@adeola-ak
Contributor

Hi there! Thank you for submitting this feature request. I've noted it and will pass it along to the appropriate team. Please check back on the issue for any updates.
