You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe. duckdb-engine provides a sqlalchemy driver for duckdb, which I wanted to test with great expectations. I'm able to get a data source setup easily, but when I test an expectation I always run into the same error:
{
"success": false,
"expectation_config": {
"type": "expect_column_max_to_be_between",
"kwargs": {
"column": "passenger_count",
"min_value": 1.0,
"max_value": 6.0,
"batch_id": "duckdb-test"
},
"meta": {}
},
"result": {},
"meta": {},
"exception_info": {
"('table.row_count', '02f890ec1f22e1cb927a8ed609303838', ())": {
"exception_traceback": "Traceback (most recent call last):\n File \"/path/to/great_expectations/execution_engine/execution_engine.py\", line 545, in _process_direct_and_bundled_metric_computation_configurations\n self.resolve_metric_bundle(metric_fn_bundle=metric_fn_bundle_configurations)\n File \"/path/to/great_expectations/execution_engine/sqlalchemy_execution_engine.py\", line 1042, in resolve_metric_bundle\n f\"\"\"SqlAlchemyExecutionEngine computed {len(res[0])} metrics on domain_id \\\n ~~~^^^\nIndexError: list index out of range\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/path/to/great_expectations/validator/validation_graph.py\", line 276, in _resolve\n self._execution_engine.resolve_metrics(\n File \"/path/to/great_expectations/execution_engine/execution_engine.py\", line 279, in resolve_metrics\n return self._process_direct_and_bundled_metric_computation_configurations(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/path/to/great_expectations/execution_engine/execution_engine.py\", line 549, in _process_direct_and_bundled_metric_computation_configurations\n raise gx_exceptions.MetricResolutionError(\ngreat_expectations.exceptions.exceptions.MetricResolutionError: list index out of range\n",
"exception_message": "list index out of range",
"raised_exception": true
}
}
}
I did a little bit of debugging and I think what's happening is that the db connection is being closed before calling fetchall on a query, so it ends up returning an empty list. I think if duckdb was included in _PERSISTED_CONNECTION_DIALECTS in great_expectations/execution_engine/sqlalchemy_execution_engine.py that this problem would go away. There may be other issues with the duckdb dialect, but it would at least be a start to being able to test it out.
Describe the solution you'd like
Add a new duckdb dialect to great_expectations/execution_engine/sqlalchemy_dialect.py and include this in _PERSISTED_CONNECTION_DIALECTS. This seems pretty straightforward, and I'd be happy to put a PR together to do this, but I'm not sure if there are any other considerations needed to add a new dialect like this.
Describe alternatives you've considered
An alternative option could be to add just add the OTHER dialect to _PERSISTED_CONNECTION_DIALECTS. This seems reasonable to me and might be best practice to default to keeping the engine alive while still interacting with a cursor.
Additional context
Steps to recreate the issue:
Create a duckdb file with a test table
importduckdbimportpandasaspdtest_df=pd.DataFrame({"test_col": [0, 1, 2]})
withduckdb.connect("test.db") ascon:
con.sql(
"CREATE TABLE test_table AS "f"SELECT * FROM test_df;"
)
con.table("test_table").show()
Hi @zschira, thank you for reaching out! I've forwarded this to the relevant team to confirm if we're currently accepting contributions for this type of work. I appreciate the effort you've already put into investigating this, and I'll get back to you with an update as soon as possible
Unfortunately, we’ve decided that we’re not yet ready to enable DuckDB as a data source. This decision is primarily due to the ongoing maintenance it would require on our end (e.g., documentation, PR reviews, testing).
We truly appreciate the effort you’ve put into exploring this, and I wish we could support your use case at this time. While it’s not possible right now, we do plan to accept contributions for data sources again in the future so I encourage you to check back in later if it's something you'd still want later next year. Thank you for your understanding.
Is your feature request related to a problem? Please describe.
duckdb-engine provides a sqlalchemy driver for duckdb, which I wanted to test with great expectations. I'm able to get a data source setup easily, but when I test an expectation I always run into the same error:
I did a little bit of debugging and I think what's happening is that the db connection is being closed before calling
fetchall
on a query, so it ends up returning an empty list. I think if duckdb was included in_PERSISTED_CONNECTION_DIALECTS
ingreat_expectations/execution_engine/sqlalchemy_execution_engine.py
that this problem would go away. There may be other issues with the duckdb dialect, but it would at least be a start to being able to test it out.Describe the solution you'd like
Add a new duckdb dialect to
great_expectations/execution_engine/sqlalchemy_dialect.py
and include this in_PERSISTED_CONNECTION_DIALECTS
. This seems pretty straightforward, and I'd be happy to put a PR together to do this, but I'm not sure if there are any other considerations needed to add a new dialect like this.Describe alternatives you've considered
An alternative option could be to add just add the
OTHER
dialect to_PERSISTED_CONNECTION_DIALECTS
. This seems reasonable to me and might be best practice to default to keeping the engine alive while still interacting with a cursor.Additional context
Steps to recreate the issue:
The text was updated successfully, but these errors were encountered: