This repository has been archived by the owner on May 17, 2024. It is now read-only.

Deletes not being output in to the table materialization #747

Closed
devcshort opened this issue Oct 19, 2023 · 3 comments
Labels: bug (Something isn't working), stale_immune (Immunity to stale bot)

Comments

devcshort commented Oct 19, 2023

I am running diff_tables through a Python script and materializing all rows to a table within my DB. This works great for figuring out our updated columns and rows; however, deletes are not being materialized.

Below is the code I'm using. I checked whether an ID that I expected to find in the materialized table (but which isn't there) shows up in the output that diff_tables yields to the Python script, and it does. I would expect anything that diff_tables yields in my script to also be materialized into the table that data_diff uses for materialization. From what I can tell, it is not outputting deletes in the materialization, which throws a wrench in the pipeline I'm currently working on.

try:
    for d in data_diff.diff_tables(
        source_table,
        target_table,
        extra_columns=columns,
        key_columns=key_columns,
        materialize_to_table=f"NORSE_DIFF.{SNOWFLAKE_CONN_INFO['schema']}.{table_name}",
        materialize_all_rows=True,
    ):
        if d[1][0] == "c91e4af2-4585-5cbb-924b-cbeb12b7919e":
            print(d[1][0])
except Exception as e:
    print(e)
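
To make the check above less dependent on one hard-coded ID, the yielded rows can be grouped by key and labeled. This is only a sketch: it assumes each yielded item is a `(sign, row)` pair with the key in the first column (as the `d[1][0]` access above suggests), and that `"-"` marks a row present only in the first table and `"+"` a row present only in the second. The helper name `classify_diff` and the sample rows are hypothetical.

```python
from collections import defaultdict


def classify_diff(diff_rows, key_index=0):
    """Group (sign, row) pairs by key and label each key.

    Hypothetical helper, not part of data_diff.
    """
    signs_by_key = defaultdict(set)
    for sign, row in diff_rows:
        signs_by_key[row[key_index]].add(sign)

    labels = {}
    for key, signs in signs_by_key.items():
        if signs == {"-"}:
            labels[key] = "only_in_source"  # candidate delete
        elif signs == {"+"}:
            labels[key] = "only_in_target"  # candidate insert
        else:
            labels[key] = "changed"  # key present on both sides
    return labels


# Made-up rows standing in for what diff_tables(...) would yield.
sample = [
    ("-", ("c91e4af2-4585-5cbb-924b-cbeb12b7919e", "old value")),
    ("-", ("a1", "before")),
    ("+", ("a1", "after")),
    ("+", ("b2", "new value")),
]
labels = classify_diff(sample)
```

Comparing the keys labeled `only_in_source` against the materialized table would show exactly which rows are missing from it.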

I'm currently using
[email protected]
MacOS Apple Silicon

This is running within a Dagster environment as well.

devcshort added the bug label on Oct 19, 2023
dlawin added the stale_immune label and removed the triage label on Oct 31, 2023
dlawin (Contributor) commented Oct 31, 2023

Seems like there may be an issue with the all_rows query here

They are passed into _materialize_diff here
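
Since the rows yielded to the Python script do include the deletes, one possible workaround is to write those rows to a table yourself instead of relying on `materialize_to_table`. The sketch below is an assumption-laden illustration, not data_diff's own materialization logic: it uses the standard-library `sqlite3` module as a stand-in for the real warehouse connection, and the helper `materialize_diff_rows`, the table name `diff_out`, and the column names are all hypothetical.

```python
import sqlite3


def materialize_diff_rows(conn, table, columns, diff_rows):
    """Write (sign, row) pairs into `table`, one column per value.

    Hypothetical helper; identifiers are assumed to be trusted input.
    """
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (diff_sign TEXT, {col_defs})'
    )
    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [(sign, *row) for sign, row in diff_rows],
    )
    conn.commit()


# In-memory SQLite stands in for the real database connection.
conn = sqlite3.connect(":memory:")

# Made-up rows standing in for what diff_tables(...) would yield.
rows = [
    ("-", ("c91e4af2-4585-5cbb-924b-cbeb12b7919e", "old value")),
    ("-", ("a1", "before")),
    ("+", ("a1", "after")),
]
materialize_diff_rows(conn, "diff_out", ["id", "name"], rows)

minus_count = conn.execute(
    'SELECT COUNT(*) FROM "diff_out" WHERE diff_sign = ?', ("-",)
).fetchone()[0]
total_count = conn.execute('SELECT COUNT(*) FROM "diff_out"').fetchone()[0]
```

For large diffs the rows would need to be inserted in batches rather than collected into one list, and the column types would need to match the source tables.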

a-s-sarkar-9299 commented

@devcshort, can you explain at a high level how to materialize data-diff results to a Redshift table with the open-source version, for comparison against the Redshift DB itself? I intend to do the same using dbt and Redshift in a local dbt Core setup.

glebmezh (Contributor) commented

Hi @devcshort,

I'm sorry for the delay in following up on this. Thank you for raising this issue and for looking into potential solutions!

We made a hard decision to sunset the data-diff package and won't provide further development or support.

If that's of interest, over the past few months, we have rewritten the diffing engine in Datafold Cloud and solved many issues that existed in this package's diffing algorithm.

-Gleb
