You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on May 17, 2024. It is now read-only.
Describe the bug
In the Snowflake and Postgres database dialects date columns are normalized differently than in other dialects. This causes a false positive diff when comparing those dates with date columns in other database types.
For example, consider the comparison between MySQL and Snowflake:
This is what the schemas look like (from logging with --verbose):
This query is used when pulling a date column from the MySQL database:
SELECT
cast(`id`aschar),
cast(cast( cast(`some_date`as datetime(6)) as datetime(6)) aschar)
FROM`some_schema`.`some_table`
This returns a pair like
1, 2024-03-28 00:00:00.000000
In turn, in Snowflake the query is
SELECT
cast(cast("ID"asdecimal(38, 0)) as string),
("SOME_DATE"::varchar)
FROM"SOME_SCHEMA"."SOME_TABLE"
which returns a row like
1, 2024-03-28
This is then detected as a mismatch.
This seems to stem from the normalize_timestamp function in the database dialect. In the BaseDialect, it is documented that the function should return a date formatted as YYYY-MM-DD HH:mm:SS.FFFFFF, which is accurate to the MySQL result but not to the Snowflake or Postgres dialects which only cast it directly as a string.
Thank you for trying out data-diff and for taking the time to open this issue. We made a hard decision to sunset the data-diff package and won't provide further development or support. Diffing functionality will continue to be available in Datafold Cloud. We have completely rewritten the diffing engine in the cloud over the past few months and have solved the fundamental issues with the original algorithm used in the data-diff package. Feel free to take it for a trial or contact us at [email protected] if you have any questions.
Describe the bug
In the Snowflake and Postgres database dialects date columns are normalized differently than in other dialects. This causes a false positive diff when comparing those dates with date columns in other database types.
For example, consider the comparison between MySQL and Snowflake:
This is what the schemas look like (from logging with --verbose):
This query is used when pulling a date column from the MySQL database:
This returns a pair like
In turn, in Snowflake the query is
which returns a row like
This is then detected as a mismatch.
This seems to stem from the
normalize_timestamp
function in the database dialect. In the BaseDialect, it is documented that the function should return a date formatted asYYYY-MM-DD HH:mm:SS.FFFFFF
, which is accurate to the MySQL result but not to the Snowflake or Postgres dialects which only cast it directly as a string.data-diff/data_diff/databases/mysql.py
Lines 108 to 113 in b4e2d4b
data-diff/data_diff/databases/postgresql.py
Lines 113 to 125 in b4e2d4b
Describe the environment
Data Diff Open Source v0.11.1
Python 3.9.12
MacOSX Sonoma 14.2.1 (23C71)
The text was updated successfully, but these errors were encountered: