[BUG] Missing Partition Filters on renamed Columns after Join #2942
Comments
Can someone assign this to me? I can work on this.
This issue is observed on and Delta Lake version: 2.4.0, right?
Hi @NuthanReddy, thank you for your reply. I also switched to my local file system, but I still see the problem with either of the mentioned version combinations. Is the screenshot you are showing from the first or the second explain()? With the parquet format, the join explicitly leads to the filter being pushed down. With the delta format, this is not the case. In the following screenshot, you can see how the behavior differs when reading files with delta vs. parquet.
Bug
Which Delta project/connector is this regarding?
Describe the problem
When you load two Delta tables in PySpark, each partitioned by a column with a different name, and rename one partition column so the tables can be joined, a partition filter applied after the join only reaches the table whose column name was not changed. This happens if the partition column's data type is decimal in one of the DataFrames. In contrast, when the same tables are read directly in the parquet format, the partition filter is applied to both tables.
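The scenario described above could look roughly like the following PySpark sketch. This is a hypothetical illustration, not the reporter's reproduction code: the helper name `build_joined`, the paths, the column names (`part_a`, `part_b`, `val_a`, `val_b`), and the choice of `decimal(10,0)` versus `bigint` are all assumptions, and `spark` is expected to be a SparkSession already configured for Delta Lake. Whether this exact setup triggers the behavior has not been verified here.

```python
from decimal import Decimal


def build_joined(spark, base_path):
    """Hypothetical sketch: join two partitioned Delta tables, then filter on the partition column."""
    # Left table: the partition column is a decimal (assumed to be the relevant type).
    left = spark.createDataFrame(
        [(Decimal(1), "a"), (Decimal(2), "b")],
        schema="part_a decimal(10,0), val_a string",
    )
    # Right table: the partition column has a different name and an integer type.
    right = spark.createDataFrame(
        [(1, "x"), (2, "y")],
        schema="part_b bigint, val_b string",
    )
    left.write.format("delta").mode("overwrite").partitionBy("part_a").save(f"{base_path}/left")
    right.write.format("delta").mode("overwrite").partitionBy("part_b").save(f"{base_path}/right")

    a = spark.read.format("delta").load(f"{base_path}/left")
    # Rename the partition column so both sides share the join key name.
    b = (
        spark.read.format("delta")
        .load(f"{base_path}/right")
        .withColumnRenamed("part_b", "part_a")
    )

    # The filter is applied after the join; the report concerns whether it reaches both scans.
    return a.join(b, on="part_a").filter("part_a = 1")
```

Calling `build_joined(spark, "/tmp/delta_repro").explain()` (the path is a placeholder) would print the physical plan, where the `PartitionFilters` entry of each scan can be compared.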
Steps to reproduce
Observed results
Expected results
The PartitionFilters are applied on both tables.
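One way to check this expectation is to inspect the physical plan text for each scan's partition filters. The sketch below assumes the `PartitionFilters: [...]` layout that Spark prints for file scans. `explain_text` and `partition_filters` are hypothetical helper names, and the sample plan is abbreviated, made-up text rather than output taken from this issue.

```python
import contextlib
import io
import re


def explain_text(df):
    """Capture what df.explain() prints (PySpark writes the plan to stdout)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def partition_filters(plan_text):
    """Return the bracket contents of every `PartitionFilters: [...]` entry in a plan string."""
    return re.findall(r"PartitionFilters: \[([^\]]*)\]", plan_text)


# Abbreviated, made-up plan text with two scans; the second has no partition filter.
sample_plan = (
    "FileScan parquet [val_a#0,part_a#1] PartitionFilters: [isnotnull(part_a#1), (part_a#1 = 1)]\n"
    "FileScan parquet [val_b#2,part_b#3] PartitionFilters: []\n"
)
filters = partition_filters(sample_plan)
unfiltered = [f for f in filters if not f.strip()]
print(len(filters), len(unfiltered))  # prints: 2 1
```

With a real DataFrame, `partition_filters(explain_text(df))` would give one entry per scan, and an empty entry indicates a scan that received no partition filter.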
Further details
Environment information
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?