Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement RightSemi join for SortMergeJoin #13584

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

athultr1997
Copy link
Contributor

Which issue does this PR close?

Closes #13471

Rationale for this change

Adds support for RightSemi join in SortMergeJoin

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Nov 27, 2024
@athultr1997 athultr1997 marked this pull request as draft November 27, 2024 18:52
@athultr1997 athultr1997 marked this pull request as ready for review November 28, 2024 03:43
@athultr1997 athultr1997 force-pushed the athultr/right_semi_join branch 3 times, most recently from 8d8c5ca to 6976ce4 Compare November 28, 2024 03:57
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @athultr1997
I'll check the tests later this week.

Unfortunately we cannot call RIGHT SEMI/ RIGHT ANTI directly from SQL, it can be chosen by the optimizer so I'm feeling we need to focus more on unit tests

JoinType::RightSemi,
None,
)
.run_test(&[HjSmj, NljHj], true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.run_test(&[HjSmj, NljHj], true)
.run_test(&[HjSmj, NljHj], false)

JoinType::RightSemi,
Some(Box::new(col_lt_col_filter)),
)
.run_test(&[HjSmj, NljHj], true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.run_test(&[HjSmj, NljHj], true)
.run_test(&[HjSmj, NljHj], false)

@@ -508,6 +508,124 @@ select t1.* from t1 where not exists (select 1 from t2 where t2.a = t1.a and t1.
----


# RIGHTSEMI join tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure these tests makes a difference as RIGHT SEMI are not supported by SQL this type of join can be called by optimizer on physical executiion

@@ -732,7 +727,7 @@ struct JoinedRecordBatches {
pub batches: Vec<RecordBatch>,
/// Filter match mask for each row(matched/non-matched)
pub filter_mask: BooleanBuilder,
/// Row indices to glue together rows in `batches` and `filter_mask`
/// Streamed row indices to glue together rows in `batches` and `filter_mask`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Streamed row indices to glue together rows in `batches` and `filter_mask`
/// Left row indices to glue together rows in `batches` and `filter_mask`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SortMergeJoin: Add RightSemi join support
2 participants