Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly parse named columns for joins #240

Merged
merged 6 commits into from
May 29, 2024
Merged

Correctly parse named columns for joins #240

merged 6 commits into from
May 29, 2024

Conversation

dey4ss
Copy link
Member

@dey4ss dey4ss commented May 7, 2024

This PR allows a list of identifiers (a.k.a column names) when joins name columns with USING (...). This behavior is in line with the SQL standard and PostgreSQL. Furthermore, we add a flag if join conditions come from specifying them directly (foo JOIN bar ON (...)) or from named columns (foo JOIN bar USING (...)) so the database can adjust the emitted columns addording to the SQL standard (emit both columns for each predicate in ON, emit a single column for each column in USING).

For details, see #239.

Fixes #239.

@dey4ss dey4ss requested a review from Bouncner May 7, 2024 14:25
@dey4ss dey4ss changed the title Correctly named columns for joins Correctly parse named columns for joins May 7, 2024
@Bouncner
Copy link
Contributor

Bouncner commented May 7, 2024

What is your take on the “no column references” limitation?

@dey4ss
Copy link
Member Author

dey4ss commented May 8, 2024

First of all, it is in line withe the standard and pgSQL, which we try to mostly comply to. Second, it also makes sense IMO. foo JOIN bar USING (...) is comparable to foo NATURAL JOIN bar (the latter is equivalent to the first when the USING list contains all columns with the same name in foo and barUSING is a NATURAL join where you explicitly specify or limit the used columns). All columns in the list have to be present in both foo and bar, so a column reference explicitly referencing a column on only one side is not really useful.

@dey4ss
Copy link
Member Author

dey4ss commented May 13, 2024

Instead of parsing the list of columns as conjunctions of equality predicates and adding an extra boolean flag to JoinDefinition, we could also simply have the vector of column names as member, where either this vector or the join condition would be set:

struct JoinDefinition {
  JoinDefinition();
  virtual ~JoinDefinition();
  TableRef* left;
  TableRef* right;
  Expr* condition;                   // set for `foo JOIN bar ON foo.x = bar.x`
  std::vector<char*>* namedColumns;  // set for `foo JOIN bar USING (x)`
                                     // both unset for `foo NATURAL JOIN bar`
  JoinType type;
};

This would be a bit more explicit and leaves resolving the column names to a predicate to the DBMS. However, it differs from the way we handled USING before, and I am not sure what we want to do here in terms of backwards compatibility/divergence from forks.

Copy link
Contributor

@Bouncner Bouncner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much to say.

test/select_tests.cpp Outdated Show resolved Hide resolved
test/queries/queries-bad.sql Show resolved Hide resolved
test/select_tests.cpp Outdated Show resolved Hide resolved
@Bouncner
Copy link
Contributor

Feel free to zünd.

@dey4ss dey4ss merged commit 9ca19ce into master May 29, 2024
4 checks passed
@dey4ss dey4ss deleted the dey4ss/using branch May 29, 2024 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation fault when: [...] USING (*)
2 participants