-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong output when tables are used and columns have same name #4752
Comments
Our semantics should be a bit clearer here, but what would you want the output to be? Currently PRQL returns:
...based on
What would we expect? ("Expected SQL Output" is useful there!) |
Actually, is this a problem with DuckDB? Or with PRQL? |
What should the SQL be? |
I think the problem is with the The following (putting the '*' last seems to achieve the correct results.
Additionally, this PRQL works properly
with sql
I think the select * should not be there as you know the columns you need if you crawl the DAG. At minimum the |
This is a bug. (I hate wildcard columns) |
We discussed this on the call. It's not an easy problem from the compiler perspective since it doesn't know whether it's creating duplicate columns, we'll need to think more. For a workaround — replacing |
Minimal example: from invoices
derive {
total = total * 0,
}
into data
from data
select {
total,
} ... produces: WITH data AS (
SELECT
*,
total * 0 AS total
FROM
invoices
)
SELECT
total
FROM
data; The problem is that CTE
|
A correct SQL output to the above input would be: WITH data AS (
SELECT
* EXCLUDE (total), -- exclude total from wildcard
total as _expr_0, -- include total under a different name
total * 0 AS total -- this column will now be the only column named total
FROM
invoices
)
SELECT
total
FROM
data; Unfortunately this only works for SQL dialects that support a way of excluding columns from a wildcard. With say PostgreSQL, this is not possible to express because:
|
I think for everyone's benefit, it would be good to lean hard into "please never use wildcards in anything that matters". It really isn't that hard to use strict column naming but everyone needs to be reminded to do it. If most things work in wildcard mode that is convenient for quick tests and development, but really with PRQL and what it offers, you are going to suffer if you forget to remove wildcards. Like "strict mode" or something might be reasonable branding. I would even perhaps enjoy the ability to have some kind of flavour or option I could add to disallow wildcards in the query as this would remind me and throw if I break the rule. |
That's a good idea. A strict mode for PRQL where it will not infer table types and deal with "unknown columns", but instead require that the table types be specified by the user. This can already be done in current versions of PRQL: module default_db {
let invoices <[{
invoice_id = int,
customer_id = int,
invoice_date = date,
total = float,
}]>
}
from default_db.invoices
derive {
total = total * 0,
}
into data
from data
select {
total,
} ... this will produce correct SQL, that will return total of all 0, as it should. |
What happened?
Deriving columns with same name as existing results in confusing or wrong output.
PRQL input
SQL output
Expected SQL output
No response
MVCE confirmation
Anything else?
No response
The text was updated successfully, but these errors were encountered: