-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pivot and melt #644
Comments
Thanks for starting this! I would split the feature into:
BQ can do this dynamically now, but only as the last operation of a query: https://towardsdatascience.com/pivot-in-bigquery-4eefde28b3be dbt can also do this dynamically: https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/pivot.sql, by issuing two queries. I really think we could be very good at the first. We have to do the work to think through how to build it, but it can be done with some very simple rust or a templating language (not quite s-strings atm tho!). Zooming out — while in this instance dbt can do both static & dynamic, its macros are often used for things known statically. And while I love dbt, this is a feature born of necessity rather than desire, and the sort of area we can start improving upon. |
Uff, the dbt pivot syntax seems very cumbersome. We can surely improve that. I very much agree that "dynamic column names" are much harder. Am I correct in assuming that pivot operation requires dynamic columns, while melt produces only a static columns (independent of table contents)? This would mean that we can start with melt (wide to long). |
I think:
Is that how you're thinking about it? |
Adding links to our recent discussion on Discord and the DuckDB developments. Discord discussion: https://discord.com/channels/936728116712316989/945832228011798528/1077845402793017384 DuckDB pivot: duckdb/duckdb#6387 |
I hope this is applicable to the Pivot discussion. I am new to PRQL and I was thinking, "I wonder if I can get it to do something similar to Pivot() ..." let header_rows = from_text """
col1,col2
U30,U50
"""
let under_thirty = (
from employees
filter age < 30
aggregate [
ct30 = count
]
)
let under_fifty = (
from employees
filter (age | in 30..49)
aggregate [
ct50 = count
]
)
let aggregate_onerow = (
from under_thirty
join side:full under_fifty []
)
from aggregate_onerow`
This compiled to the following SQL (which is a "poor man's pivot"). I could not do anything with the 'header_rows' variable / CTE.
`WITH under_thirty AS (
SELECT
COUNT(*) AS ct30
FROM
employees
WHERE
age < 30
),
under_fifty AS (
SELECT
COUNT(*) AS ct50
FROM
employees
WHERE
age BETWEEN 30 AND 49
),
aggregate_onerow AS (
SELECT
under_thirty.ct30,
under_fifty.ct50
FROM
under_thirty FULL
JOIN under_fifty ON true
)
SELECT
ct30,
ct50
FROM
aggregate_onerow
-- Generated by PRQL compiler version:0.8.1 |
Currently, there is native functionality in duckdb sql for Pivot as well. |
This was discussed at https://news.ycombinator.com/item?id=39717268. We should ensure that this is at least possible with an s-string. |
https://duckdb.org/2024/09/27/sql-only-extensions Since it's SQL only this should be useful for us for implementing pivot functionality. |
By the way, polars stopped using the word melt and now call it unpivot. |
In Postgresql, crosstab |
One thing SQL lacks is a idiom of converting:
.. into ..
.. and back.
There is many names for this, let's list them here:
The problem has some unknowns which are treated differently by the functions above:
Related comment #300 (comment)
Could be implemented as: https://stackoverflow.com/questions/69263964/how-to-pivot-in-postgresql
Would solve: https://old.reddit.com/r/SQL/comments/viumd0/bigquery_how_to_aggregate_data/
Currently I don't have capacity to tackle all these, so I'm just opening a tracking issue.
The text was updated successfully, but these errors were encountered: