DataFrame.pivot does not work with index=None even though function signature implies it is acceptable #11592
Comments
What you're intending to get back is the transpose of group_by.agg
@deanm0000, the transpose does not provide header names like pivot does. Obviously can promote the first row as headers afterwards though. A perhaps simpler workaround for the issue would be to create a literal column (with a single unique value), use that as the
(
    df.with_columns(pl.lit(1))
    .pivot(values="baz", index="literal", columns="bar", aggregate_function="sum")
    .drop("literal")
)
The above produces the expected output. Workarounds or alternative approaches aside, I created this issue as I believe (and the type hints in the function indicate) that
I wasn't trying to discount the request, just trying to help. That said, what I put in earlier was on mobile without looking at how it worked. A more complete version would be
A way that doesn't use transpose and so might be more efficient...
This bug reminds me of this one: #10075
Hey @MarcoGorelli, as you look to have been in the world of And on |
Hey Just looking at this again, and I'm not sure about adding complexity to
The desired functionality can be achieved with
df.group_by('bar').agg(pl.sum('baz')).transpose(column_names='bar')
shape: (1, 2)
┌─────┬─────┐
│ x   ┆ y   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 4   ┆ 6   │
└─────┴─────┘
which is a bit expensive, but then again so is
? Something else I feel uneasy about is this:
So, for consistency, I'd expect Either that, or to remove
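For readers following the thread, here is a minimal library-free sketch of what a pivot with no index amounts to: group on the pivot column, aggregate each group, and return one row. The function name `pivot_without_index` and the sample rows are hypothetical (the original reproducible example is not shown here); the data is chosen only so that the sums match the x = 4, y = 6 output quoted above. This is plain Python, not how Polars implements it.

```python
from collections import defaultdict


def pivot_without_index(rows, on, values, agg=sum):
    """Return a single 'row' (a dict) with one entry per distinct `on` value.

    Hypothetical helper for illustration only; not part of Polars.
    """
    groups = defaultdict(list)
    for row in rows:
        # Collect the `values` entries that share the same `on` key.
        groups[row[on]].append(row[values])
    # With no index there is nothing left to group by, so the result is one row.
    return {key: agg(vals) for key, vals in groups.items()}


# Hypothetical data, chosen so the per-key sums are x=4 and y=6.
rows = [
    {"bar": "x", "baz": 1},
    {"bar": "y", "baz": 2},
    {"bar": "x", "baz": 3},
    {"bar": "y", "baz": 4},
]

single_row = pivot_without_index(rows, on="bar", values="baz")
print(single_row)
```

The result carries the header names directly, which is the property the literal-column workaround and the transpose with `column_names` are both trying to recover.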
@MarcoGorelli
I think that the second point is distinct from the first. Perhaps we could allow e.g.
df = pl.DataFrame(
    {
        "foo": ["A", "B", "C"],
        "N": [1, 2, 3],
        "M": [4, 5, 6],
    }
)
df.pivot(index=[], columns="foo", values=None, aggregate_function=None)
shape: (1, 6)
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ N_foo_A ┆ N_foo_B ┆ N_foo_C ┆ M_foo_A ┆ M_foo_B ┆ M_foo_C │
│ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     ┆ ---     │
│ i64     ┆ i64     ┆ i64     ┆ i64     ┆ i64     ┆ i64     │
╞═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ 1       ┆ 2       ┆ 3       ┆ 4       ┆ 5       ┆ 6       │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
Note that this behaviour is already implemented in #15855
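To make the proposed empty-index behaviour concrete, the following library-free sketch reproduces the single row shown in the comment above from the same data. The helper name `pivot_empty_index` is hypothetical, and the `<value>_<on>_<key>` naming is simply read off that output rather than taken from the Polars source. It assumes each pivot key appears once, since no aggregate function is given.

```python
def pivot_empty_index(columns, on):
    """Spread every non-pivot column across the keys of `on` into one row.

    Hypothetical helper for illustration only; not part of Polars.
    """
    keys = columns[on]
    if len(set(keys)) != len(keys):
        # Without an aggregate function, duplicate keys would be ambiguous.
        raise ValueError("duplicate pivot keys need an aggregate function")
    row = {}
    for name, vals in columns.items():
        if name == on:
            continue
        for key, val in zip(keys, vals):
            # Naming mirrors the output quoted in the thread, e.g. N_foo_A.
            row[f"{name}_{on}_{key}"] = val
    return row


data = {
    "foo": ["A", "B", "C"],
    "N": [1, 2, 3],
    "M": [4, 5, 6],
}

wide_row = pivot_empty_index(data, on="foo")
print(wide_row)
```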
Checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Log output
Issue description
DataFrame.pivot does not work with index=None.
The function signature implies it is acceptable by type hinting None as an option.
However, the docstring says
potentially implying that None is not valid as that would be grouping by 0 keys.
Expected behavior
In my opinion, there is no reason index=None should not be valid.
It would just mean that the output of the pivot would always be a single row.
For the example provided, the expected output would be
The docstring for the index parameter should also be updated to be clear that passing None is valid - or at least not imply that it is invalid.
Installed versions