Adding a parameter for aggregating rows when calling Frame.set_index on a column with duplicated labels #319
ForeverWintr started this conversation in Ideas
Replies: 1 comment
Thank you for this suggestion. I began to explore an implementation in
>>> labels = (
...     (1, 1, 'a'),
...     (1, 2, 'b'),
...     (1, 3, 'c'),
...     (2, 1, 'd'),
...     (2, 2, 'e'),
...     (2, 3, 'd'),
...     (3, 1, 'b'),
...     (3, 2, 'h'),
...     (3, 3, 'b'),
... )
>>> f = sf.Frame.from_records(labels, columns=('x', 'y', 'z'))
>>> f
<Frame>
<Index> x       y       z     <<U1>
<Index>
0       1       1       a
1       1       2       b
2       1       3       c
3       2       1       d
4       2       2       e
5       2       3       d
6       3       1       b
7       3       2       h
8       3       3       b
<int64> <int64> <int64> <<U1>
>>> f.pivot(index_fields='z')
<Frame>
<Index: values> x       y       <<U1>
<Index: z>
a               1       1
b               7       6
c               1       3
d               4       4
e               2       2
h               3       2
<<U1>           <int64> <int64>
This does not, however, yet permit applying a different function per column. Multiple functions are permitted, but by default all of them are applied to each column.
If per-column function application is needed, it seems like adding a parameter to
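To make "a different function per column" concrete without relying on any StaticFrame API, here is a minimal plain-Python sketch using the same records as the example above. The helper `aggregate_by_key` and its `funcs` mapping are hypothetical names introduced only for illustration; they are not part of StaticFrame, and this is not how `Frame.pivot` is implemented.

```python
from collections import defaultdict

COLUMNS = ('x', 'y', 'z')
RECORDS = [
    (1, 1, 'a'), (1, 2, 'b'), (1, 3, 'c'),
    (2, 1, 'd'), (2, 2, 'e'), (2, 3, 'd'),
    (3, 1, 'b'), (3, 2, 'h'), (3, 3, 'b'),
]


def aggregate_by_key(rows, key, funcs):
    """Group rows by rows[key], then reduce each column with its own function.

    Hypothetical helper for illustration only; not a StaticFrame API.
    `funcs` maps a column name to the callable used to reduce that column.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        label: {
            column: func([member[column] for member in members])
            for column, func in funcs.items()
        }
        for label, members in groups.items()
    }


rows = [dict(zip(COLUMNS, record)) for record in RECORDS]

# Sum column x but take the maximum of column y within each z group.
per_column = aggregate_by_key(rows, key='z', funcs={'x': sum, 'y': max})
print(per_column['b'])  # {'x': 7, 'y': 3}
```

With `sum` supplied for both columns, the `b` group would instead give x=7 and y=6, matching the pivot output shown above.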
Several times I've found myself writing utility functions to reindex Frames/Series with an optional callable to handle duplicates. For example, this function reindexes a series based on the values of a second series. It is passed a duplicate aggregation function and calls it with any duplicates it finds.
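As a rough sketch of the kind of utility described here (not the original function), the following plain-Python version uses dicts as stand-ins for Series. The name `reindex_by` and its signature are assumptions made for illustration; only the `duplicate_aggregation` parameter name comes from this post.

```python
from collections import defaultdict


def reindex_by(values, new_labels, duplicate_aggregation):
    """Relabel `values` using the labels found in `new_labels`.

    Hypothetical sketch: both arguments are dicts sharing the same keys and
    stand in for two aligned Series. When several values map to the same new
    label, `duplicate_aggregation` is called with the list of those values.
    """
    groups = defaultdict(list)
    for old_label, value in values.items():
        groups[new_labels[old_label]].append(value)
    return {
        label: members[0] if len(members) == 1 else duplicate_aggregation(members)
        for label, members in groups.items()
    }


values = {0: 10, 1: 20, 2: 30, 3: 30}
new_labels = {0: 'a', 1: 'b', 2: 'a', 3: 'c'}

# Label 'a' occurs twice, so its two values are combined with sum.
summed = reindex_by(values, new_labels, duplicate_aggregation=sum)
print(summed)  # {'a': 40, 'b': 20, 'c': 30}
```

Labels that occur only once bypass the aggregation function entirely, so it only needs to handle genuine duplicates.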
In other circumstances duplicate_aggregation might be a function that checks that all values in the group are identical and arbitrarily returns one of them. This pattern is useful enough that I think it would be beneficial to include in Static Frame. For example, Frame.set_index could take duplicate_aggregation. What do you think?
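For illustration, an aggregation of the "all identical, return one" kind might look like the sketch below. The name `require_identical` is hypothetical, and raising `ValueError` on a mismatch is an assumption about the desired behavior rather than anything specified in this thread.

```python
def require_identical(members):
    """Return the shared value if every member is equal; otherwise raise.

    Hypothetical duplicate_aggregation callable for illustration only.
    """
    first = members[0]
    if any(member != first for member in members[1:]):
        raise ValueError(f'conflicting values for duplicated label: {members!r}')
    return first


# All duplicates agree, so any one of them can be returned.
agreed = require_identical(['b', 'b', 'b'])

# Disagreeing duplicates are reported instead of being silently dropped.
try:
    require_identical(['d', 'e'])
except ValueError as error:
    conflict = str(error)
```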