Support augment operations on multi-index columns #304

Open
liangjh opened this issue Nov 26, 2024 · 0 comments

Can pytimetk be augmented to support multi-index columns?

The pytimetk augment_* API appears to assume that DataFrame columns use a single-level index, i.e. multi-index columns are not supported.

In the augment_* functions, the date and value column parameters build in this assumption. If I have a DataFrame with multi-index columns, I need to collapse the multi-index into a single level before I can use pytimetk.
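The collapse step described above can be sketched roughly as follows, using pandas only. This is a minimal illustration, not part of pytimetk: `flatten_columns`, `restore_columns`, and the `__` separator are hypothetical names chosen for the example, and the small frame uses pools that share the same dates.

```python
import pandas as pd

# Toy wide frame with (variable, pool) column levels; values are illustrative.
dates = pd.date_range("2020-01-01", periods=3, freq="D")
wide = pd.DataFrame(
    {
        ("target", "A"): [1, -1, 0],
        ("target", "B"): [1, 0, -1],
        ("reserve", "A"): [5, 20, 10],
        ("reserve", "B"): [30, 15, 18],
    },
    index=dates,
)
wide.columns.names = [None, "pool"]
wide.index.name = "date"

SEP = "__"  # hypothetical separator; must not occur in any column label


def flatten_columns(df, sep=SEP):
    """Return a copy whose multi-index columns are joined into single strings."""
    out = df.copy()
    out.columns = [sep.join(map(str, key)) for key in df.columns]
    return out


def restore_columns(df, names, sep=SEP):
    """Rebuild the multi-index columns from the joined strings."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(col.split(sep)) for col in df.columns], names=names
    )
    return out


flat = flatten_columns(wide)  # columns such as "target__A", "reserve__B"
# Any API that expects single-level columns could work on `flat` here;
# flat.reset_index() would also expose `date` as an ordinary column.
restored = restore_columns(flat, names=list(wide.columns.names))
```

The round trip only works when every label is a string and none contains the separator, which is part of why doing this by hand around every augment_* call is awkward.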

Here's an example. The pool forms a second level on the column multi-index, which allows the rows to be keyed by date only (i.e. longitudinally).

import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', periods=10, freq='D'),
    'pool': ['A','A','A','A','A','B','B','B','B','B'],
    'target': [1,-1,0,-1,1,1,0,-1,-1,1],
    'reserve': [5,20,10,1,4,30,15,18,2,9]
})

df_tdp = df.set_index(['pool', 'date']).unstack('pool')

In pandas, this is also the ideal format for dataset-wide window operations: a single call applies to every column, and the column levels are preserved. We can then stack / unstack the column levels back into rows if we want to restore the original (long) layout.

df_tdp.shift(1).rolling(window=2).mean()
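To make the comparison concrete, here is a small sketch showing that the wide-format call matches a per-pool computation on long data. It uses pandas only and assumes, unlike the frame above, that both pools share the same five dates, so the wide frame has no gaps; the variable names are illustrative.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=5, freq="D")
long_idx = pd.DataFrame({
    "date": list(dates) * 2,  # assumption: both pools cover the same dates
    "pool": ["A"] * 5 + ["B"] * 5,
    "target": [1, -1, 0, -1, 1, 1, 0, -1, -1, 1],
    "reserve": [5, 20, 10, 1, 4, 30, 15, 18, 2, 9],
}).set_index(["pool", "date"]).sort_index()

# Long layout: the window must be applied per pool so that values from
# pool A never leak into pool B's window.
long_feat = long_idx.groupby(level="pool").transform(
    lambda s: s.shift(1).rolling(window=2).mean()
)

# Wide layout: each (variable, pool) pair is its own column, so one call
# covers every column with no explicit grouping.
wide = long_idx.unstack("pool")
wide_feat = wide.shift(1).rolling(window=2).mean()

# Both routes give the same numbers for each pool.
for pool in ["A", "B"]:
    from_wide = wide_feat.xs(pool, axis=1, level="pool")
    from_long = long_feat.xs(pool, level="pool")
    pd.testing.assert_frame_equal(from_wide, from_long, check_freq=False)
```

When the pools do not share dates, the wide frame contains NaN gaps and the window runs over those gaps as well, so the two routes can differ.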

pd.concat([
    df_tdp.stack('pool'),
    df_tdp.shift(1).rolling(window=2).mean().stack('pool', dropna=False)
], axis=1)

Assuming single-level columns is a limiting factor for anything beyond simple datasets / use cases.
Would like to hear any thoughts on this. I imagine most datasets are multi-dimensional / have multiple attributes. Thanks.
