Inconsistent behavior with multiple windows on augment_rolling #303

Open
liangjh opened this issue Nov 26, 2024 · 0 comments

I'm getting unexpected behavior when specifying multiple windows in a call to augment_rolling. In the example below I create a simple toy dataframe and compare the output of augment_rolling with multiple windows in a single call against the output of chained augment_rolling calls, each with a single window. The outputs differ, even though I would expect them to be identical. Example code and screenshots are provided below. What is actually happening here, and why are they different?

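import pandas as pd
import pytimetk as tk  # assumed: the package that provides the augment_* methods used below

df = pd.DataFrame({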
    'date': pd.date_range(start='2020-01-01', periods=10, freq='D'),
    'pool': ['A','A','A','A','A','B','B','B','B','B'],
    'target': [1,-1,0,-1,1,1,0,-1,-1,1],
    'reserve': [5,20,10,1,4,30,15,18,2,9]
})

I would expect the following two expressions to yield the same output on column reserve_lag_1_rolling_mean_win_4.

# Version 1: two window values specified in a single call to augment_rolling
df.groupby('pool')\
    .augment_lags(date_column='date', value_column=['reserve', 'target'], lags=(1))\
    .augment_rolling(date_column='date', value_column=['reserve_lag_1'], window=[2, 4], window_func='mean')


# Version 2: chaining two calls to augment_rolling, each with single window
df.groupby('pool')\
    .augment_lags(date_column='date', value_column=['reserve', 'target'], lags=(1))\
    .augment_rolling(date_column='date', value_column=['reserve_lag_1'], window=[2], window_func='mean')\
    .augment_rolling(date_column='date', value_column=['reserve_lag_1'], window=[4], window_func='mean')

See the results of both versions below. The column to compare between the two frames is reserve_lag_1_rolling_mean_win_4. Version 2 matches the output I would expect (and is what I get from shift().rolling() in pandas). Version 1 doesn't seem to respect the second window (i.e. 4), and it also ignores NaNs. Is there a setting I'm missing? If so, shouldn't the default still align with pandas behavior?
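For reference, here is a minimal sketch of the pandas baseline I'm comparing against. It is plain pandas, not the library's implementation. The helper name rolling_mean_of_lag and the frame name expected are only for illustration, and it assumes pandas' default min_periods (equal to the window size), so any window that contains a NaN produces NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range(start='2020-01-01', periods=10, freq='D'),
    'pool': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'target': [1, -1, 0, -1, 1, 1, 0, -1, -1, 1],
    'reserve': [5, 20, 10, 1, 4, 30, 15, 18, 2, 9],
})


def rolling_mean_of_lag(frame, window):
    """Per-pool rolling mean of reserve_lag_1 (hypothetical helper, plain pandas)."""
    return (
        frame.groupby('pool')['reserve_lag_1']
        .transform(lambda s: s.rolling(window=window).mean())
    )


# Lag reserve by one row within each pool; the first row of each pool becomes NaN.
expected = df.assign(reserve_lag_1=df.groupby('pool')['reserve'].shift(1))

# One output column per window, named the same way as in the outputs below.
for window in (2, 4):
    expected[f'reserve_lag_1_rolling_mean_win_{window}'] = rolling_mean_of_lag(expected, window)

print(expected)
```

With five rows per pool and a one-row lag, only the last row of each pool has four non-NaN lagged values, so the win_4 column is NaN everywhere except those two rows (9.0 for pool A and 16.25 for pool B).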

Base dataframe:
image

Version 1 output:
image

Version 2 output:
image

This calls into question whether any of my assumptions about the augment_* functions were correct. Am I misunderstanding something fundamental here? The docs don't seem to describe any different expected behavior.
