I'm getting unexpected behavior when specifying multiple windows in calls to augment_rolling. In the example below I create a simple toy dataframe and compare the output of augment_rolling with multiple windows specified in one call against the output of chained augment_rolling calls that each specify a single window. The outputs differ even though I would expect them to be identical. Example code and screenshots are provided below. What is actually happening here, and why are they different?
I would expect the following two expressions to yield the same output on column reserve_lag_1_rolling_mean_win_4.
# Version 1: two window values specified in a single call to augment_rolling
df.groupby('pool')\
.augment_lags(date_column='date', value_column=['reserve', 'target'], lags=(1))\
.augment_rolling(date_column='date', value_column=['reserve_lag_1'], window=[2, 4], window_func='mean')
# Version 2: chaining two calls to augment_rolling, each with single window
df.groupby('pool')\
.augment_lags(date_column='date', value_column=['reserve', 'target'], lags=(1))\
.augment_rolling(date_column='date', value_column=['reserve_lag_1'], window=[2], window_func='mean')\
.augment_rolling(date_column='date', value_column=['reserve_lag_1'], window=[4], window_func='mean')
See the results of both versions below. The column to compare between the two frames is reserve_lag_1_rolling_mean_win_4. Version 2 matches the output I would expect (and the output I would get from shift().rolling() in pandas). Version 1 doesn't seem to respect the second window (i.e. 4), and it also ignores NaNs. Are there some settings that I'm missing? If so, shouldn't the default align with pandas behavior?
Base dataframe:
Version 1 output:
Version 2 output:
This calls into question whether any of my expectations / assumptions on the augment_* functions were correct. Am I misunderstanding something fundamental here? The docs don't seem to point to different expected behavior.
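For reference, the pandas baseline mentioned above (a per-group shift() followed by rolling()) can be sketched as follows. The toy data is made up for illustration and is not the dataframe from the screenshots; the output column names simply mirror the ones that appear in this issue. It relies only on the standard pandas behavior that rolling() defaults min_periods to the window size, so any window containing a NaN yields NaN.

```python
import pandas as pd

# Hypothetical toy data for illustration; not the dataframe from the screenshots.
df = pd.DataFrame({
    "pool": ["a"] * 6 + ["b"] * 6,
    "date": list(pd.date_range("2024-01-01", periods=6, freq="D")) * 2,
    "reserve": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0,
                10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
df = df.sort_values(["pool", "date"]).reset_index(drop=True)

# Lag by one row within each pool; the first row of each pool becomes NaN.
df["reserve_lag_1"] = df.groupby("pool")["reserve"].shift(1)

# Rolling mean of the lagged column within each pool, one column per window.
# With the default min_periods (= window), incomplete or NaN-containing
# windows produce NaN rather than a partial mean.
for win in (2, 4):
    df[f"reserve_lag_1_rolling_mean_win_{win}"] = (
        df.groupby("pool")["reserve_lag_1"]
        .transform(lambda s, w=win: s.rolling(window=w).mean())
    )

print(df)
```

With this baseline, the win_4 column stays NaN for the first four rows of each pool (one NaN from the lag plus three incomplete windows), which is the pattern I see in Version 2 but not in Version 1.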