Missing Window Functions from Polars #589

Open
guarilha opened this issue May 7, 2023 · 6 comments

@guarilha
Contributor

guarilha commented May 7, 2023

Polars has the following "rolling" functions (Explorer calls them "window" functions):

Although some of the rolling computations are already available in Explorer, the absence of a rolling_apply equivalent makes it less convenient to calculate certain statistics and models.

As a result, users are forced to resort to workarounds that are far from ideal. For example, one could calculate the 21-day rolling standard deviation by using additional columns and a combination of existing functions. However, for users familiar with Pandas, this approach can feel unusual.

Is there any plan to support rolling_apply in Explorer, or am I overlooking something?

@josevalim
Member

I don't think we can support rolling_apply because it is not possible to call Erlang from C/Rust without using message passing. As far as I can see, the linked Python version is fully implemented in C.

Can you provide a more concrete example that you are trying to address and how you are addressing it? Perhaps we can provide higher level conveniences without having it named rolling_apply itself?

@guarilha
Contributor Author

guarilha commented May 8, 2023

Sure!

I have a dataframe with daily stock returns, and I need the 21-day rolling volatility (standard deviation) of each series and the 21-day rolling correlation among these series.

My initial solution was similar to this:

require Explorer.Series, as: S

df = Explorer.Datasets.iris()
window_size = 3

S.to_enum(df[:sepal_length])
|> Enum.reduce({[], []}, fn e, {head, acc} ->
  head = head ++ [e]

  acc =
    if Enum.count(head) < window_size do
      acc ++ [nil]
    else
      acc ++
        [
          head
          |> Enum.reverse()
          |> Enum.take(window_size)
          |> S.from_list()
          |> S.standard_deviation()
        ]
    end

  {head, acc}
end)
|> elem(1)

I'm presenting this here so that the journey toward an implementation is documented as well; I hope it helps.
This works for smaller dataframes, but performance takes a huge hit on larger ones.

So we got to a solution that looks like this:

df = Explorer.Datasets.iris()
window_size = 3
max_offset = S.size(df[:sepal_length]) - window_size

0..max_offset
|> Stream.map(&S.slice(df[:sepal_length], &1, window_size))
|> Stream.map(&S.standard_deviation/1)
|> Stream.chunk_every(1)
|> Stream.map(&S.from_list/1)
|> Enum.reduce(S.from_list([]), &S.concat(&2, &1))

If you have any pointers on this approach it would be of great help.

Some things I need to calculate over rolling windows:

  • Standard deviation, quantile, variance, skew, cumulative sum and cumulative sum product (available in polars)
  • Correlation between different series
  • GARCH
  • Cointegration tests

Thanks!
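
One of the items above, correlation between different series over a rolling window, can be illustrated without any library support. The following is a minimal sketch in plain Elixir over lists, not anything Explorer provides: the module name `RollingCorrelationSketch` and both function names are hypothetical, the Pearson coefficient is an assumption about which correlation is wanted, and a series would first have to be converted with `Explorer.Series.to_list/1`. The first `window_size - 1` positions are filled with nil, as in the first snippet above.

```elixir
defmodule RollingCorrelationSketch do
  # Hypothetical helper for illustration; not part of Explorer.

  # Pearson correlation of two equally sized lists of numbers.
  # Raises ArithmeticError when either list is constant (zero variance).
  def pearson(xs, ys) do
    n = length(xs)
    mean_x = Enum.sum(xs) / n
    mean_y = Enum.sum(ys) / n

    {cov, var_x, var_y} =
      xs
      |> Enum.zip(ys)
      |> Enum.reduce({0.0, 0.0, 0.0}, fn {x, y}, {cov, vx, vy} ->
        dx = x - mean_x
        dy = y - mean_y
        {cov + dx * dy, vx + dx * dx, vy + dy * dy}
      end)

    cov / :math.sqrt(var_x * var_y)
  end

  # Correlation over every full window of `window_size` consecutive
  # observations; earlier positions are nil.
  def rolling(xs, ys, window_size) when is_integer(window_size) and window_size > 0 do
    x_windows = Enum.chunk_every(xs, window_size, 1, :discard)
    y_windows = Enum.chunk_every(ys, window_size, 1, :discard)
    padding = List.duplicate(nil, min(window_size - 1, length(xs)))

    padding ++ Enum.zip_with(x_windows, y_windows, &pearson/2)
  end
end

# Two perfectly linearly related toy series, window of 3 observations.
RollingCorrelationSketch.rolling([1, 2, 3, 4], [2, 4, 6, 8], 3)
```

Building all windows as lists keeps the sketch simple, but it copies each window, so it is only meant to show the shape of the computation rather than a fast implementation.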

@josevalim
Member

Maybe we could have a Series.window_map(series, callback) function? The callback receives sliced series and it must return numbers, or something that we can convert to a series again later.

Btw, I think your implementation could be:

0..max_offset
|> Stream.map(&S.slice(df[:sepal_length], &1, window_size))
|> Stream.map(&S.standard_deviation/1)
|> Enum.to_list()
|> S.from_list()

but I am not sure.
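
For readers following along, the `Series.window_map` idea floated above could be sketched roughly as follows. This is a minimal sketch and not Explorer's API: the module name `WindowSketch`, the three-argument signature, and the choice to pad the first `window_size - 1` positions with nil are all assumptions made for illustration. It relies only on `Series.size/1`, `Series.slice/3` and `Series.from_list/1`, which the snippets in this thread already use.

```elixir
# Only needed when running this as a standalone script.
Mix.install([:explorer])

defmodule WindowSketch do
  alias Explorer.Series, as: S

  # Hypothetical helper, not part of Explorer. Calls `fun` with each full
  # window (a sliced series of `window_size` values) and expects a number
  # back. Positions without a full window are filled with nil.
  def window_map(series, window_size, fun)
      when is_integer(window_size) and window_size > 0 and is_function(fun, 1) do
    size = S.size(series)

    if window_size > size do
      raise ArgumentError, "window_size must not exceed the series size"
    end

    values =
      Enum.map(0..(size - window_size), fn offset ->
        series |> S.slice(offset, window_size) |> fun.()
      end)

    S.from_list(List.duplicate(nil, window_size - 1) ++ values)
  end
end

# Rolling standard deviation over the same column used in the thread.
df = Explorer.Datasets.iris()
WindowSketch.window_map(df[:sepal_length], 3, &Explorer.Series.standard_deviation/1)
```

Each window still costs one slice plus one callback invocation from Elixir, so this is a convenience wrapper around the approach above rather than a replacement for a native rolling function.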

@josevalim
Member

Would you like to send a PR for Series.window_map btw?

@guarilha
Contributor Author

Created this PR to explore the codebase a bit and test the waters. I'm waiting for a review on it to make sure everything is OK. After that, I plan on adding a bunch of functions that I need as well. Hope it helps.

@mrcwinn

mrcwinn commented Jul 9, 2023

Hi, I also have a use case for this. Here's equivalent code in Python:

df['atl'] = df['tss'].rolling(window=7).apply(lambda x: calculate_atl_recursive(x))

I can't solve this with the current package API, unless I'm missing something.

Thank you!

EDIT: I just realized who I'm in a thread with (famous people). Extra thank you for all your work.
