show(df) does not work with modin.pandas #325

Open
wpritom opened this issue Oct 6, 2024 · 5 comments

Comments

@wpritom

wpritom commented Oct 6, 2024

show() does not work when I import pandas from Modin. I'm using Modin to improve pandas performance.

import modin.pandas as pd

df = pd.read_csv("****.csv")

Now show(df, classes="display") raises the following error:

AttributeError: 'DataFrame' object has no attribute 'iter_rows'

@mwouts
Owner

mwouts commented Oct 6, 2024

Hi @wpritom, thanks for reporting this! Yes, that's right: currently ITables only supports Pandas and Polars DataFrames.

Can you convert df back to a Pandas DataFrame before calling show, for now at least?

You can leave this issue open so that I can look into adding support for Modin DataFrames when time permits. Thanks.
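
The workaround suggested above (converting back to pandas before calling show) could be sketched as follows. This is only an illustration: `as_pandas` is a hypothetical helper, and it assumes that Modin DataFrames expose a private `_to_pandas()` method, which may change between Modin versions.

```python
def as_pandas(df):
    """Return a plain pandas object if `df` looks like a Modin object.

    Hypothetical helper for illustration; not part of ITables or Modin.
    """
    # Assumption: Modin DataFrames provide a private `_to_pandas()` method.
    # Objects without that method (e.g. pandas DataFrames) are returned as-is.
    to_pandas = getattr(df, "_to_pandas", None)
    return to_pandas() if callable(to_pandas) else df
```

With such a helper, the failing call would become show(as_pandas(df), classes="display"). Note that the conversion materializes the whole table in memory as a pandas DataFrame.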

@MarcoGorelli

Hi @mwouts - would you be open to using Narwhals in ITables?

I think this could simplify some of the code, e.g. this, and would also give you support for pandas / Polars / Modin / cuDF / PyArrow (and any other Narwhals-compatible eager dataframe), without making any of them required dependencies

Happy to make a PR if you'd be interested, just gauging interest first
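
As a rough idea of what dataframe-agnostic code with Narwhals can look like, here is a minimal sketch. `table_preview` is a hypothetical helper written for this illustration and is not taken from ITables; it assumes Narwhals and at least one supported dataframe library are installed.

```python
def table_preview(native_df, n_rows=5):
    """Return (column names, first n_rows rows as tuples) for an eager dataframe.

    Hypothetical helper for illustration; not part of ITables or Narwhals.
    """
    # Imported lazily so that Narwhals is only needed when the helper is called.
    import narwhals as nw

    # Wrap the native pandas / Polars / Modin / ... object in a Narwhals DataFrame.
    df = nw.from_native(native_df, eager_only=True)

    # The same calls then work regardless of the underlying library.
    head = df.head(n_rows)
    return head.columns, list(head.iter_rows())
```

The caller passes whatever dataframe it already has; the helper itself never imports pandas, Polars or Modin directly.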

@mwouts
Owner

mwouts commented Nov 12, 2024

Hey @MarcoGorelli, Narwhals sounds like a great package indeed! And sure, I would love to provide support for more dataframe types; see for instance #217 (pending), where I started working on Ibis support.

I would love to see what that part of the code would look like with Narwhals! The parts that we would need to rewrite that I am currently thinking of (there might be more) are

  • the downsampling part (estimate the size of the table content, then keep only a certain number of top and bottom rows, first and last columns)
  • the conversion from Python data to Javascript data.

Looking forward to hearing more from you!
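
To make the two parts listed above concrete, here is a minimal sketch on plain Python lists. It is not the ITables implementation: `downsample` and `to_js_data` are hypothetical names, the limits are given as row and column counts rather than an estimated table size, and the data is serialized with the standard `json` module.

```python
import json


def keep_ends(items, limit):
    """Keep at most `limit` items, taken from the start and the end."""
    if len(items) <= limit:
        return list(items)
    head = (limit + 1) // 2  # the start gets the extra slot when limit is odd
    tail = limit - head
    return list(items[:head]) + (list(items[-tail:]) if tail else [])


def downsample(rows, max_rows, max_cols):
    """Keep top/bottom rows and first/last columns of a list of row lists."""
    return [keep_ends(row, max_cols) for row in keep_ends(rows, max_rows)]


def to_js_data(rows):
    """Serialize the rows to a JSON string that JavaScript can parse."""
    # default=str is a simple fallback for values json cannot encode (e.g. dates).
    return json.dumps(rows, default=str)


table = [[r * 10 + c for c in range(6)] for r in range(8)]
small = downsample(table, max_rows=4, max_cols=2)
print(to_js_data(small))  # [[0, 5], [10, 15], [60, 65], [70, 75]]
```

A real implementation would also need to tell the user that rows and columns were dropped, and to handle values such as NaN that plain JSON cannot represent.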

@DeaMariaLeon
Contributor

Hi @mwouts, I'm working on this.
Just letting you know that we are around (I'm a Narwhals team member 🙂).

@mwouts
Owner

mwouts commented Dec 15, 2024

Hi @wpritom , we're getting something that is starting to work - huge thanks to @DeaMariaLeon and to @MarcoGorelli !

Can you give this PR a try and let us know how it works for you?

pip install git+https://github.com/mwouts/itables.git@use_narwhals

Also, I am not familiar with Modin, so I am wondering: is it expected that the Modin tests are much slower than the pandas ones?

Last but not least, I see warnings on the empty dataframes in the sample dataframe notebook (docs/modin_dataframes.md). I guess they come from Modin itself?

UserWarning: `DataFrame.memory_usage` for empty DataFrame is not currently supported by PandasOnDask, defaulting to pandas implementation.
Please refer to https://modin.readthedocs.io/en/stable/supported_apis/defaulting_to_pandas.html for explanation.
UserWarning: `DataFrame.memory_usage` for empty DataFrame is not currently supported by PandasOnDask, defaulting to pandas implementation.

UserWarning: `DataFrame.itertuples` for empty DataFrame is not currently supported by Pandas
