Upcoming pandas (>2.2.0) raises "read-only" errors #919

martindurant · 2024-02-07T14:58:14Z

No longer allows setting series values in-place. Thanks pandas.

jorisvandenbossche · 2024-02-19T19:42:33Z

You're welcome!

Returning read-only numpy arrays is one of the parts of the large CoW change (https://pandas.pydata.org/pdeps/0007-copy-on-write.html) that we are least certain about, so feedback from downstream developers is very welcome.

I assume the issue arises because you allocate an empty dataframe first and then get "view" arrays to write into. For the index, in one of the code paths, that happens here:

fastparquet/fastparquet/dataframe.py, line 156 in eec9e61:

views[col] = index.values

The return value of .values is now a read-only numpy array (https://pandas.pydata.org/docs/user_guide/copy_on_write.html#read-only-numpy-arrays). You know you just created this data yourself, so you can safely change its writeable flag to True as a workaround.
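
A minimal sketch of that workaround, assuming a freshly allocated Index that nothing else references yet (the names `index` and `view` are illustrative, not fastparquet's actual variables). The flag check keeps it working on pandas versions where `.values` is still writeable:

```python
import numpy as np
import pandas as pd

# Pre-allocate an index whose values will be filled in afterwards.
index = pd.Index(np.zeros(3, dtype="int64"))

# With Copy-on-Write, .values may come back as a read-only view.
view = index.values

# We created this data ourselves and nothing else references it,
# so flipping the flag back is safe in this specific situation.
if not view.flags.writeable:
    view.flags.writeable = True

# Fill the pre-allocated buffer in place.
view[:] = [10, 20, 30]
```

This is only reasonable for data you have just allocated yourself. Doing the same to an Index that is shared with other objects would bypass the protection CoW is meant to provide.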

And I suppose this only happens for the Index, because for columns you rely on the Block.values, where we didn't add this protection as this is regarded as internal anyway.

This is probably already covered by the failing tests in fastparquet's own test suite, but here are some cases that fail on the pandas side (they had been skipped with CoW enabled for some time; we should have reported that earlier):

import datetime

import pandas as pd

# dataframe with a non-default (i.e. non-RangeIndex) index
df = pd.DataFrame({"A": [1, 2, 3]}, index=list("abc"))
df.to_parquet("test.parquet", engine="fastparquet")
pd.read_parquet("test.parquet", engine="fastparquet")

# probably the same underlying issue: tz-aware datetime index
idx = [datetime.datetime.now(datetime.timezone.utc)] * 5
df = pd.DataFrame(index=idx, data={"index_as_col": idx})
df.to_parquet("test.parquet", engine="fastparquet")
pd.read_parquet("test.parquet", engine="fastparquet")

martindurant · 2024-02-22T14:35:17Z

Thanks for the info, @jorisvandenbossche. Any idea of the release timeline?

jorisvandenbossche · 2024-02-23T10:11:15Z

The current goal is April.

jorisvandenbossche mentioned this issue Feb 19, 2024

CoW: Remove remaining cow occurrences from tests pandas-dev/pandas#57477

Merged

jorisvandenbossche mentioned this issue Mar 6, 2024

BUG: fastparquet interface fails on load with non-unique index and CoW pandas-dev/pandas#57673

Closed
