I have a partitioned parquet dataset containing a `datetime64[ns, UTC]` column `ts` (i.e., it is timezone-aware, with `tz=UTC`). The following pandas invocation does not work with `engine=fastparquet`:
Tail of traceback:

```
File ~/Library/Caches/pypoetry/virtualenvs/project-GfuZs_x0-py3.8/lib/python3.8/site-packages/fastparquet/api.py:1090, in filter_out_stats(rg, filters, schema)
1088 s["converted_min"] = vmin
1089 vmin = s["converted_min"]
-> 1090 if filter_val(op, val, vmin, vmax):
1091 return True
1092 return False

File ~/Library/Caches/pypoetry/virtualenvs/project-GfuZs_x0-py3.8/lib/python3.8/site-packages/fastparquet/api.py:1334, in filter_val(op, val, vmin, vmax)
1332 return filter_not_in(val, vmin, vmax)
1333 if vmax is not None:
-> 1334 if op in ['==', '>=', '='] and val > vmax:
1335 return True
1336 if op == '>' and val >= vmax:

File ~/Library/Caches/pypoetry/virtualenvs/project-GfuZs_x0-py3.8/lib/python3.8/site-packages/pandas/_libs/tslibs/timestamps.pyx:253, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__()

TypeError: Cannot compare tz-naive and tz-aware timestamps
```
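The failing check reduces to an ordering comparison between the filter value and a row-group maximum. The standard-library sketch below reproduces the same class of error, assuming the statistics come back tz-naive while the filter value is tz-aware. `can_skip_row_group` is a hypothetical, simplified stand-in for the `filter_val` lines visible in the traceback, not fastparquet's actual code. pandas words the message as "Cannot compare tz-naive and tz-aware timestamps"; the stdlib `datetime` wording differs, but the failure is the same.

```python
from datetime import datetime, timezone

# Assumption: row-group statistics are decoded as tz-naive datetimes.
vmin = datetime(2022, 1, 1)
vmax = datetime(2022, 1, 31, 23, 59)

# The user's filter value for the tz-aware column `ts`.
val = datetime(2022, 2, 1, tzinfo=timezone.utc)


def can_skip_row_group(op, val, vmin, vmax):
    """Hypothetical simplification of the vmax checks in filter_val.

    Returns True when the statistics prove no row can match. Only the
    comparisons against vmax shown in the traceback are modelled;
    vmin is accepted for signature parity but unused.
    """
    if vmax is not None:
        if op in ("==", ">=", "=") and val > vmax:
            return True
        if op == ">" and val >= vmax:
            return True
    return False


try:
    can_skip_row_group(">=", val, vmin, vmax)
    error = None
except TypeError as exc:
    # Ordering comparisons between naive and aware datetimes raise TypeError.
    error = exc

print(repr(error))
```

Passing a naive value instead, e.g. `datetime(2022, 2, 1)`, makes the same check succeed, which matches the observation that fastparquet filters fine when the timezone is omitted.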
The same invocation works fine with `engine=pyarrow`. On the other hand, fastparquet is able to do the filtering if the timezone is omitted (and of course pyarrow fails):
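Omitting the timezone works because the filter value then has the same tz-naive type as the statistics. A minimal sketch of deriving such a value from an aware bound follows, assuming the stored statistics are UTC wall-clock times; `to_naive_utc` is a hypothetical helper. In pandas, `Timestamp.tz_convert("UTC").tz_localize(None)` performs the equivalent conversion.

```python
from datetime import datetime, timedelta, timezone


def to_naive_utc(value: datetime) -> datetime:
    """Hypothetical helper: convert an aware datetime to naive UTC.

    Naive inputs are returned unchanged, on the assumption that they
    already represent UTC.
    """
    if value.tzinfo is None:
        return value
    return value.astimezone(timezone.utc).replace(tzinfo=None)


# A bound given in another zone (UTC+02:00), purely for illustration.
bound = datetime(2022, 2, 1, 2, 0, tzinfo=timezone(timedelta(hours=2)))
naive_bound = to_naive_utc(bound)

# The naive bound now compares cleanly with naive statistics.
vmax = datetime(2022, 1, 31, 23, 59)
print(naive_bound, naive_bound > vmax)
```

The converted value would then be used in a filter tuple such as `("ts", ">=", naive_bound)`. Naive inputs pass through untouched because their zone cannot be inferred.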
This does not surprise me: the "statistics" in the parquet file are stored without any timezone, and the timezone is only applied to the column's values after the complete load. It would be reasonable to apply the timezone to the statistics as well, but that would probably be annoying to implement.
I suspect that pyarrow has the right idea here?
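The alternative raised above, giving the statistics a timezone, can be sketched as follows: attach the column's timezone to naive min/max values before comparing, and leave the statistics untouched for tz-naive columns so the existing naive path keeps working. `localize_stat` and its `column_tz` argument are hypothetical. Where fastparquet would obtain the column's timezone (for example from the pandas metadata stored in the file) is an assumption, and the sketch makes no claim about how pyarrow handles this internally.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def localize_stat(stat: Optional[datetime],
                  column_tz: Optional[tzinfo]) -> Optional[datetime]:
    """Hypothetical helper: attach the column's timezone to a naive stat.

    Assumes naive statistics hold UTC wall-clock values. Missing stats,
    tz-naive columns and already-aware stats are returned unchanged.
    """
    if stat is None or column_tz is None or stat.tzinfo is not None:
        return stat
    return stat.replace(tzinfo=timezone.utc).astimezone(column_tz)


vmin = localize_stat(datetime(2022, 1, 1), timezone.utc)
vmax = localize_stat(datetime(2022, 1, 31, 23, 59), timezone.utc)
val = datetime(2022, 2, 1, tzinfo=timezone.utc)

# Both sides are aware now, so the row-group check no longer raises.
can_skip = val > vmax
print(can_skip)
```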
Environment:
fastparquet==0.8.0