TypeError: 'NoneType' object is not iterable #867

Open
davetapley opened this issue Jun 5, 2023 · 2 comments

davetapley commented Jun 5, 2023

Describe the issue:

When the file metadata is corrupt, a TypeError ('NoneType' object is not iterable) is raised instead of a ParquetException.

Minimal Complete Verifiable Example:

from fastparquet import ParquetFile
ParquetFile("corrupt.parquet")  # raises TypeError rather than ParquetException

I think the corruption is caused by bad file locking elsewhere in my app.
I'll attach the file if I can catch the failure.
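
One way to catch the failure and keep the offending file for attaching later might look like the sketch below. The helper name `open_or_keep_copy`, the `opener` argument, and the `keep_dir` location are all hypothetical and not part of fastparquet; in practice `opener` would be something like `ParquetFile`.

```python
import shutil
from pathlib import Path


def open_or_keep_copy(path, opener, keep_dir):
    """Call opener(path); if it fails, copy the file into keep_dir and re-raise.

    Hypothetical helper: opener stands in for e.g. fastparquet.ParquetFile.
    """
    try:
        return opener(path)
    except Exception:
        keep_dir = Path(keep_dir)
        keep_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also tries to preserve timestamps, which may help when
        # correlating the corruption with other writers of the file.
        shutil.copy2(path, keep_dir / Path(path).name)
        raise
```

The original exception is re-raised unchanged, so existing error handling in the app is not affected; only a copy of the file is set aside.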

Anything else we need to know?:

I suspect None is returned here:

def from_buffer(buffer, name=None):
    cdef NumpyIO buf
    if isinstance(buffer, NumpyIO):
        buf = buffer
    else:
        buf = NumpyIO(buffer)
    cdef dict o = read_thrift(buf)
    if name is not None:
        return ThriftObject(name, o)
    return o

Then we try to iterate over fmd[4] without a None check here:

    fmd = from_buffer(data, "FileMetaData")
except Exception:
    raise ParquetException('Metadata parse failed: %s' % self.fn)
# for rg in fmd.row_groups:
for rg in fmd[4]:
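
A minimal sketch of the kind of None check meant here, not the actual fastparquet code. It assumes the parsed metadata can be modelled as a plain dict keyed by thrift field id (field 4 being row_groups, as the commented-out line above suggests). `ParquetException` is redefined locally as a stand-in so the snippet runs on its own, and `row_groups_or_raise` is a hypothetical name.

```python
class ParquetException(Exception):
    """Local stand-in for fastparquet's ParquetException."""


def row_groups_or_raise(fmd, fn):
    """Return the row-group list (field 4), or raise if it is missing.

    fmd is assumed to be a dict keyed by thrift field id, or None.
    """
    if fmd is None:
        raise ParquetException('Metadata parse failed: %s' % fn)
    row_groups = fmd.get(4)
    if row_groups is None:
        # Without this check, iterating over None gives the TypeError
        # reported in this issue.
        raise ParquetException('Metadata has no row groups: %s' % fn)
    return row_groups
```

With a check like this in place, the loop would iterate over the returned list, and a corrupt file would surface as a ParquetException that names the file.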

Environment:

  • fastparquet version: 2023.2.0
  • Python version: 3.10.9
  • Operating System: Ubuntu
  • Install method (conda, pip, source): pip
@davetapley
Author

Happy for a PR?

I owe one for ⬇️ anyway

@martindurant
Member

It is true that the low-level thrift reader does not raise errors; it just ends up with a set of objects that are empty/None or otherwise not initialised. It would be reasonable to make a few basic verification checks and raise an error, as you say.
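
As a rough illustration of what "a few basic verification checks" could mean, the sketch below checks that the required top-level FileMetaData fields are present. The field ids and names (1 version, 2 schema, 3 num_rows, 4 row_groups) follow the Parquet thrift definition; everything else is an assumption made for the example: the metadata is modelled as a plain dict keyed by field id, `ParquetException` is a local stand-in, and `verify_file_metadata` is a hypothetical name rather than an existing fastparquet function.

```python
class ParquetException(Exception):
    """Local stand-in for fastparquet's ParquetException."""


# Required top-level fields of FileMetaData in the Parquet thrift definition.
REQUIRED_FIELDS = {1: "version", 2: "schema", 3: "num_rows", 4: "row_groups"}


def verify_file_metadata(fmd, fn):
    """Raise ParquetException if fmd is not a dict or lacks a required field."""
    if not isinstance(fmd, dict):
        raise ParquetException('Metadata parse failed: %s' % fn)
    missing = [name for field_id, name in REQUIRED_FIELDS.items()
               if fmd.get(field_id) is None]
    if missing:
        raise ParquetException(
            'Metadata parse failed: %s (missing %s)' % (fn, ', '.join(missing)))
    return fmd
```

Only None is treated as missing, so a file with zero rows or an empty row-group list would still pass.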
