input parquet got decoding panic #3091

Open
fearfate opened this issue Dec 18, 2024 · 1 comment
Labels
bug processors Any tasks or issues relating specifically to processors waiting for upstream Blocked on changes needed in an upstream dependency

Comments

@fearfate
Contributor

config:

input:
  parquet:
    paths:
      - '*.parquet'
    auto_replay_nacks: false
    # batch_count: 0

output:
  drop: {}

Running the process looks like this (screenshot attached to the original issue):

the parquet file:

example.zip

@mihaitodor added the bug, processors, and needs investigation labels on Dec 24, 2024
@mihaitodor
Collaborator

Hey @fearfate, thanks for raising this! The issue seems to be in the parquet-go library (this code panics for certain rows). I raised an issue upstream: parquet-go/parquet-go#204

Note: I was able to read the file successfully using https://github.com/apache/arrow-go/tree/249ec029ad4c02488c73eee2dfcbc2aec89ff464/parquet/cmd/parquet_reader. It might be worth supporting this library as an alternative implementation in the future.
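For anyone hitting this before the upstream fix lands, here is a minimal sketch of the general Go pattern for turning a decoder panic into an ordinary error with recover. It uses only the standard library. Both decodeRow and safeDecodeRow are hypothetical names made up for illustration; decodeRow merely simulates a decoder that panics on certain rows and is not the parquet-go API, and this is not how the parquet input is actually implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeRow is a hypothetical stand-in for a library call that may panic
// on certain inputs. It is NOT the parquet-go API.
// Assumed layout: the first byte is a length, followed by that many bytes.
func decodeRow(raw []byte) (string, error) {
	if len(raw) == 0 {
		return "", errors.New("empty row")
	}
	n := int(raw[0])
	// Slicing past the end of raw panics, simulating a decoder bug.
	return string(raw[1 : 1+n]), nil
}

// safeDecodeRow converts a panic raised inside decodeRow into an error,
// so one bad row does not take down the whole process.
func safeDecodeRow(raw []byte) (val string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("decoding panic: %v", r)
		}
	}()
	return decodeRow(raw)
}

func main() {
	rows := [][]byte{
		{2, 'o', 'k'}, // well-formed row
		{9, 'x'},      // declared length exceeds the data, triggers the panic
	}
	for i, raw := range rows {
		val, err := safeDecodeRow(raw)
		if err != nil {
			fmt.Printf("row %d: %v\n", i, err)
			continue
		}
		fmt.Printf("row %d: %q\n", i, val)
	}
}
```

A guard like this only reports the bad row as an error; it does not recover the row's data, so the underlying decoding bug still needs to be fixed upstream.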

@mihaitodor added the waiting for upstream label and removed the needs investigation label on Dec 26, 2024