GH-14932: [Python] Add python bindings for JSON streaming reader #45084

Open: wants to merge 3 commits into base: main


@pan-x-c pan-x-c commented Dec 20, 2024

Rationale for this change

Arrow C++ has a streaming JSON reader that is not exposed in the Python bindings.

What changes are included in this PR?

This PR is based on #33761. It adds an open_json function that opens a streaming reader for a JSON file.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes. A new open_json function has been added to the Python interface at pyarrow.json.open_json. It takes the same parameters as pyarrow.json.read_json.

⚠️ GitHub issue #14932 has been automatically assigned in GitHub to PR creator.

assert reader.schema == expected_schema
assert reader.read_next_batch().to_pydict() == expected_data

def test_reconcile_across_blocks(self):

@pan-x-c pan-x-c Dec 20, 2024

The original test case in BaseTestJSON.test_reconcile_across_blocks is not compatible with the JSON streaming reader.
I made some changes to it, but I am not sure whether they are reasonable.
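One plausible reason for the incompatibility, sketched in plain Python rather than with pyarrow: a whole-file reader can merge the fields it sees in every block, while a streaming reader has to commit to a schema before later blocks arrive. The sketch assumes the streaming reader fixes its schema from the first block; all function names here are hypothetical and none of this is the Arrow implementation.

```python
import json


def split_blocks(lines, rows_per_block):
    """Group JSON lines into fixed-size blocks (stand-in for block_size)."""
    return [
        lines[i:i + rows_per_block]
        for i in range(0, len(lines), rows_per_block)
    ]


def block_fields(block):
    """Return the field names seen in one block, in first-seen order."""
    fields = []
    for line in block:
        for key in json.loads(line):
            if key not in fields:
                fields.append(key)
    return fields


def whole_file_fields(blocks):
    """Whole-file reading: merge the field names from all blocks."""
    fields = []
    for block in blocks:
        for name in block_fields(block):
            if name not in fields:
                fields.append(name)
    return fields


def streaming_fields(blocks):
    """Streaming (assumed): the schema is fixed after the first block."""
    return block_fields(blocks[0]) if blocks else []


# First block holds an empty object; field "a" only appears later.
blocks = split_blocks(['{}', '{"a": 0}'], rows_per_block=1)
print(whole_file_fields(blocks))
print(streaming_fields(blocks))
```

Under that assumption, a test that relies on a field first appearing in a later block needs different expectations for the streaming reader than for read_json.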
