[FEAT] Read parquet tables with int96 coercion option #1231

Merged: jaychia merged 15 commits into main from jay/timestamp-int96-overflow on Aug 9, 2023

Conversation

@jaychia jaychia commented Aug 4, 2023

Add an option to coerce int96 timestamps to a specific TimeUnit when reading Parquet files.
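To illustrate why the coercion target matters, here is a minimal sketch of decoding a Parquet INT96 timestamp, which stores nanoseconds within the day in its first 8 bytes and a Julian day number in its last 4 bytes. The `TimeUnit` enum and the `int96_to_i64` function below are hypothetical stand-ins written for this example, not the code from this PR; the `[u32; 3]` layout is an assumption about how the 12 bytes are presented to the caller.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the `TimeUnit` named in the PR description.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TimeUnit {
    Nanoseconds,
    Microseconds,
    Milliseconds,
    Seconds,
}

/// Julian day number of 1970-01-01.
const JULIAN_DAY_OF_UNIX_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i128 = 86_400 * 1_000_000_000;

/// Decode an INT96 timestamp given as three little-endian u32 words:
/// words 0 and 1 hold the nanoseconds within the day, word 2 the Julian day.
/// Returns the count of `unit` since the Unix epoch, or `None` if that count
/// does not fit in an i64.
pub fn int96_to_i64(value: [u32; 3], unit: TimeUnit) -> Option<i64> {
    let nanos_of_day = (((value[1] as u64) << 32) | (value[0] as u64)) as i128;
    let days_since_epoch = value[2] as i64 - JULIAN_DAY_OF_UNIX_EPOCH;

    // Do the arithmetic in i128 so the intermediate nanosecond count cannot overflow.
    let total_nanos = days_since_epoch as i128 * NANOS_PER_DAY + nanos_of_day;

    let nanos_per_unit: i128 = match unit {
        TimeUnit::Nanoseconds => 1,
        TimeUnit::Microseconds => 1_000,
        TimeUnit::Milliseconds => 1_000_000,
        TimeUnit::Seconds => 1_000_000_000,
    };

    // Floor division keeps pre-1970 values consistent; the final conversion
    // fails (returns None) when the value is out of range for i64.
    i64::try_from(total_nanos.div_euclid(nanos_per_unit)).ok()
}
```

A timestamp 200,000 days after the Unix epoch is out of range as i64 nanoseconds, which only cover roughly the years 1677 to 2262, but fits comfortably as microseconds. Coercing to a coarser unit trades precision for range.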

@github-actions github-actions bot added the enhancement New feature or request label Aug 4, 2023
@jaychia jaychia requested a review from samster25 August 4, 2023 20:19
jaychia commented Aug 4, 2023

Issue filed in arrow2: jorgecarleitao/arrow2#1527

@jaychia jaychia changed the title from "[FEAT] Parse Int96 timestamps and coerce to different precisions" to "[FEAT] Read parquet tables with schema and schema inference options" on Aug 8, 2023
@@ -94,33 +95,30 @@ impl ParquetReaderBuilder {

let metadata = read_parquet_metadata(uri, size, io_client).await?;
let num_rows = metadata.num_rows;
let schema =

NOTE: schema inference is now deferred until .build() is called.

This lets the builder decide whether it needs to infer the schema at all (did the user provide one?) and, if so, how to infer it (which flags did the user provide?).
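The deferral described in this comment can be sketched roughly as follows. Every name here (`SchemaSketch`, `ReaderBuilderSketch`, `with_schema`, `with_int96_unit`, `infer_schema`) is hypothetical and chosen for illustration; this is not the actual `ParquetReaderBuilder` API, and the "inference" is a toy that only labels int96 columns with the requested unit.

```rust
/// Hypothetical schema: a list of (column name, type name) pairs.
#[derive(Clone, Debug, PartialEq)]
pub struct SchemaSketch(pub Vec<(String, String)>);

/// Hypothetical reader produced by the builder.
#[derive(Debug, PartialEq)]
pub struct ReaderSketch {
    pub schema: SchemaSketch,
}

/// Hypothetical builder that only records options; nothing is inferred
/// until `build()` runs.
pub struct ReaderBuilderSketch {
    int96_columns: Vec<String>, // stand-in for already-fetched file metadata
    user_schema: Option<SchemaSketch>,
    int96_unit: &'static str,
}

impl ReaderBuilderSketch {
    pub fn new(int96_columns: Vec<String>) -> Self {
        Self {
            int96_columns,
            user_schema: None,
            int96_unit: "ns",
        }
    }

    /// Supplying a schema means inference is skipped entirely.
    pub fn with_schema(mut self, schema: SchemaSketch) -> Self {
        self.user_schema = Some(schema);
        self
    }

    /// A flag that changes how inference behaves, if inference is needed.
    pub fn with_int96_unit(mut self, unit: &'static str) -> Self {
        self.int96_unit = unit;
        self
    }

    /// All options are known at this point, so decide here whether to infer
    /// the schema and which flags to apply.
    pub fn build(self) -> ReaderSketch {
        let schema = match self.user_schema {
            Some(schema) => schema,
            None => infer_schema(&self.int96_columns, self.int96_unit),
        };
        ReaderSketch { schema }
    }
}

/// Toy inference: every int96 column becomes a timestamp in the chosen unit.
fn infer_schema(int96_columns: &[String], unit: &str) -> SchemaSketch {
    SchemaSketch(
        int96_columns
            .iter()
            .map(|name| (name.clone(), format!("timestamp[{}]", unit)))
            .collect(),
    )
}
```

Because the setters only record state, they can be called in any order, and an option set after construction still influences the inferred schema.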

@jaychia jaychia force-pushed the jay/timestamp-int96-overflow branch from bb47bb9 to 657986c Compare August 8, 2023 00:53
@jaychia jaychia force-pushed the jay/timestamp-int96-overflow branch from 657986c to 65852b6 Compare August 8, 2023 01:59
@jaychia jaychia changed the title from "[FEAT] Read parquet tables with schema and schema inference options" to "[FEAT] Read parquet tables with int96 coercion option" on Aug 8, 2023
codecov bot commented Aug 9, 2023

Codecov Report

Merging #1231 (ec8cf24) into main (f5837a2) will decrease coverage by 0.44%.
Report is 3 commits behind head on main.
The diff coverage is 95.65%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1231      +/-   ##
==========================================
- Coverage   87.69%   87.26%   -0.44%     
==========================================
  Files          56       62       +6     
  Lines        5698     5944     +246     
==========================================
+ Hits         4997     5187     +190     
- Misses        701      757      +56     
| Files Changed | Coverage Δ |
|---|---|
| daft/table/table_io.py | 94.05% <66.66%> (-1.00%) ⬇️ |
| daft/logical/schema.py | 93.68% <100.00%> (+0.27%) ⬆️ |
| daft/runners/partitioning.py | 83.63% <100.00%> (+1.28%) ⬆️ |
| daft/table/table.py | 88.80% <100.00%> (+0.18%) ⬆️ |

... and 19 files with indirect coverage changes

@jaychia jaychia merged commit 8fe5597 into main Aug 9, 2023
@jaychia jaychia deleted the jay/timestamp-int96-overflow branch August 9, 2023 18:15