DataEngineeringLabs/ranged-reader-rs

Ranged reader

Wraps low-level APIs that read byte ranges of files in structs that implement Read + Seek and AsyncRead + AsyncSeek. See parquet_s3_async.rs for an example that uses this API to read parts of a large Parquet file from S3 asynchronously.

Rationale

Blob storage HTTPS APIs offer the ability to read ranges of bytes from a single blob, i.e. functions of the form

fn read_range_blocking(path: &str, start: usize, length: usize) -> Vec<u8>;
async fn read_range(path: &str, start: usize, length: usize) -> Vec<u8>;

together with functions that return the blob's total size,

async fn length(path: &str) -> usize;
fn length(path: &str) -> usize;

These APIs are usually IO-bound: most of their time is spent waiting on the network.

Some file formats (e.g. Apache Parquet, Apache Avro, Apache Arrow IPC) allow reading only parts of a file, which enables filter and projection pushdown.

This crate offers two structs, RangedReader and RangedStreamer, that implement Read + Seek and AsyncRead + AsyncSeek respectively. They bridge the blob storage APIs described above to the traits that most Rust APIs use to read bytes.
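
To make the bridging idea concrete, here is a minimal sketch of the blocking case using only the standard library. It is not this crate's API: SketchReader, its new constructor and read_tail_example are hypothetical names chosen for illustration, the in-memory Vec stands in for a remote blob, and the sketch issues one range call per read without any of the buffering a real reader would want.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical adapter (NOT this crate's API): wraps a blocking
/// range-read function and exposes it as `Read + Seek`.
pub struct SketchReader<F> {
    read_range: F, // fetches `length` bytes starting at `start`
    length: u64,   // total size of the blob
    pos: u64,      // current logical position in the blob
}

impl<F: FnMut(u64, usize) -> io::Result<Vec<u8>>> SketchReader<F> {
    pub fn new(read_range: F, length: u64) -> Self {
        Self { read_range, length, pos: 0 }
    }
}

impl<F: FnMut(u64, usize) -> io::Result<Vec<u8>>> Read for SketchReader<F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never request bytes past the end of the blob.
        let remaining = self.length.saturating_sub(self.pos);
        let wanted = (buf.len() as u64).min(remaining) as usize;
        if wanted == 0 {
            return Ok(0);
        }
        // One range call per read; no buffering in this sketch.
        let bytes = (self.read_range)(self.pos, wanted)?;
        let n = bytes.len().min(wanted);
        buf[..n].copy_from_slice(&bytes[..n]);
        self.pos += n as u64;
        Ok(n)
    }
}

impl<F: FnMut(u64, usize) -> io::Result<Vec<u8>>> Seek for SketchReader<F> {
    fn seek(&mut self, from: SeekFrom) -> io::Result<u64> {
        // Seeking only moves the logical position; no IO happens here.
        let target: i128 = match from {
            SeekFrom::Start(offset) => offset as i128,
            SeekFrom::End(offset) => self.length as i128 + offset as i128,
            SeekFrom::Current(offset) => self.pos as i128 + offset as i128,
        };
        self.pos = u64::try_from(target)
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "invalid seek"))?;
        Ok(self.pos)
    }
}

/// Usage example: read the last 4 bytes of a 100-byte in-memory "blob".
pub fn read_tail_example() -> io::Result<Vec<u8>> {
    let blob: Vec<u8> = (0u8..100).collect();
    let length = blob.len() as u64;
    // Stand-in for a network call such as an HTTP range request.
    let read_range = move |start: u64, len: usize| -> io::Result<Vec<u8>> {
        let start = start as usize;
        Ok(blob[start..start + len].to_vec())
    };
    let mut reader = SketchReader::new(read_range, length);
    reader.seek(SeekFrom::End(-4))?;
    let mut tail = Vec::new();
    reader.read_to_end(&mut tail)?;
    Ok(tail)
}
```

Because seek only updates a position, a format reader can jump to a footer or a column chunk and pay for network IO only on the bytes it actually reads. In this sketch every read triggers a separate range call, so a practical reader would typically fetch larger ranges and serve small reads from a buffer.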

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

About

A reader that buffers ranged calls
