Support file_size_bytes option #100

Open: aykut-bozkurt wants to merge 1 commit into base aykut/cache-object-stores

aykut-bozkurt
Collaborator

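`COPY TO` parquet now supports a new option, `file_size_bytes`, which sets a target
size in bytes for each generated parquet file.

When the parquet file being written exceeds the target size, it is flushed and a new
parquet file is started under a parent directory. The parent directory is the
destination path without the `.parquet` extension, so `/tmp/test.parquet` becomes
`/tmp/test/`. The size is a target rather than a hard limit, so individual files can
come out somewhat larger than `file_size_bytes`.

For example, with a target of 1048576 bytes (1 MiB):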

```sql
COPY (SELECT 'hellooooo' || i FROM generate_series(1, 1000000) i)
TO '/tmp/test.parquet'
WITH (file_size_bytes 1048576);
```

```bash
> ls -alh /tmp/test/
1.4M data_0.parquet
1.4M data_1.parquet
1.4M data_2.parquet
1.4M data_3.parquet
114K data_4.parquet
```
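
The path derivation and file rollover described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not the code in this PR: the helper names `parent_dir` and `plan_files` are hypothetical, and the assumption that the size check happens after each written chunk is just one possible explanation for why files can overshoot the target.

```python
from pathlib import PurePosixPath


def parent_dir(dest: str) -> PurePosixPath:
    """Drop the extension: /tmp/test.parquet -> /tmp/test."""
    return PurePosixPath(dest).with_suffix("")


def plan_files(dest: str, chunk_sizes: list[int], file_size_bytes: int) -> list[tuple[str, int]]:
    """Group chunk sizes into data_<n>.parquet files under the parent directory.

    Assumption: the size is checked after each chunk is written, so a file
    is closed only once it already exceeds the target.
    """
    sizes = []
    current = 0
    for size in chunk_sizes:
        current += size
        if current > file_size_bytes:
            # Target exceeded: flush this file and start a new one.
            sizes.append(current)
            current = 0
    if current > 0:
        # Whatever is left goes into a final, smaller file.
        sizes.append(current)
    directory = parent_dir(dest)
    return [(str(directory / f"data_{i}.parquet"), n) for i, n in enumerate(sizes)]


if __name__ == "__main__":
    for path, size in plan_files("/tmp/test.parquet", [600_000] * 5, 1_048_576):
        print(path, size)
```

With five hypothetical 600,000-byte chunks and a 1 MiB target, the sketch yields two files above the target followed by a smaller final file, which is the same shape as the listing above.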