Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refine: refine writer interface #741

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Nov 29, 2024

Hi, I'm working on writer recently and find some weaknesses of our original writer interface design:

  1. Our IcebergWriter Builder interface looks like following:
trait IcebergWriterBuilder {
 type C;
 async fn build(self, config: Self::C) -> Result<Self::R>;
}

And I realized that this custom config param cause we can't combine the write flexibility. E.g. In partition writer

struct PartitionWriter<IcebergWriterBuilder> {
  inner_writer_builder: B
}

impl  PartitionWriter<IcebergWriterBuilder>  {
  pub async fn write(..) {
      self.inner_writer_builder.build(...) // We can't build because we don't know pass which param.
  }
}

So avoid this problem, we should pass the custom param when create the builder and the build interface should looks like fn build() -> Self

  1. The schema of FileWriter can determined by base writer like: data file writer, position delete writer, equality delete writer.
    In our original design, user should pass the schema to file writer builder when create them like
let file_builder = ParquetWriterBuilder(schema)

However, sometimes the schema is hard to determine when we create them. E.g. equality delete writer, we only know what the schema looks like util we pass the equality id and create the equality delete writer. To avoid the problem, we change fn build() -> Self of FileWriterBuilder to fn build(schema:SchemaRef) -> Self. By this way, the schema of FileWriter is determined by base writer.

I send this discussion as a PR to make it easier to express my idea. BTW, I applied this change and complete partition writer, delta writer in https://github.com/ZENOTME/iceberg-rust/tree/partition_writer and it looks well.

Feel free for any suggestion. cc @liurenjie1024 @Xuanwo @Fokko @c-thiel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant