Discussion: Design of `TableMetadataBuilder`. #232

liurenjie1024 · 2024-03-07T01:58:22Z

Problem statement

TableMetadataBuilder is useful in modifying/creating TableMetadata, and is a core data structure of transaction api. There are already some efforts to create one using derived builder, see #62 and #229. While it's easy to implement, I have some concerns about this approach. There are some problems with this approach:

Not easy to review. it's quite easy to forget that we have changed builder api when we do modification to TableMetadata.
Not easy to customize. I've checked the builder in java api, and it has a lot of customization in each method with validations.

The reason I want to be careful with TableMetadataBuilder is that it's a core data structure in iceberg, and we should not expose apis which be misused to construct inconsistent TableMetadata.

Proposal

I want to propose to implement TableMetadataBuilder in a similar approach like java api, rather than using derived builder directly, such as following:

pub struct TableMetadataBuilder(TableMetadata);

impl TableMetadataBuilder {
   pub fn add_schema(mut self, schema: Schema) -> Result<Self> {
       // Check if schema validity
   }

  pub fn add_unbound_partition_spec(mut self, spec: UnboundPartitionSpec) -> Result<Self> { 
   // Binding partition spec
  }

  pub fn build(self) -> Result<TableMetadata> {}
}

The text was updated successfully, but these errors were encountered:

liurenjie1024 · 2024-03-07T01:59:04Z

cc @Xuanwo @ZENOTME @Fokko @JanKaul @y0psolo

Xuanwo · 2024-03-07T02:29:54Z

When add_schema has been called, both schemas and current_schema_id will be updated? The same question to partition_specs.

liurenjie1024 · 2024-03-07T02:42:52Z

When add_schema has been called, both schemas and current_schema_id will be updated? The same question to partition_specs.

Yes, the add_schema process consists of several steps as in java implementation

Check if already existed same schema.
If not, assign schema id to it.
Modify schemas.

Similar things happens for partition_specs.

Other methods such as setBranchSnapshot are also quite complicated.

sdd · 2024-03-07T08:07:54Z

I agree with @liurenjie1024. Looking at the Java implementation, it seems like there is too much going on here for a derived builder to be a good fit, and a hand-rolled builder would be more appropriate.

ZENOTME · 2024-03-08T03:16:16Z

I want to propose to implement TableMetadataBuilder in a similar approach like java api, rather than using derived builder directly

Implementing the builder manually looks good to me.

liurenjie1024 · 2024-03-10T05:20:51Z

Cool, seems we reached consensus on this, let's move on to create TableMetadataBuilder.

marvinlanhenke · 2024-03-10T10:19:02Z

what about supporting FormatVersion V1 and V2? Will we simply model V2 and provide all optional fields as well for V1 in order to simplify the struct TableMetadata (like it is right now?) - or would there be any reason to change that?

Also how would we handle the update or modifications to an existing TableMetadata. Using TableMetadataBuilder for the controlled construction and implement the update methods directly on the TableMetadata could be one approach?

liurenjie1024 · 2024-03-10T12:20:47Z

what about supporting FormatVersion V1 and V2? Will we simply model V2 and provide all optional fields as well for V1 in order to simplify the struct TableMetadata (like it is right now?) - or would there be any reason to change that?

Sorry, I don't get your point. I think currently our approach is just like what you say, and we don't have plan to change it. Oh, you mean further modification to TableMetadata ? In fact, I mean the evolution of table metadata spec in future.

Also how would we handle the update or modifications to an existing TableMetadata. Using TableMetadataBuilder for the controlled construction and implement the update methods directly on the TableMetadata could be one approach?

That's possible, but that means we need to implement the validation logic in two places, and I think we should avoid such duplication.

marvinlanhenke · 2024-03-10T12:28:37Z

[...] and we don't have plan to change it

That's possible, but that means we need to implement the validation logic in two places

Makes sense to me. thanks for clarifying.

c-thiel · 2024-07-23T09:32:52Z

@liurenjie1024, @Fokko do you know if someone is working on a full implementation of the TableMetadataBuilder?

The question is if we need it similar to java with all that's being implemented as part of transactions.

If we need it, and nobody is working on it, I would prepare a PR a cleaned-up Version of what we have in the Iceberg-Catalog over to here. It's mostly about adding all modifications as addition of snapshots, schema evolution, partitioning, ordering, various validations, partition binding, etc.

It would mainly contain the logic in this class:
https://github.com/apache/iceberg/blob/604b2bb85ccd33a132e0f588fad1611ab655e2c5/core/src/main/java/org/apache/iceberg/TableMetadata.java#L863
And a few associated classes.

liurenjie1024 · 2024-07-23T13:42:43Z

@liurenjie1024, @Fokko do you know if someone is working on a full implementation of the TableMetadataBuilder?

The question is if we need it similar to java with all that's being implemented as part of transactions.

If we need it, and nobody is working on it, I would prepare a PR a cleaned-up Version of what we have in the Iceberg-Catalog over to here. It's mostly about adding all modifications as addition of snapshots, schema evolution, partitioning, ordering, various validations, partition binding, etc.

It would mainly contain the logic in this class: https://github.com/apache/iceberg/blob/604b2bb85ccd33a132e0f588fad1611ab655e2c5/core/src/main/java/org/apache/iceberg/TableMetadata.java#L863 And a few associated classes.

Hi, @c-thiel AFAIK, nobody is working on this. The plan was to add it one by one when we implement transaction api. It would be greate if you are interested in contributiong.

c-thiel · 2024-07-24T08:31:12Z

@liurenjie1024 I would be interested to contribute.

Let me maybe start with partition binding - i.e. binding an UnboundPartitionSpec to a schema - making it a bound PartitionSpec or re-binding to a different schema on updates.

Do you have a strong feeling of where the code should go? As part of an Action or separately? If not, I would just make a proposal.

liurenjie1024 · 2024-07-24T12:53:38Z

@liurenjie1024 I would be interested to contribute.

Let me maybe start with partition binding - i.e. binding an UnboundPartitionSpec to a schema - making it a bound PartitionSpec or re-binding to a different schema on updates.

Do you have a strong feeling of where the code should go? As part of an Action or separately? If not, I would just make a proposal.

Hi, @c-thiel Thanks for your interest. Typical I think small prs would be easier to review and test. That's to say, a prefered approach would be to implement apis for TableMetadataBuilder in one pr, and add one Action with another pr.

Minor nit, putting all actions in one transaction.rs file maybe too large to maintain, and spltting transaction module into files will be easier to maintain.

liurenjie1024 changed the title ~~Discuss: Design of TableMetadataBuilder.~~ Discussion: Design of TableMetadataBuilder. Mar 7, 2024

liurenjie1024 closed this as completed Mar 10, 2024

liurenjie1024 reopened this Mar 10, 2024

c-thiel mentioned this issue May 14, 2024

TableMetadataBuilder: Review / Test / Implement missing Features lakekeeper/lakekeeper#14

Closed

c-thiel mentioned this issue Sep 3, 2024

TableMetadataBuilder #587

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discussion: Design of `TableMetadataBuilder`. #232

Discussion: Design of `TableMetadataBuilder`. #232

liurenjie1024 commented Mar 7, 2024

liurenjie1024 commented Mar 7, 2024

Xuanwo commented Mar 7, 2024

liurenjie1024 commented Mar 7, 2024

sdd commented Mar 7, 2024

ZENOTME commented Mar 8, 2024

liurenjie1024 commented Mar 10, 2024

marvinlanhenke commented Mar 10, 2024 •

edited

Loading

liurenjie1024 commented Mar 10, 2024

marvinlanhenke commented Mar 10, 2024

c-thiel commented Jul 23, 2024 •

edited

Loading

liurenjie1024 commented Jul 23, 2024

c-thiel commented Jul 24, 2024

liurenjie1024 commented Jul 24, 2024 •

edited

Loading

Discussion: Design of TableMetadataBuilder. #232

Discussion: Design of TableMetadataBuilder. #232

Comments

liurenjie1024 commented Mar 7, 2024

Problem statement

Proposal

liurenjie1024 commented Mar 7, 2024

Xuanwo commented Mar 7, 2024

liurenjie1024 commented Mar 7, 2024

sdd commented Mar 7, 2024

ZENOTME commented Mar 8, 2024

liurenjie1024 commented Mar 10, 2024

marvinlanhenke commented Mar 10, 2024 • edited Loading

liurenjie1024 commented Mar 10, 2024

marvinlanhenke commented Mar 10, 2024

c-thiel commented Jul 23, 2024 • edited Loading

liurenjie1024 commented Jul 23, 2024

c-thiel commented Jul 24, 2024

liurenjie1024 commented Jul 24, 2024 • edited Loading

Discussion: Design of `TableMetadataBuilder`. #232

Discussion: Design of `TableMetadataBuilder`. #232

marvinlanhenke commented Mar 10, 2024 •

edited

Loading

c-thiel commented Jul 23, 2024 •

edited

Loading

liurenjie1024 commented Jul 24, 2024 •

edited

Loading