Design of Serialization/Deserialization #2

JanKaul · 2023-07-21T07:54:26Z

I would like to discuss how to implement serialization/deserialization of the Iceberg metadata. There are currently two designs, each with its pros and cons:

Split representation into in_memory/on_disk part (example)
Use custom implementations of Serialize/Deserialize (example)

I would like to hear which design you prefer.


ZENOTME · 2023-07-21T08:06:15Z

If I'm not mistaken, the split representation also needs a custom implementation for the on_disk part. We would have to maintain two systems (in_memory::type, in_memory::value, on_disk::type, on_disk::value) as well as the conversion between in_memory and on_disk. So I think strictly separating the two parts may cause redundant code. Can we separate only some parts 🤔?
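To make the redundancy concern concrete, here is a minimal sketch of what a strict split could look like for a single type. The module layout and the reduced `Type` definitions are hypothetical and only loosely modeled on the names mentioned above; the string forms `"int"`, `"long"` and `"fixed[L]"` follow the Iceberg spec's JSON encoding of primitive types.

```rust
// Hypothetical sketch of the split design; names and variants are illustrative only.
mod on_disk {
    /// Mirrors the serialized form: a type is just a string such as "int" or "fixed[16]".
    #[derive(Debug, Clone, PartialEq)]
    pub struct Type(pub String);
}

mod in_memory {
    /// The representation the rest of the code would work with.
    #[derive(Debug, Clone, PartialEq)]
    pub enum Type {
        Int,
        Long,
        Fixed(u64),
    }
}

// in_memory -> on_disk cannot fail.
impl From<&in_memory::Type> for on_disk::Type {
    fn from(t: &in_memory::Type) -> Self {
        let s = match t {
            in_memory::Type::Int => "int".to_string(),
            in_memory::Type::Long => "long".to_string(),
            in_memory::Type::Fixed(len) => format!("fixed[{len}]"),
        };
        on_disk::Type(s)
    }
}

// on_disk -> in_memory has to validate the string.
impl TryFrom<&on_disk::Type> for in_memory::Type {
    type Error = String;

    fn try_from(t: &on_disk::Type) -> Result<Self, Self::Error> {
        match t.0.as_str() {
            "int" => Ok(in_memory::Type::Int),
            "long" => Ok(in_memory::Type::Long),
            other => other
                .strip_prefix("fixed[")
                .and_then(|rest| rest.strip_suffix(']'))
                .and_then(|len| len.parse::<u64>().ok())
                .map(in_memory::Type::Fixed)
                .ok_or_else(|| format!("unknown type: {other}")),
        }
    }
}
```

Even in this reduced form, every variant appears three times: once per representation and once in each conversion direction.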

JanKaul · 2023-07-21T08:22:03Z

I also think that a split representation leads to redundancy, which makes the code more difficult to maintain.

I personally think that the added complexity of custom implementations of Serialize/Deserialize is a price worth paying to avoid the redundancy.
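As a rough illustration of the single-representation alternative, the sketch below keeps one type and attaches the hand-written conversion logic directly to it. To stay dependency-free it uses the standard `Display` and `FromStr` traits as stand-ins; in the design discussed here that logic would live in custom `Serialize`/`Deserialize` implementations instead. The `PrimitiveType` enum is a hypothetical two-variant reduction; the `decimal(P,S)` string form follows the Iceberg spec.

```rust
use std::fmt;
use std::str::FromStr;

/// Single representation used both in memory and for (de)serialization.
/// Hypothetical and reduced to two variants for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum PrimitiveType {
    Boolean,
    Decimal { precision: u32, scale: u32 },
}

// Stand-in for a custom `Serialize` impl: writes the compact string form.
impl fmt::Display for PrimitiveType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PrimitiveType::Boolean => write!(f, "boolean"),
            PrimitiveType::Decimal { precision, scale } => {
                write!(f, "decimal({precision},{scale})")
            }
        }
    }
}

// Stand-in for a custom `Deserialize` impl: parses the string form back.
impl FromStr for PrimitiveType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s == "boolean" {
            return Ok(PrimitiveType::Boolean);
        }
        let inner = s
            .strip_prefix("decimal(")
            .and_then(|rest| rest.strip_suffix(')'))
            .ok_or_else(|| format!("unknown type: {s}"))?;
        let (p, sc) = inner
            .split_once(',')
            .ok_or_else(|| format!("malformed decimal: {s}"))?;
        let precision = p.trim().parse::<u32>().map_err(|e| e.to_string())?;
        let scale = sc.trim().parse::<u32>().map_err(|e| e.to_string())?;
        Ok(PrimitiveType::Decimal { precision, scale })
    }
}
```

The parsing code is more involved than a derived implementation would be, but there is only one type definition to keep in sync.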

JanKaul · 2023-07-21T08:44:07Z

I realized that this discussion is closely related to #3, because that issue gives additional motivation for splitting the representation into an in_memory and an on_disk part.

I think @ZENOTME made a good point about splitting only a part of the spec.

We could split the representation for all structs that have user interaction (e.g. TableMetadata) but use a single representation for the smaller building blocks (e.g. types, values, ...).
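A possible shape for the user-facing half of this hybrid, sketched under assumptions: `TableMetadataV1` and `TableMetadataV2` are hypothetical on-disk structs with a heavily reduced field set, and `TableMetadata` is the single struct users would interact with. The one spec detail relied on is that `last-sequence-number` does not exist in v1 and defaults to 0 when v1 metadata is read.

```rust
/// Format version recorded on the in-memory struct.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FormatVersion {
    V1,
    V2,
}

/// On-disk shape for format v1 (hypothetical, reduced field set).
pub struct TableMetadataV1 {
    pub location: String,
    pub last_updated_ms: i64,
}

/// On-disk shape for format v2 (hypothetical, reduced field set).
pub struct TableMetadataV2 {
    pub location: String,
    pub last_updated_ms: i64,
    pub last_sequence_number: i64,
}

/// The single struct that users interact with.
pub struct TableMetadata {
    pub format_version: FormatVersion,
    pub location: String,
    pub last_updated_ms: i64,
    pub last_sequence_number: i64,
}

impl From<TableMetadataV1> for TableMetadata {
    fn from(v1: TableMetadataV1) -> Self {
        TableMetadata {
            format_version: FormatVersion::V1,
            location: v1.location,
            last_updated_ms: v1.last_updated_ms,
            // v1 has no sequence numbers; the value defaults to 0 on read.
            last_sequence_number: 0,
        }
    }
}

impl From<TableMetadataV2> for TableMetadata {
    fn from(v2: TableMetadataV2) -> Self {
        TableMetadata {
            format_version: FormatVersion::V2,
            location: v2.location,
            last_updated_ms: v2.last_updated_ms,
            last_sequence_number: v2.last_sequence_number,
        }
    }
}
```

The version-specific structs stay private to the (de)serialization layer, while building blocks such as types and values would keep one representation with custom conversion logic.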

liurenjie1024 · 2023-07-21T08:52:44Z

Personally, I prefer the Serialize/Deserialize approach. In my experience, the split does not reduce the effort much, since either way you need to maintain the conversion logic. The on_disk approach has some shortcomings:

It's not easy to be compatible with both the v1 and v2 spec. See this pr: I had to remove v1 support since Avro has a strict schema check.
Sometimes we still need to write the serializer, see https://github.com/liurenjie1024/icelake/blob/eee3520f993837f4e1745c60538ba7ab1b421481/src/types/in_memory.rs#L64
https://github.com/icelake-io/icelake/blob/eee3520f993837f4e1745c60538ba7ab1b421481/src/types/in_memory.rs#L64

ZENOTME · 2023-07-21T09:00:19Z

We could split the representation for all structs that have user interaction (i.e. TableMetadata) but use a single representation for the smaller building blocks (i.e. types, values, ..)

Sounds good to me.

Xuanwo · 2023-07-21T09:12:43Z

Ok, let's pick the Serialize/Deserialize approach.

We could split the representation for all structs that have user interaction (i.e. TableMetadata) but use a single representation for the smaller building blocks (i.e. types, values, ..)

Nice idea!

liurenjie1024 · 2023-12-14T03:59:44Z

Closing this as we have finished the discussion. Feel free to reopen if needed.

JanKaul mentioned this issue Jul 21, 2023

Discussion: The design of in memory model of iceberg spec. #3

Closed

ZENOTME mentioned this issue Jul 24, 2023

feat: add partition field in data file icelake-io/icelake#115

Merged

liurenjie1024 mentioned this issue Aug 18, 2023

feat: Table metadata #29

Merged

liurenjie1024 closed this as completed Dec 14, 2023
