Commit
feat(spec): standardizing fury cross-language serialization specification (#1413)

## What does this PR do?

This PR standardizes the Fury cross-language serialization specification. It comes with the following changes:

- Remove the type tag from the protocol, since it introduces space and performance overhead in the implementation. The `type tag` version can be seen in https://github.com/apache/incubator-fury/blob/6ea2e0b83d5449d63ca62296ff0dfd67b96c5bc5/docs/protocols/xlang_object_graph_spec.md .
- Fury reserves `0~63` for internal types, but lets users register types by id starting from `0` (64 is added automatically) to set up the type mapping between languages.
- Streamline the type system: only `bool/byte/i16/i32/i64/half-float/float/double/string/enum/list/set/map/Duration/Timestamp/decimal/binary/array/tensor/sparse/tensor/arrow/record/batch/arrow/table` are allowed.
- Formalize the binary format for the above types.
- Add type disambiguation: deserialization is determined jointly by the data type in the serialized binary and the target type.
- Introduce a meta string encoding algorithm for field names to reduce space cost by 3/8.
- Introduce a schema consistent mode format for structs.
- Introduce a schema evolution mode for structs:
  - this mode can embed meta in the data or share it across multiple messages,
  - it can avoid the cost of type tag comparison in frameworks like protobuf.

This protocol also supports object inheritance for xlang serialization. This is a feature that users have been requesting for a long time in protobuf/flatbuffers:

- google/flatbuffers#4006
- protocolbuffers/protobuf#5645

Although some languages, such as `rust/golang`, don't support inheritance, there are many cases where only languages like `java/c#/python/javascript` are involved, and supporting inheritance is not complex at the protocol level, so we added inheritance support to the protocol. In languages such as `rust/golang`, we can use an annotation to mark a composition field as the parent class for the serialization layout, or we can disable inheritance for such languages at the protocol level.

The protocol supports polymorphism natively by type id, so I don't include types such as `OneOf/Union`. With this protocol, you can even serialize multiple rust `dyn trait` objects which implement the same trait, and get exactly the same objects on deserialization.

## Related issue

This PR Closes #1418

---------

Co-authored-by: Twice <[email protected]>
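
For illustration only, here is a minimal sketch of where a 3/8 saving on field names can come from: if every character is drawn from an alphabet of at most 32 symbols, it fits in 5 bits instead of 8. The alphabet, the function names `pack_5bit`/`unpack_5bit`, and the decision to pass the character count separately are all assumptions made for this sketch; the actual meta string encoding is defined by the spec, not by this code.

```python
# Hypothetical 30-symbol alphabet (26 lowercase letters plus four marks).
# Every index is below 32, so each character fits in 5 bits.
ALPHABET = "abcdefghijklmnopqrstuvwxyz._$|"


def pack_5bit(name: str) -> bytes:
    """Pack each character of `name` into 5 bits, first character first."""
    bits = 0
    for ch in name:
        # str.index raises ValueError for characters outside the alphabet.
        bits = (bits << 5) | ALPHABET.index(ch)
    nbits = 5 * len(name)
    nbytes = (nbits + 7) // 8
    # Left-align the payload and pad the last byte with zero bits.
    bits <<= nbytes * 8 - nbits
    return bits.to_bytes(nbytes, "big")


def unpack_5bit(data: bytes, length: int) -> str:
    """Decode `length` characters; the count is assumed to be stored elsewhere."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    chars = []
    for i in range(length):
        shift = total - 5 * (i + 1)
        chars.append(ALPHABET[(bits >> shift) & 0x1F])
    return "".join(chars)


field = "order_line_items"          # 16 characters
packed = pack_5bit(field)           # 16 * 5 = 80 bits
restored = unpack_5bit(packed, len(field))
```

For a 16-character name such as `order_line_items`, this produces 10 bytes instead of the 16 a one-byte-per-character encoding would use, which is the 3/8 reduction. Names whose length is not a multiple of 8 save slightly less because of the padding bits in the last byte.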