[Spec][Doc] fury cross-language serialization specification proposal #1418
Labels
enhancement
New feature or request
Comments
chaokunyang
changed the title
[Spec] standardizing fury cross-language serialization specification
[Spec][Doc] standardizing fury cross-language serialization specification
Mar 22, 2024
chaokunyang
changed the title
[Spec][Doc] standardizing fury cross-language serialization specification
[Spec][Doc] fury cross-language serialization specification proposal
Mar 26, 2024
chaokunyang
added a commit
that referenced
this issue
Mar 30, 2024
…tion (#1413)

## What does this PR do?

This PR standardizes the Fury cross-language serialization specification. It comes with the following changes:

- Remove the type tag from the protocol, since it introduces space and performance overhead in implementations. The `type tag` version can be seen in https://github.com/apache/incubator-fury/blob/6ea2e0b83d5449d63ca62296ff0dfd67b96c5bc5/docs/protocols/xlang_object_graph_spec.md
- Fury reserves `0~63` for internal types, but lets users register types by id starting from `0` (64 is added automatically) to set up type mappings between languages.
- Streamline the type system: only `bool/byte/i16/i32/i64/half-float/float/double/string/enum/list/set/map/Duration/Timestamp/decimal/binary/array/tensor/sparse tensor/arrow record batch/arrow table` are allowed.
- Formalize the binary format for the above types.
- Add type disambiguation: deserialization is determined jointly by the data type in the serialized binary and the target type.
- Introduce a meta string encoding algorithm for field names to reduce space cost by 3/8.
- Introduce a schema consistent mode format for structs.
- Introduce a schema evolution mode for structs:
  - this mode can embed meta in the data or share it across multiple messages,
  - it can avoid the cost of type tag comparison found in frameworks like protobuf.

This protocol also supports object inheritance for xlang serialization. This is a feature that users have been requesting for a long time in protobuf/flatbuffers:

- google/flatbuffers#4006
- protocolbuffers/protobuf#5645

Although some languages such as `rust/golang` don't support inheritance, in many cases only languages like `java/c#/python/javascript` are involved, and supporting inheritance is not complex at the protocol level, so we added inheritance support to the protocol. In languages such as `rust/golang`, we can use an annotation to mark a composition field as the parent class for the serialization layout, or we can disable inheritance for such languages at the protocol level.

The protocol supports polymorphism natively by type id, so I don't include types such as `OneOf/Union`. With this protocol, you can even serialize multiple Rust `dyn trait` objects that implement the same trait and get exactly the same objects back on deserialization.

## Related issue

This PR Closes #1418

---------

Co-authored-by: Twice <[email protected]>
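Two of the changes listed in the commit message, the user type id offset and the 3/8 space saving for field names, can be illustrated with a small sketch. This is not the Fury implementation: the function names, the 30-symbol alphabet, and the big-endian bit layout are all assumptions made for illustration. The only facts taken from the text above are that ids `0~63` are reserved (so user ids are shifted by 64) and that field names can be stored in roughly 5/8 of their one-byte-per-character size.

```python
# Illustrative sketch only. Names, alphabet, and bit layout are assumptions,
# not taken from the Fury specification.

INTERNAL_TYPE_ID_COUNT = 64  # ids 0..63 are reserved for internal types


def wire_type_id(user_type_id: int) -> int:
    """Shift a user-registered id (starting at 0) past the reserved range."""
    if user_type_id < 0:
        raise ValueError("user type id must be non-negative")
    return user_type_id + INTERNAL_TYPE_ID_COUNT


# Hypothetical alphabet: 30 symbols fit in 5 bits (2 codes stay unused).
ALPHABET = "abcdefghijklmnopqrstuvwxyz._$|"


def pack_5bit(name: str) -> bytes:
    """Pack each character into 5 bits, zero-padding the last byte.

    Raises ValueError if a character is outside ALPHABET.
    """
    bits = 0
    for ch in name:
        bits = (bits << 5) | ALPHABET.index(ch)
    nbits = 5 * len(name)
    pad = (-nbits) % 8  # bits needed to reach a whole number of bytes
    bits <<= pad
    return bits.to_bytes((nbits + pad) // 8, "big")


def unpack_5bit(data: bytes, length: int) -> str:
    """Inverse of pack_5bit; the caller must supply the character count."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    chars = []
    for i in range(length):
        shift = total - 5 * (i + 1)
        chars.append(ALPHABET[(bits >> shift) & 0x1F])
    return "".join(chars)


if __name__ == "__main__":
    packed = pack_5bit("username")
    print(wire_type_id(0), len(packed), unpack_5bit(packed, 8))
```

With this layout an 8-character name such as `username` occupies 5 bytes instead of 8, which is where a 3/8 reduction comes from. How the character count and the chosen encoding are signalled to the reader is left out of the sketch.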
Is your feature request related to a problem? Please describe.
We've standardized the Java serialization spec in #1240, but the cross-language serialization spec has never been formalized.
The current implementations of Fury xlang serialization across multiple languages are all based on the code in one of the languages. They are not complete and are prone to inconsistencies.
If someone wants to implement Fury for a new language, such as Fury C# in #686, they must read all of the Fury Java serialization code. This would be a huge burden for new developers, not to mention that some of them may not write Java at all.
Another problem is that, because our xlang serialization is not standardized, we have no foundation for discussing how to improve the protocol.
Our current xlang serialization also has many places to improve; for example, it doesn't resolve the type inconsistencies between languages. Such issues should be resolved too.
Describe the solution you'd like
We should design a new protocol for Fury and standardize it as a document.
Additional context
Serialization frameworks such as arrow/avro/hessian/thrift/flatbuffers/msgpack all have a serialization spec.