feat(spec): standardizing fury cross-language serialization specification (#1413)

## What does this PR do?

This PR standardizes the Fury cross-language serialization
specification. It comes with the following changes:
- Remove the type tag from the protocol, since it introduces space and
performance overhead in the implementation. The `type tag` version can
be seen at:
https://github.com/apache/incubator-fury/blob/6ea2e0b83d5449d63ca62296ff0dfd67b96c5bc5/docs/protocols/xlang_object_graph_spec.md
- Fury reserves ids `0~63` for internal types, but lets users register
types by id starting from `0` (64 is added automatically) to set up type
mapping between languages.
- Streamline the type system: only
`bool/byte/i16/i32/i64/half-float/float/double/string/enum/list/set/map/Duration/Timestamp/decimal/binary/array/tensor/sparse tensor/arrow record batch/arrow table`
are allowed.
- Formalize the binary format for the above types.
- Add type disambiguation: deserialization is determined jointly by the
data type in the serialized binary and the target type.
- Introduce a meta string encoding algorithm for field names to reduce
space cost by 3/8.
- Introduce a schema consistent mode format for structs.
- Introduce a schema evolution mode for structs:
  - this mode can embed meta in the data or share it across multiple
    messages,
  - it can avoid the cost of type tag comparison in frameworks like
    protobuf.
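
A 3/8 saving is what you get when each character is packed into 5 bits instead of 8. The following is a minimal sketch of that idea, not the encoding defined by the spec: the 30-symbol alphabet, the big-endian bit order, and the function names `encode_5bit`/`decode_5bit` are all assumptions made for illustration.

```python
# Hypothetical 5-bit alphabet (30 symbols, so every index fits in 5 bits).
ALPHABET = "abcdefghijklmnopqrstuvwxyz._$|"


def encode_5bit(name: str) -> bytes:
    """Pack each character of `name` into 5 bits, zero-padded to a whole byte."""
    bits = 0
    nbits = 0
    for ch in name:
        # str.index raises ValueError for characters outside the alphabet.
        bits = (bits << 5) | ALPHABET.index(ch)
        nbits += 5
    pad = (-nbits) % 8  # zero bits appended so the total is a multiple of 8
    bits <<= pad
    return bits.to_bytes((nbits + pad) // 8, "big")


def decode_5bit(data: bytes, length: int) -> str:
    """Inverse of encode_5bit; `length` is the number of characters to read."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    chars = []
    for i in range(length):
        shift = total - 5 * (i + 1)
        chars.append(ALPHABET[(bits >> shift) & 0x1F])
    return "".join(chars)


encoded = encode_5bit("username")
# 8 characters * 5 bits = 40 bits = 5 bytes, versus 8 bytes as one byte per char.
assert len(encoded) == 5
assert decode_5bit(encoded, 8) == "username"
```

In this sketch the character count is passed to the decoder separately, because the padding bits would otherwise be indistinguishable from a trailing `a`.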

This protocol also supports object inheritance for xlang serialization.
This is a feature that users have been requesting for a long time
in protobuf/flatbuffers:
- google/flatbuffers#4006
- protocolbuffers/protobuf#5645

Although some languages such as `rust/golang` don't support
inheritance, there are many cases where only languages like
`java/c#/python/javascript` are involved, and supporting inheritance
is not complex at the protocol level, so we added inheritance
support to the protocol. In languages such as `rust/golang`, we can
use an annotation to mark a composition field as the parent class for
the serialization layout, or we can disable inheritance for such
languages at the protocol level.
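
To make the "composition field as parent class" idea concrete, here is a rough sketch using Python dataclasses as a stand-in for a language without inheritance. The `parent()` marker, the `layout()` helper, and the choice to emit parent fields before the struct's own fields are all hypothetical; the spec may define the layout differently.

```python
from dataclasses import dataclass, field, fields


def parent():
    """Hypothetical marker: the composed field plays the role of a parent class."""
    return field(metadata={"parent": True})


@dataclass
class Animal:
    name: str
    age: int


@dataclass
class Dog:
    base: Animal = parent()  # composition standing in for `Dog extends Animal`
    breed: str = ""


def layout(obj):
    """Return (field_name, value) pairs with parent-marked fields inlined first."""
    inherited, own = [], []
    for f in fields(obj):
        value = getattr(obj, f.name)
        if f.metadata.get("parent"):
            # Flatten the composed struct as if its fields were inherited.
            inherited.extend(layout(value))
        else:
            own.append((f.name, value))
    return inherited + own


dog_layout = layout(Dog(Animal("rex", 3), "corgi"))
assert dog_layout == [("name", "rex"), ("age", 3), ("breed", "corgi")]
```

With a flattened layout like this, a peer written in a language that does have inheritance could read the same fields into a real subclass.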
 
The protocol supports polymorphism natively by type id, so I don't
include types such as `OneOf/Union`. With this protocol, you can even
serialize multiple rust `dyn trait` objects which implement the same
trait, and get exactly the same objects back on deserialization.
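
A small sketch of polymorphism by type id: each value is written with its registered id, and the reader dispatches on that id to rebuild the concrete type. The `Registry` class, its method names, and the framing (a little-endian `u16` id followed by a `u32` payload length) are assumptions for illustration only and are not the Fury wire format; the only detail taken from the description above is that user ids are offset by 64.

```python
import struct
from dataclasses import dataclass

INTERNAL_TYPE_IDS = 64  # ids 0~63 are reserved for internal types


class Registry:
    """Hypothetical registry mapping user type ids to reader/writer functions."""

    def __init__(self):
        self._by_id = {}
        self._by_cls = {}

    def register(self, cls, user_id, write, read):
        wire_id = user_id + INTERNAL_TYPE_IDS  # the automatic +64 offset
        if wire_id in self._by_id:
            raise ValueError(f"type id {user_id} is already registered")
        self._by_id[wire_id] = read
        self._by_cls[cls] = (wire_id, write)
        return wire_id

    def dumps(self, objs):
        out = bytearray()
        for obj in objs:
            wire_id, write = self._by_cls[type(obj)]
            payload = write(obj)
            # Assumed framing: u16 type id, u32 payload length, then payload.
            out += struct.pack("<HI", wire_id, len(payload)) + payload
        return bytes(out)

    def loads(self, data):
        objs, pos = [], 0
        while pos < len(data):
            wire_id, size = struct.unpack_from("<HI", data, pos)
            pos += 6
            objs.append(self._by_id[wire_id](data[pos:pos + size]))
            pos += size
        return objs


@dataclass
class Circle:
    radius: float


@dataclass
class Square:
    side: float


registry = Registry()
circle_id = registry.register(
    Circle, 0,
    lambda c: struct.pack("<d", c.radius),
    lambda b: Circle(*struct.unpack("<d", b)),
)
registry.register(
    Square, 1,
    lambda s: struct.pack("<d", s.side),
    lambda b: Square(*struct.unpack("<d", b)),
)

shapes = [Circle(1.5), Square(2.0), Circle(0.25)]
restored = registry.loads(registry.dumps(shapes))
assert restored == shapes
```

Because the concrete type travels with each value, the reader needs no `OneOf/Union` wrapper to tell a `Circle` from a `Square`.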

## Related issue
This PR Closes #1418

---------

Co-authored-by: Twice <[email protected]>
chaokunyang and PragmaTwice authored Mar 30, 2024
1 parent 7d64ede commit c4b4f38
Showing 5 changed files with 761 additions and 5 deletions.
7 changes: 7 additions & 0 deletions docs/guide/xlang_type_mapping.md
@@ -0,0 +1,7 @@
---
title: Type Mapping of Xlang Serialization
sidebar_position: 3
id: xlang_type_mapping
---

Coming soon.
2 changes: 1 addition & 1 deletion docs/protocols/README.md
@@ -1,4 +1,4 @@
# Serialization Protocols
- For Cross Language Object Graph Protocol, see [xlang_object_graph_format_spec](./xlang_object_graph_spec.md) doc.
- For Java Object Graph Protocol, see [java_object_graph_format_spec](java_object_graph_spec.md) doc.
- For Cross Language Object Graph Protocol, see [xlang_object_graph_format_spec](./xlang_object_graph.md) doc.
- For Row Format Protocol, see [row format_spec](./row_format.md) doc.
4 changes: 2 additions & 2 deletions docs/protocols/java_object_graph_spec.md
@@ -172,7 +172,7 @@ Meta header is a 64 bits number value encoded in little endian order.
meta
for such types is written separately instead of inlining here is to reduce meta space cost if object of this
type is serialized in current object graph multiple times, and the field value may be null too.
-- List Type Info: list type will have an extra byte for elements info.
+- Collection Type Info: collection type will have an extra byte for elements info.
Users can use annotation to provide those info.
- elements type same
- elements tracking ref
Expand Down Expand Up @@ -211,7 +211,7 @@ Same encoding algorithm as the previous layer except:

## Meta String

-Meta string is mainly used to encode meta strings such class name and field names.
+Meta string is mainly used to encode meta strings such as class name and field names.

### Encoding Algorithms

2 changes: 0 additions & 2 deletions docs/protocols/xlang_object_graph.md

This file was deleted.
