[Java][Protocol] Chunk by chunk predictive map serialization protocol #925
chaokunyang changed the title from "[Java][Protocol] Optimize Container serialization" to "[Java][Protocol] Chunk by chunk predictive map serialization protocol" on Dec 22, 2023
Map iteration and serialization benchmark:
@Benchmark
public Object computeMapHeader() {
boolean containKeyNull = false;
boolean containValueNull = false;
Class<?> keyClass = null, valueClass = null;
boolean keySameClass = true;
boolean valueSameClass = true;
int count = 0;
for (Map.Entry<Object, Object> entry : mapForIterating.entrySet()) {
Object key = entry.getKey();
Object value = entry.getValue();
count++;
if (key == null) {
containKeyNull = true;
} else if (keyClass == null) {
keyClass = key.getClass();
} else if (keyClass != key.getClass()) {
keySameClass = false;
}
if (value == null) {
containValueNull = true;
} else if (valueClass == null) {
valueClass = value.getClass();
} else if (valueClass != value.getClass()) {
valueSameClass = false;
}
}
return new boolean[] {containKeyNull, keySameClass, containValueNull, valueSameClass, count % 2 != 0};
}
@Benchmark
public Object iterateMap() {
int count = 0;
for (Map.Entry<Object, Object> entry : mapForIterating.entrySet()) {
Object key = entry.getKey();
Object value = entry.getValue();
// hole.consume(key);
// hole.consume(value);
count++;
}
return count;
}
@Benchmark
public Object serializeMap() {
buffer.writerIndex(0);
for (Map.Entry<Object, Object> entry : mapForIterating.entrySet()) {
Object key = entry.getKey();
Object value = entry.getValue();
fury.writeRef(buffer, key);
fury.writeRef(buffer, value);
}
return buffer;
}
@Benchmark
public Object serializeMap2() {
buffer.writerIndex(0);
Map<Integer, Integer> map = (Map) mapForIterating;
for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
Integer key = entry.getKey();
Integer value = entry.getValue();
buffer.writeInt(key);
buffer.writeInt(value);
}
return buffer;
}
@Benchmark
public Object serialize() {
return Tuple2.of(computeMapHeader(), serializeMap2());
}
@Benchmark
public Object serializeOpt() {
buffer.writerIndex(0);
Map<Integer, Integer> map = (Map) mapForIterating;
boolean containKeyNull = false;
boolean containValueNull = false;
Class<?> keyClass = null, valueClass = null;
boolean keySameClass = true;
boolean valueSameClass = true;
int count = 0;
for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
Integer key = entry.getKey();
Integer value = entry.getValue();
count++;
if (key == null) {
containKeyNull = true;
} else if (keyClass == null) {
keyClass = key.getClass();
} else if (keyClass != key.getClass()) {
keySameClass = false;
}
if (value == null) {
containValueNull = true;
} else if (valueClass == null) {
valueClass = value.getClass();
} else if (valueClass != value.getClass()) {
valueSameClass = false;
}
buffer.writeInt(key);
buffer.writeInt(value);
}
return Tuple2.of(new boolean[] {containKeyNull, keySameClass, containValueNull, valueSameClass, count % 2 != 0}, buffer);
}
Iteration is almost as slow as writing the data for a map of size 100.
chaokunyang added a commit that referenced this issue on Feb 28, 2024
## What do these changes do?
This PR refines the Fury Java serialization format spec. The cross-language object graph serialization spec is similar and will be added in a later PR, but it needs more discussion.
This PR adds some new specs which haven't been implemented in the current Java implementation:
- chunk-by-chunk predictive map serialization: #925
- layered class meta
- new class meta encoding - #1229
- object serialization with schema evolution support by auto meta share.
Some parts have been omitted in this spec:
- object serialization with schema evolution support by writing fields in a KV-like pattern: this will be replaced by the schema evolution mode described in this spec in the future.
Currently Fury doesn't provide binary compatibility; the spec may be revised in the future.
## Related issue number
Closes #1239 #1238
Co-authored-by: Twice <[email protected]>
I am interested in this issue. I have some questions:
public class Struct {
  @MapFieldInfo(keyNullable = false, valueNullable = false)
  Map<String, Integer> map;
}
Is your feature request related to a problem? Please describe.
Optimize Collection/Map serialization by exploiting potential homogeneity among elements.
By using this information, serialization performance can be improved and the serialized binary can be smaller.
For a collection, we can compute the header before serializing elements, since iterating a collection is cheap. Map iteration, however, is expensive: for Map<Integer, Integer> it costs about as much as the serialization itself. We need to finish KV writing and header writing in a single iteration pass.
Describe the solution you'd like
Users can use the MapFieldInfo annotation to provide the header in advance. Otherwise Fury will use the first key-value pair to predict the header optimistically, and update the chunk header if the prediction fails at some pair.
Fury will serialize the map chunk by chunk; every chunk has at most 127 pairs.
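The chunked, predictive write described above can be sketched roughly as follows. This is a minimal illustration, not Fury's implementation: the class name `ChunkedMapWriterSketch`, the three-byte chunk prefix (size, key tag, value tag), and the `tag`/`writeValue` helpers are all assumptions made for the example. It only handles non-null `Integer` and `String` entries. When the prediction taken from a chunk's first pair fails, the sketch patches the size byte of the current chunk and opens a new one, which is one possible reading of "update the chunk header".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of one-pass, chunk-by-chunk map writing (not Fury's actual code). */
final class ChunkedMapWriterSketch {
  static final int MAX_CHUNK_SIZE = 127; // from the proposal: at most 127 pairs per chunk

  /** Writes the map in a single iteration and returns the number of chunks written. */
  static int write(ByteBuffer buf, Map<?, ?> map) {
    int chunks = 0;
    int sizePos = -1;   // buffer position of the current chunk's size byte
    int chunkSize = 0;  // pairs written into the current chunk so far
    Class<?> keyClass = null;
    Class<?> valueClass = null;
    for (Map.Entry<?, ?> e : map.entrySet()) {
      Object k = e.getKey();   // assumption: keys and values are non-null
      Object v = e.getValue();
      boolean fits = sizePos >= 0
          && chunkSize < MAX_CHUNK_SIZE
          && k.getClass() == keyClass
          && v.getClass() == valueClass;
      if (!fits) {
        if (sizePos >= 0) {
          buf.put(sizePos, (byte) chunkSize); // patch the size of the finished chunk
        }
        sizePos = buf.position();
        buf.put((byte) 0);                    // size placeholder, patched later
        keyClass = k.getClass();              // predict from the chunk's first pair
        valueClass = v.getClass();
        buf.put(tag(keyClass));
        buf.put(tag(valueClass));
        chunkSize = 0;
        chunks++;
      }
      writeValue(buf, k);
      writeValue(buf, v);
      chunkSize++;
    }
    if (sizePos >= 0) {
      buf.put(sizePos, (byte) chunkSize);     // patch the last chunk
    }
    return chunks;
  }

  // Illustrative one-byte type tags; a real format would write class metadata instead.
  private static byte tag(Class<?> c) {
    if (c == Integer.class) return 0;
    if (c == String.class) return 1;
    throw new IllegalArgumentException("unsupported type in sketch: " + c);
  }

  private static void writeValue(ByteBuffer buf, Object o) {
    if (o instanceof Integer) {
      buf.putInt((Integer) o);
    } else {
      byte[] bytes = ((String) o).getBytes(StandardCharsets.UTF_8);
      buf.putInt(bytes.length);
      buf.put(bytes);
    }
  }

  public static void main(String[] args) {
    Map<Object, Object> map = new LinkedHashMap<>();
    for (int i = 0; i < 130; i++) {
      map.put(i, i);
    }
    ByteBuffer buf = ByteBuffer.allocate(4096);
    System.out.println("chunks written: " + write(buf, map));
  }
}
```

Because the size byte is reserved up front and patched once the chunk ends, the header and the key-value data are both produced in the same pass over `entrySet()`, which is the point of the proposal.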
KV header:
- 0b1 of header to flag it.
- 0b10 of header to flag it. If ref tracking is enabled for this key type, this flag is invalid.
- 0b100 of header to flag it.
- 0b1000 of header to flag it.
- 0b10000 of header to flag it.
- 0b100000 of header to flag it. If ref tracking is enabled for this value type, this flag is invalid.
- 0b1000000 of header to flag it.
- 0b10000000 of header to flag it.
If streaming write is enabled, Fury can't update the written chunk size. In such cases, the map key-value data format will be:
KV header will be the header marked by MapFieldInfo in Java. For languages such as Golang, this can mostly be computed in advance for non-interface types.
Additional context
#923