[Java] Add SLI(small long as int) long encoding algorithm (#942)
* design and implement SLI long encoding

* add long encoding option

* add long encoding config

* refine testWriteSliLong

* add long encoding config

* using SLI encoding for fury jit/interpreter mode serialization

* fix tests
chaokunyang authored Oct 5, 2023
1 parent 7ee2bd9 commit 96ee19f
Showing 17 changed files with 319 additions and 95 deletions.
19 changes: 16 additions & 3 deletions docs/guide/java_object_graph_guide.md
@@ -138,12 +138,25 @@ ThreadSafeFury fury=Fury.builder()

### Smaller size
`FuryBuilder#withIntCompressed`/`FuryBuilder#withLongCompressed` can be used to compress int/long for smaller size.
Normally compress int is enough. If a number are `long` type, it can't be represented by smaller bytes mostly,
the compression won't get good enough result, not worthy compared to performance cost.
Normally compress int is enough.

Both compression are enabled by default, if the serialized is not important, for example, you use flatbuffers for
serialization before, which doesn't compress anything, then you should disable compression. If your data are all numbers,
the compression can bring 80% performance regression.
the compression may bring 80% performance regression.

For int compression, fury use 1~5 bytes for encoding. First bit in every byte indicate whether has next byte. if first bit is set, then next byte will be read util first bit of next byte is unset.

For long compression, fury support two encoding:
- Fury SLI(Small long as int) Encoding (**used by default**):
- If long is in [-1073741824, 1073741823], encode as 4 bytes int: `| little-endian: ((int) value) << 1 |`
- Otherwise write as 9 bytes: `| 0b1 | little-endian 8bytes long |`
- Fury PVL(Progressive Variable-length Long) Encoding:
- First bit in every byte indicate whether has next byte. if first bit is set, then next byte will be read util first bit of next byte is unset.
- Negative number will be converted to positive number by ` (v << 1) ^ (v >> 63)` to reduce cost of small negative numbers.

If a number are `long` type, it can't be represented by smaller bytes mostly, the compression won't get good enough result,
not worthy compared to performance cost. Maybe you should try to disable long compression if you find it didn't bring much
space savings.
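
The SLI layout described above can be sketched in a few lines of plain Java. This is only an illustration built on `java.nio.ByteBuffer`, not the code added by this commit: the class name `SliSketch` and its `encode`/`decode` helpers are hypothetical, and the real implementation works on Fury's `MemoryBuffer`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of the SLI (small long as int) layout described above. */
final class SliSketch {
  static final long MIN = -(1L << 30);     // -1073741824
  static final long MAX = (1L << 30) - 1;  //  1073741823

  /** Encodes a long as 4 bytes when it fits in 31 bits, otherwise as 9 bytes. */
  static byte[] encode(long value) {
    if (value >= MIN && value <= MAX) {
      // Shifting left by one leaves the lowest bit of the first byte at 0,
      // which marks the "small" form.
      ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
      buf.putInt(((int) value) << 1);
      return buf.array();
    }
    // Flag byte 0b1 followed by the full little-endian long.
    ByteBuffer buf = ByteBuffer.allocate(9).order(ByteOrder.LITTLE_ENDIAN);
    buf.put((byte) 0b1);
    buf.putLong(value);
    return buf.array();
  }

  /** Decodes a value produced by {@link #encode(long)}. */
  static long decode(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    if ((bytes[0] & 0b1) == 0) {
      // Arithmetic shift restores the sign of the original value.
      return buf.getInt() >> 1;
    }
    buf.get(); // skip the flag byte
    return buf.getLong();
  }
}
```

The reader only needs the lowest bit of the first byte to choose between the two forms, which avoids the per-byte loop that a variable-length encoding requires.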

### Implement a customized serializer
In some cases, you may want to implement a serializer for your type, especially some class customize serialization by JDK
10 changes: 7 additions & 3 deletions docs/protocols/java_object_graph.md
@@ -20,13 +20,17 @@ The data are serialized using little endian order overall.

### int
- size: 1~5 byte
- positive int format: first bit in every byte indicate whether has next byte. if first bit is set i.e. `b & 0x80 == 0x80`, then next byte should be read util first bit is unset.
- positive int format: first bit in every byte indicate whether has next byte. if first bit is set i.e. `b & 0x80 == 0x80`, then next byte should be read util first bit of next byte is unset.
- Negative number will be converted to positive number by ` (v << 1) ^ (v >> 31)` to reduce cost of small negative numbers.

### long
- size: 1~9 byte
- positive long format: first bit in every byte indicate whether has next byte. if first bit is set i.e. `b & 0x80 == 0x80`, then next byte should be read util first bit is unset.
- Negative number will be converted to positive number by ` (v << 1) ^ (v >> 63)` to reduce cost of small negative numbers.
- Fury SLI(Small long as int) Encoding:
- If long is in [-1073741824, 1073741823], encode as 4 bytes int: `| little-endian: ((int) value) << 1 |`
- Otherwise write as 9 bytes: `| 0b1 | little-endian 8bytes long |`
- Fury PVL(Progressive Variable-length Long) Encoding:
- positive long format: first bit in every byte indicate whether has next byte. if first bit is set i.e. `b & 0x80 == 0x80`, then next byte should be read util first bit is unset.
- Negative number will be converted to positive number by ` (v << 1) ^ (v >> 63)` to reduce cost of small negative numbers.
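
A minimal sketch of the PVL format listed above, for illustration only. The class `PvlSketch` and its methods are hypothetical names. It also assumes that the ninth byte, when present, carries the remaining 8 bits without a continuation flag, which is what the stated 1~9 byte size range implies for a 64-bit value.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of zigzag + variable-length encoding for long values. */
final class PvlSketch {
  static byte[] encode(long value) {
    // Zigzag: maps small negative numbers to small positive ones.
    long v = (value << 1) ^ (value >> 63);
    ByteArrayOutputStream out = new ByteArrayOutputStream(9);
    for (int i = 0; i < 8; i++) {
      if ((v & ~0x7FL) == 0) {
        out.write((int) v); // high bit unset: last byte
        return out.toByteArray();
      }
      out.write((int) ((v & 0x7F) | 0x80)); // high bit set: more bytes follow
      v >>>= 7;
    }
    // Assumption: the ninth byte stores the remaining 8 bits as-is.
    out.write((int) v);
    return out.toByteArray();
  }

  static long decode(byte[] bytes) {
    long v = 0;
    int shift = 0;
    for (int i = 0; i < 8; i++) {
      int b = bytes[i] & 0xFF;
      v |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return (v >>> 1) ^ -(v & 1); // undo zigzag
      }
      shift += 7;
    }
    v |= (long) (bytes[8] & 0xFF) << 56;
    return (v >>> 1) ^ -(v & 1);
  }
}
```

With this layout `-1` encodes to the single byte `0x01`, while values near `Long.MIN_VALUE` or `Long.MAX_VALUE` use all nine bytes.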

### float
- size: 4 byte
32 changes: 19 additions & 13 deletions java/fury-core/src/main/java/io/fury/Fury.java
@@ -22,6 +22,7 @@
import io.fury.config.Config;
import io.fury.config.FuryBuilder;
import io.fury.config.Language;
import io.fury.config.LongEncoding;
import io.fury.memory.MemoryBuffer;
import io.fury.memory.MemoryUtils;
import io.fury.resolver.ClassInfo;
@@ -36,6 +37,7 @@
import io.fury.serializer.BufferCallback;
import io.fury.serializer.BufferObject;
import io.fury.serializer.OpaqueObjects;
import io.fury.serializer.PrimitiveSerializers.LongSerializer;
import io.fury.serializer.Serializer;
import io.fury.serializer.SerializerFactory;
import io.fury.serializer.StringSerializer;
@@ -100,7 +102,7 @@ public final class Fury {
private final StringSerializer stringSerializer;
private final Language language;
private final boolean compressInt;
private final boolean compressLong;
private final LongEncoding longEncoding;
private final Generics generics;
private Language peerLanguage;
private BufferCallback bufferCallback;
@@ -115,7 +117,7 @@ public Fury(FuryBuilder builder, ClassLoader classLoader) {
this.language = config.getLanguage();
this.refTracking = config.trackingRef();
compressInt = config.compressInt();
compressLong = config.compressLong();
longEncoding = config.longEncoding();
if (refTracking) {
this.refResolver = new MapRefResolver();
} else {
@@ -476,11 +478,7 @@ private void writeData(MemoryBuffer buffer, ClassInfo classInfo, Object obj) {
buffer.writeFloat((Float) obj);
break;
case ClassResolver.LONG_CLASS_ID:
if (compressLong) {
buffer.writeVarLong((Long) obj);
} else {
buffer.writeLong((Long) obj);
}
LongSerializer.writeLong(buffer, (Long) obj, longEncoding);
break;
case ClassResolver.DOUBLE_CLASS_ID:
buffer.writeDouble((Double) obj);
@@ -611,6 +609,14 @@ public String readJavaString(MemoryBuffer buffer) {
return stringSerializer.readJavaString(buffer);
}

public void writeLong(MemoryBuffer buffer, long value) {
LongSerializer.writeLong(buffer, value, longEncoding);
}

public long readLong(MemoryBuffer buffer) {
return LongSerializer.readLong(buffer, longEncoding);
}

/** Deserialize <code>obj</code> from a byte array. */
public Object deserialize(byte[] bytes) {
return deserialize(MemoryUtils.wrap(bytes), null);
@@ -827,11 +833,7 @@ private Object readDataInternal(MemoryBuffer buffer, ClassInfo classInfo) {
case ClassResolver.FLOAT_CLASS_ID:
return buffer.readFloat();
case ClassResolver.LONG_CLASS_ID:
if (compressLong) {
return buffer.readVarLong();
} else {
return buffer.readLong();
}
return LongSerializer.readLong(buffer, longEncoding);
case ClassResolver.DOUBLE_CLASS_ID:
return buffer.readDouble();
case ClassResolver.STRING_CLASS_ID:
@@ -1268,8 +1270,12 @@ public boolean compressInt() {
return compressInt;
}

public LongEncoding longEncoding() {
return longEncoding;
}

public boolean compressLong() {
return compressLong;
return config.compressLong();
}

public static FuryBuilder builder() {
@@ -37,7 +37,6 @@
import static io.fury.type.TypeUtils.PRIMITIVE_DOUBLE_TYPE;
import static io.fury.type.TypeUtils.PRIMITIVE_FLOAT_TYPE;
import static io.fury.type.TypeUtils.PRIMITIVE_INT_TYPE;
import static io.fury.type.TypeUtils.PRIMITIVE_LONG_TYPE;
import static io.fury.type.TypeUtils.PRIMITIVE_SHORT_TYPE;
import static io.fury.type.TypeUtils.PRIMITIVE_VOID_TYPE;
import static io.fury.type.TypeUtils.SET_TYPE;
@@ -79,6 +78,7 @@
import io.fury.serializer.CompatibleSerializer;
import io.fury.serializer.MapSerializers.MapSerializer;
import io.fury.serializer.ObjectSerializer;
import io.fury.serializer.PrimitiveSerializers.LongSerializer;
import io.fury.serializer.Serializer;
import io.fury.serializer.Serializers;
import io.fury.serializer.StringSerializer;
@@ -122,6 +122,7 @@ public abstract class BaseObjectCodecBuilder extends CodecBuilder {
protected final Reference classResolverRef =
fieldRef(CLASS_RESOLVER_NAME, CLASS_RESOLVER_TYPE_TOKEN);
protected final Fury fury;
protected final ClassResolver classResolver;
protected final Reference stringSerializerRef;
private final Map<Class<?>, Reference> serializerMap = new HashMap<>();
private final Map<String, Object> sharedFieldMap = new HashMap<>();
@@ -132,6 +133,7 @@ public abstract class BaseObjectCodecBuilder extends CodecBuilder {
public BaseObjectCodecBuilder(TypeToken<?> beanType, Fury fury, Class<?> parentSerializerClass) {
super(new CodegenContext(), beanType);
this.fury = fury;
this.classResolver = fury.getClassResolver();
this.parentSerializerClass = parentSerializerClass;
addCommonImports();
ctx.reserveName(REF_RESOLVER_NAME);
@@ -371,8 +373,7 @@ private Expression serializeForNotNull(
String func = fury.compressInt() ? "writeVarInt" : "writeInt";
return new Invoke(buffer, func, inputObject);
} else if (clz == long.class || clz == Long.class) {
String func = fury.compressLong() ? "writeVarLong" : "writeLong";
return new Invoke(buffer, func, inputObject);
return LongSerializer.writeLong(buffer, inputObject, fury.longEncoding(), true);
} else if (clz == float.class || clz == Float.class) {
return new Invoke(buffer, "writeFloat", inputObject);
} else if (clz == double.class || clz == Double.class) {
@@ -1159,8 +1160,7 @@ protected Expression deserializeForNotNull(
String func = fury.compressInt() ? "readVarInt" : "readInt";
return new Invoke(buffer, func, PRIMITIVE_INT_TYPE);
} else if (cls == long.class || cls == Long.class) {
String func = fury.compressLong() ? "readVarLong" : "readLong";
return new Invoke(buffer, func, PRIMITIVE_LONG_TYPE);
return LongSerializer.readLong(buffer, fury.longEncoding());
} else if (cls == float.class || cls == Float.class) {
return new Invoke(buffer, "readFloat", PRIMITIVE_FLOAT_TYPE);
} else if (cls == double.class || cls == Double.class) {
@@ -48,6 +48,7 @@
import io.fury.codegen.Expression.StaticInvoke;
import io.fury.codegen.ExpressionVisitor;
import io.fury.serializer.ObjectSerializer;
import io.fury.serializer.PrimitiveSerializers.LongSerializer;
import io.fury.type.Descriptor;
import io.fury.type.DescriptorGrouper;
import io.fury.util.Platform;
@@ -86,7 +87,7 @@ public class ObjectCodecBuilder extends BaseObjectCodecBuilder {
public ObjectCodecBuilder(Class<?> beanClass, Fury fury) {
super(TypeToken.of(beanClass), fury, Generated.GeneratedObjectSerializer.class);
Collection<Descriptor> descriptors =
fury.getClassResolver().getAllDescriptorsMap(beanClass, true).values();
classResolver.getAllDescriptorsMap(beanClass, true).values();
classVersionHash =
new Literal(ObjectSerializer.computeVersionHash(descriptors), PRIMITIVE_INT_TYPE);
DescriptorGrouper grouper =
@@ -352,7 +353,8 @@ private List<Expression> serializePrimitivesCompressed(
addIncWriterIndexExpr(groupExpressions, buffer, acc);
compressStarted = true;
}
groupExpressions.add(new Invoke(buffer, "unsafeWriteVarLong", fieldValue));
groupExpressions.add(
LongSerializer.writeLong(buffer, fieldValue, fury.longEncoding(), false));
}
} else {
throw new IllegalStateException("impossible");
@@ -695,7 +697,7 @@ private List<Expression> deserializeCompressedPrimitives(
compressStarted = true;
addIncReaderIndexExpr(groupExpressions, buffer, acc);
}
fieldValue = new Invoke(buffer, "readVarLong", PRIMITIVE_LONG_TYPE);
fieldValue = LongSerializer.readLong(buffer, fury.longEncoding());
}
} else {
throw new IllegalStateException("impossible");
9 changes: 8 additions & 1 deletion java/fury-core/src/main/java/io/fury/config/Config.java
@@ -46,6 +46,7 @@ public class Config implements Serializable {
private final boolean compressString;
private final boolean compressInt;
private final boolean compressLong;
private final LongEncoding longEncoding;
private final boolean requireClassRegistration;
private final boolean registerGuavaTypes;
private final boolean shareMetaContext;
@@ -61,7 +62,8 @@ public Config(FuryBuilder builder) {
timeRefIgnored = !trackingRef || builder.timeRefIgnored;
compressString = builder.compressString;
compressInt = builder.compressInt;
compressLong = builder.compressLong;
longEncoding = builder.longEncoding;
compressLong = longEncoding != LongEncoding.LE_RAW_BYTES;
requireClassRegistration = builder.requireClassRegistration;
registerGuavaTypes = builder.registerGuavaTypes;
codeGenEnabled = builder.codeGenEnabled;
@@ -137,6 +139,11 @@ public boolean compressLong() {
return compressLong;
}

/** Returns long encoding. */
public LongEncoding longEncoding() {
return longEncoding;
}

public boolean requireClassRegistration() {
return requireClassRegistration;
}
17 changes: 13 additions & 4 deletions java/fury-core/src/main/java/io/fury/config/FuryBuilder.java
@@ -27,6 +27,7 @@
import io.fury.serializer.TimeSerializers;
import io.fury.util.LoggerFactory;
import io.fury.util.Platform;
import java.util.Objects;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;

@@ -60,7 +61,7 @@ public final class FuryBuilder {
boolean timeRefIgnored = true;
ClassLoader classLoader;
boolean compressInt = true;
boolean compressLong = false;
public LongEncoding longEncoding = LongEncoding.SLI;
boolean compressString = true;
CompatibleMode compatibleMode = CompatibleMode.SCHEMA_CONSISTENT;
boolean checkJdkClassSerializable = true;
@@ -115,7 +116,7 @@ public FuryBuilder ignoreTimeRef(boolean ignoreTimeRef) {
/** Use variable length encoding for int/long. */
public FuryBuilder withNumberCompressed(boolean numberCompressed) {
this.compressInt = numberCompressed;
this.compressLong = numberCompressed;
withLongCompressed(numberCompressed);
return this;
}

@@ -125,9 +126,17 @@ public FuryBuilder withIntCompressed(boolean intCompressed) {
return this;
}

/** Use variable length encoding for long. */
/**
* Use variable length encoding for long. Enabled by default, use {@link LongEncoding#SLI} (Small
* long as int) for long encoding.
*/
public FuryBuilder withLongCompressed(boolean longCompressed) {
this.compressLong = longCompressed;
return withLongCompressed(longCompressed ? LongEncoding.SLI : LongEncoding.LE_RAW_BYTES);
}

/** Use variable length encoding for long. */
public FuryBuilder withLongCompressed(LongEncoding longEncoding) {
this.longEncoding = Objects.requireNonNull(longEncoding);
return this;
}

45 changes: 45 additions & 0 deletions java/fury-core/src/main/java/io/fury/config/LongEncoding.java
@@ -0,0 +1,45 @@
/*
* Copyright 2023 The Fury Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package io.fury.config;

/**
* Encoding option for long. Default encoding is fury SLI(Small long as int) encoding: {@link #SLI}.
*
* @author chaokunyang
*/
public enum LongEncoding {
/**
* Fury SLI(Small long as int) Encoding:
* <li>If long is in [0xc0000000, 0x3fffffff], encode as 4 bytes int: `| little-endian: ((int)
* value) << 1 |`
* <li>Otherwise write as 9 bytes: `| 0b1 | little-endian 8bytes long |`.
*
* <p>Faster than {@link #PVL}, but compression is not good as {@link #PVL} such as for ints
* in short range.
*/
SLI,
/**
* Fury Progressive Variable-length Long Encoding:
* <li>positive long format: first bit in every byte indicate whether has next byte, then next
* byte should be read util first bit is unset.
* <li>Negative number will be converted to positive number by ` (v << 1) ^ (v >> 63)` to reduce
* cost of small negative numbers.
*/
PVL,
/** Write long as little endian 8bytes, no compression. */
LE_RAW_BYTES,
}
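
To make the size trade-off between the three options concrete, the following sketch computes how many bytes each encoding would need for a given value. It is an illustration derived from the Javadoc above, not code from this commit; `LongEncodingSizeSketch` and its methods are hypothetical names, and the PVL size assumes a ninth byte that holds the final 8 bits.

```java
/** Hypothetical helper comparing encoded sizes of the three long encodings. */
final class LongEncodingSizeSketch {
  /** SLI: 4 bytes inside [-2^30, 2^30 - 1], otherwise 9 bytes. */
  static int sliSize(long v) {
    return (v >= -(1L << 30) && v <= (1L << 30) - 1) ? 4 : 9;
  }

  /** PVL: 7 payload bits per byte after zigzag, capped at 9 bytes. */
  static int pvlSize(long v) {
    long z = (v << 1) ^ (v >> 63);
    int size = 1;
    while (size < 9 && (z >>> (7 * size)) != 0) {
      size++;
    }
    return size;
  }

  /** LE_RAW_BYTES: always 8 bytes. */
  static int rawSize(long v) {
    return 8;
  }
}
```

For example, `100` takes 4 bytes under SLI and 2 under PVL, while `1L << 40` takes 9 bytes under SLI, 6 under PVL and 8 as raw bytes. This matches the note that SLI compresses less than PVL for values in the short range, in exchange for a simpler read path.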