Is your feature request related to a problem? Please describe.
Currently, when a class defines writeObject/readObject/readObjectNoData, Fury falls back to JDK serialization for those objects and for every object contained in their subgraph, which is slow and insecure:
JDK serialization bypasses the Fury blacklist check.
If a contained object has already been serialized by Fury, the JDK serializes it again, so the deserialized object is not the expected one. In the worst case, a recursion error occurs if the contained object holds a reference to the outer object.
JDK serialization is very slow and takes much more space.
Describe the solution you'd like
Implement a new serializer in Fury that is compatible with the whole JDK serialization API and ensures that all serialization methods defined in the user class are executed by Fury.
JDK serialization rationale
JDK serialization is done using ObjectOutputStream and ObjectInputStream. This framework lets users customize serialization behavior through Externalizable and the writeObject/readObject/readObjectNoData/writeReplace/readResolve methods. When the object to be serialized defines none of these, ObjectOutputStream uses its default logic (the behavior exposed as defaultWriteObject) to serialize the type information and all fields of the type hierarchy. During deserialization, ObjectInputStream reads the type information and the corresponding field values of each type in the hierarchy and fills in the entire object. If the class defines custom serialization methods, a separate execution path is taken.
Overall serialization process
When the object's type defines a writeReplace method, serialization calls it first and uses the returned object in place of the original, including in the reference table. If the returned object has the same type as the original, writeReplace is not called again even though the type still defines it, and the normal writeObject/writeExternal process begins. If the returned type differs, writeReplace is looked up on the new type and the process repeats.
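The replacement step can be observed with plain JDK serialization. The sketch below is only an illustration: `Celsius`, `CelsiusSnapshot` and `roundTrip` are hypothetical names, not part of Fury or of this proposal. Because `writeReplace` returns an object of a different type that defines no `writeReplace` of its own, the loop stops after one replacement and the stream contains only the replacement.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class WriteReplaceDemo {
    /** Hypothetical class whose instances are replaced before being written. */
    static class Celsius implements Serializable {
        final double degrees;

        Celsius(double degrees) {
            this.degrees = degrees;
        }

        // Invoked by ObjectOutputStream before any field data is written.
        private Object writeReplace() throws ObjectStreamException {
            return new CelsiusSnapshot(degrees);
        }
    }

    /** Replacement type; it defines no writeReplace, so replacement stops here. */
    static class CelsiusSnapshot implements Serializable {
        final double degrees;

        CelsiusSnapshot(double degrees) {
            this.degrees = degrees;
        }
    }

    /** Serializes and deserializes a value with the JDK streams. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(new Celsius(21.5));
        System.out.println(copy.getClass().getSimpleName()); // prints "CelsiusSnapshot"
    }
}
```

A Fury serializer that wants to stay compatible would need to produce the same result: the reader gets the replacement, never the original `Celsius`.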
When the returned object no longer defines writeReplace, field data serialization begins. If the object implements the Externalizable interface, writeExternal is called for serialization. Otherwise, each type and all field data belonging to that type are serialized in sequence, starting from the first parent class in the object hierarchy that implements Serializable.
When a type in the object hierarchy defines a writeObject method, that method is called to serialize the fields belonging to this type. A writeObject method can call defaultWriteObject on the ObjectOutputStream to serialize the default fields, or it can implement the serialization logic entirely by hand.
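As a minimal sketch of the first variant, the hypothetical `Session` class below calls `defaultWriteObject` for its regular field and then writes a transient list by hand; the matching `readObject` reads the data back in the same order. All class and helper names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WriteObjectDemo {
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;

        final String user;            // written by defaultWriteObject
        transient List<String> tags;  // skipped by the default logic, written by hand

        Session(String user, List<String> tags) {
            this.user = user;
            this.tags = new ArrayList<>(tags);
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();      // default fields first
            out.writeInt(tags.size());     // then the handwritten part
            for (String tag : tags) {
                out.writeUTF(tag);
            }
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();        // must mirror the write order
            int size = in.readInt();
            tags = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                tags.add(in.readUTF());
            }
        }
    }

    /** Serializes and deserializes a value with the JDK streams. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session copy = (Session) roundTrip(new Session("alice", Arrays.asList("a", "b")));
        System.out.println(copy.user + " " + copy.tags); // prints "alice [a, b]"
    }
}
```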
If the fields of a class differ between versions (for example, between JDK versions) and must stay compatible, call putFields to obtain a PutField object, set the field data on it, including fields that exist only in other versions and not in the current one, and then call writeFields to write the field data.
For example, ThreadLocalRandom uses putFields to customize its serialization logic.
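The putFields/writeFields pattern can be sketched as follows. This is not the ThreadLocalRandom source: `Counter` and its field names are hypothetical and only show the shape of the API. The serialized form is declared through `serialPersistentFields`, so the names written to the stream do not have to match actual fields of the class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

class PutFieldsDemo {
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;

        // Declares the serialized form explicitly; neither name is an actual field.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("count", long.class),
            new ObjectStreamField("initialized", boolean.class),
        };

        transient long value; // in-memory state, mapped to "count" by hand

        Counter(long value) {
            this.value = value;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            ObjectOutputStream.PutField fields = out.putFields();
            fields.put("count", value);
            fields.put("initialized", true);
            out.writeFields();
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            // The second argument is the default used when the stream has no such field.
            value = fields.get("count", 0L);
        }
    }

    /** Serializes and deserializes a value with the JDK streams. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Counter copy = (Counter) roundTrip(new Counter(42));
        System.out.println(copy.value); // prints "42"
    }
}
```

The read side uses readFields/GetField, which is the counterpart of this API on ObjectInputStream.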
Deserialization process
Deserialization reads the object type first and then looks up a constructor to create the object. An Externalizable type is created with its public no-arg constructor. For a Serializable type, ReflectionFactory#newConstructorForSerialization(java.lang.Class<?>) is used: the type hierarchy is traversed upward until the no-arg constructor of the first non-Serializable parent class is found (the lookup is cached to avoid repeated searches).
Then the object is created with that constructor and put into the reference table, so that circular references back to it can be resolved.
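The effect of this constructor lookup is visible with plain JDK serialization. In the sketch below (hypothetical classes `Base` and `Child`), the no-arg constructor of the non-Serializable parent runs again during deserialization, while the constructor of the Serializable class itself does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ConstructorDemo {
    static int baseCalls = 0;
    static int childCalls = 0;

    /** Not Serializable: its no-arg constructor is run on deserialization. */
    static class Base {
        int baseField;

        Base() {
            baseCalls++;
            baseField = 7;
        }
    }

    /** Serializable: its own constructor is skipped on deserialization. */
    static class Child extends Base implements Serializable {
        private static final long serialVersionUID = 1L;
        int childField;

        Child(int childField) {
            childCalls++;
            this.childField = childField;
        }
    }

    /** Serializes and deserializes a value with the JDK streams. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Child original = new Child(3);
        original.baseField = 99; // state of the non-Serializable parent is not written
        Child copy = (Child) roundTrip(original);
        // Base() ran twice (new + deserialization), Child(int) only once.
        System.out.println(baseCalls + " " + childCalls + " "
                + copy.baseField + " " + copy.childField); // prints "2 1 7 3"
    }
}
```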
Next, each type and its field data are deserialized, starting from the first Serializable parent class, and filled into the object created earlier by the constructor. If a type in the current hierarchy has no data in the stream, the object hierarchy has changed: a new parent class was added after the object was serialized. If that type defines a readObjectNoData method, it is called to initialize the field state; otherwise those fields keep their default values.
If a type does not define readObject, the default logic (the behavior exposed as defaultReadObject) reads the value of each non-transient, non-static field in turn and sets it on the object. If a readObject method is defined, it is called to deserialize the data of that type.
A readObject method can call defaultReadObject to deserialize the default field values and then run other custom logic, or it can implement the deserialization logic entirely by hand.
If the fields of a class differ between versions and must stay compatible, call readFields to obtain a GetField object. It may contain field data that does not exist in the current class version, which can simply be ignored. The other fields can be queried from GetField and set on the object. Note that only one of defaultReadObject and readFields may be called.
In some cases, deserializing the parent class fields depends on the deserialized state of subclass fields. Because parent class fields are deserialized first, the subclass state is not yet available at that point, so the JDK provides registerValidation. The registered callback is executed after the entire object has been deserialized, and it can perform additional operations to restore the state of the object.
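A small sketch of that situation, using hypothetical classes `Shape` and `Circle`: the parent's `readObject` cannot see the subclass field yet, so it registers itself as an `ObjectInputValidation` callback and rebuilds its derived state once the whole object has been read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectInputValidation;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ValidationDemo {
    static class Shape implements Serializable, ObjectInputValidation {
        private static final long serialVersionUID = 1L;

        transient String early;       // what the parent sees inside readObject
        transient String description; // derived state, rebuilt by the callback

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            early = describe();             // subclass fields are still at defaults here
            in.registerValidation(this, 0); // defer until the whole graph is read
        }

        @Override
        public void validateObject() throws InvalidObjectException {
            description = describe();       // subclass fields are available now
        }

        String describe() {
            return "shape";
        }
    }

    static class Circle extends Shape {
        private static final long serialVersionUID = 1L;
        private final double radius;

        Circle(double radius) {
            this.radius = radius;
            this.description = describe();
        }

        @Override
        String describe() {
            return "circle r=" + radius;
        }
    }

    /** Serializes and deserializes a value with the JDK streams. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Circle copy = (Circle) roundTrip(new Circle(2.0));
        System.out.println(copy.early);       // prints "circle r=0.0"
        System.out.println(copy.description); // prints "circle r=2.0"
    }
}
```

Note that `registerValidation` may only be called while a `readObject` call is active; outside of one it throws `NotActiveException`.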
After the object is deserialized, check whether its type defines a readResolve method. If it does, the method is called once and the object it returns replaces the deserialized object; unlike writeReplace, the JDK does not call readResolve again on the replacement.
After readResolve is executed, deserialization of the object is complete.
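A common use of readResolve is preserving a singleton. The sketch below uses a hypothetical `Registry` class: the stream produces a fresh instance, and `readResolve` swaps it for the canonical one before the caller sees it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class ReadResolveDemo {
    static class Registry implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Registry INSTANCE = new Registry();

        private Registry() {
        }

        // Invoked by ObjectInputStream after the instance has been fully read.
        private Object readResolve() throws ObjectStreamException {
            return INSTANCE;
        }
    }

    /** Serializes and deserializes a value with the JDK streams. */
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(Registry.INSTANCE);
        System.out.println(copy == Registry.INSTANCE); // prints "true"
    }
}
```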
Fury serialization
Serialization execution
Write all Serializable classes in the object hierarchy.
Traverse the object's class hierarchy and serialize the field data of each type in turn. For each type:
If the type does not define a writeObject method, slotsSerializer (JITCompatibleSerializer) is called to serialize all fields of that type.
If the type defines a writeObject method, the context of the enclosing serialization is saved, and then writeObject is called with the FuryObjectOutputStream implemented by Fury.
FuryObjectOutputStream also handles putFields/writeFields/defaultWriteObject specially: putFields/writeFields converts the field data into an array recognized by CompatibleSerializer, and defaultWriteObject directly calls slotsSerializer (JITCompatibleSerializer) to serialize all fields of the current type.
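The per-type dispatch described above could be sketched roughly like this. The code uses only JDK reflection and does not call any Fury API: `serializableClasses`, `definesWriteObject` and `plan` are hypothetical helpers, and the strings merely label which path each type would take.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class HierarchyWalkDemo {
    static class Base { }                                   // not Serializable, skipped

    static class Middle extends Base implements Serializable {
        int a;                                              // no writeObject defined
    }

    static class Leaf extends Middle {
        int b;

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
        }
    }

    /** Serializable classes of the hierarchy, topmost parent first. */
    static List<Class<?>> serializableClasses(Class<?> type) {
        List<Class<?>> result = new ArrayList<>();
        for (Class<?> c = type;
                c != null && Serializable.class.isAssignableFrom(c);
                c = c.getSuperclass()) {
            result.add(0, c);
        }
        return result;
    }

    /** True if the class itself declares private void writeObject(ObjectOutputStream). */
    static boolean definesWriteObject(Class<?> type) {
        try {
            Method m = type.getDeclaredMethod("writeObject", ObjectOutputStream.class);
            int mods = m.getModifiers();
            return m.getReturnType() == void.class
                    && Modifier.isPrivate(mods)
                    && !Modifier.isStatic(mods);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Labels the path each type of the hierarchy would take. */
    static List<String> plan(Class<?> type) {
        List<String> steps = new ArrayList<>();
        for (Class<?> c : serializableClasses(type)) {
            steps.add(c.getSimpleName() + (definesWriteObject(c)
                    ? ": writeObject(FuryObjectOutputStream)"
                    : ": slotsSerializer"));
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(plan(Leaf.class));
        // prints "[Middle: slotsSerializer, Leaf: writeObject(FuryObjectOutputStream)]"
    }
}
```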
Deserialization execution
Create an object instance based on the constructor.
Write the object instance to the reference table.
Read all Serializable classes in the object hierarchy.
Classes are read from the data in sequence and compared with the classes of the current type hierarchy. If they differ, the current type hierarchy has changed and a new parent class has been introduced. If that new type defines readObjectNoData, the method is called for initialization, and the traversal of the type hierarchy continues until the type read from the data is found.
Deserialize all field values of this type and set them to object fields.
If the type does not define a readObject method, slotsSerializer (JITCompatibleSerializer) is called for deserialization.
If the type defines a readObject method, it is called with the FuryObjectInputStream implemented by Fury.
FuryObjectInputStream also handles readFields/defaultReadObject specially: readFields converts the data recognized by CompatibleSerializer into a GetField, and defaultReadObject directly calls slotsSerializer (JITCompatibleSerializer) to deserialize all fields of the current type.
If the user registered ObjectInputValidation callbacks through registerValidation during readObject, the callbacks are executed in order of priority before the object is returned.
At this point, deserialization is complete.
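One possible reading of the class-matching step is sketched below. It works on class names only and is a simplification under stated assumptions: `plan` is a hypothetical helper, both lists are ordered topmost parent first, and a class that appears in the data but not in the current hierarchy is treated as an error rather than skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HierarchyMatchDemo {
    /**
     * Compares the classes found in the data with the current hierarchy and
     * labels what would happen for each current class.
     */
    static List<String> plan(List<String> classesInData, List<String> currentHierarchy) {
        List<String> steps = new ArrayList<>();
        int next = 0; // first current class that has not been handled yet
        for (String written : classesInData) {
            int match = currentHierarchy.subList(next, currentHierarchy.size()).indexOf(written);
            if (match < 0) {
                throw new IllegalStateException("class not in current hierarchy: " + written);
            }
            // Current classes before the match have no data: they were added later.
            for (int i = 0; i < match; i++) {
                steps.add(currentHierarchy.get(next + i) + ": readObjectNoData (if defined)");
            }
            steps.add(written + ": read fields");
            next += match + 1;
        }
        // Current classes after the last written class have no data either.
        for (; next < currentHierarchy.size(); next++) {
            steps.add(currentHierarchy.get(next) + ": readObjectNoData (if defined)");
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> steps = plan(
                Arrays.asList("Animal", "Dog"),          // hierarchy when the data was written
                Arrays.asList("Animal", "Pet", "Dog"));  // current hierarchy, Pet is new
        for (String step : steps) {
            System.out.println(step);
        }
    }
}
```

In this example the new `Pet` class gets the readObjectNoData path, while `Animal` and `Dog` read their fields from the data.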
Additional context
Add any other context or screenshots about the feature request here.
readResolve/writeReplace has been handled in #193.