[Scala] support default not-null value in COMPATIBLE mode. #1683

Open
LoranceChen opened this issue Jun 12, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@LoranceChen

LoranceChen commented Jun 12, 2024

Currently, in Scala, adding a new field and deserializing from old binary data yields null as the new field's value. But avoiding null is good practice in Scala.

  // bytes written by the old version of the class: Person(1, true, "some text")
  val personBytes = readBytesFromFile("person_v1")
  // the class now has an appended field: case class Person(a: Int, b: Boolean, c: String, d: String = "default d")
  val deserPerson = fury.deserializeJavaObject(personBytes, classOf[Person])
  println(s"deserPerson: ${deserPerson}") // deserPerson: Person(1,true,some text,null)

I think it would be better to set the new field to its default value or to an empty value, giving this result:

// deserPerson: Person(1,true,some text,default d)

If the field definition has no default value, an empty value could be used instead. For String, "" would be better than null.

If the new field is a structure, it could be instantiated from default values.
For example, case class Foo(a: String, b: Int) could default to Foo("", 0).

However, in performance-sensitive scenarios, using null may be better, leaving the handling to the developer.

I suggest adding a new configuration option that decides whether a new field gets its default value or null.

@LoranceChen LoranceChen added the enhancement New feature or request label Jun 12, 2024
@chaokunyang
Collaborator

chaokunyang commented Jun 16, 2024

Hi @LoranceChen, thanks for bringing this up. It's very necessary to support this in Apache Fury.

Scala doesn't provide a way to construct an object with default values at the bytecode level. It generates bytecode that invokes the constructor with all parameters provided, and the default arguments are supplied at the call site.

If we need to provide default values when creating an object, we need to extract them. Fortunately, Scala generates a
method like SomeClass$.apply$default$2:()I:

case class SomeClass(v: List[IdAnyVal], x:Int=1)

// Callsite bytecode
      34: getstatic     #131                // Field org/apache/fury/serializer/SomeClass$.MODULE$:Lorg/apache/fury/serializer/SomeClass$;
      37: invokevirtual #135                // Method org/apache/fury/serializer/SomeClass$.apply$default$2:()I
      40: invokespecial #138                // Method org/apache/fury/serializer/SomeClass."<init>":(Lscala/collection/immutable/List;I)V
      43: putstatic     #87                 // Field p:Lorg/apache/fury/serializer/SomeClass;
      46: getstatic     #143                // Field scala/Predef$.MODULE$:Lscala/Predef$;

We may be able to detect whether such a method exists to learn which parameters have default values, and supply those values when constructing the object. This will take some legwork, and we don't have time for it currently. Would you like to contribute to this? The record constructor handling in Fury, org.apache.fury.builder.ObjectCodecBuilder#createRecord / org.apache.fury.serializer.ObjectSerializer#read, can be taken as an example.
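
As a rough illustration of the detection step, the sketch below looks up `apply$default$N` on the companion object via plain Java reflection. It is not Fury code: `DefaultValues` and `defaultFor` are hypothetical names, the `SomeClass` here uses `List[Int]` instead of `List[IdAnyVal]` to stay self-contained, and it assumes the case class is top-level or nested in an object so that the companion has a static `MODULE$` field.

```scala
import scala.util.Try

// Simplified stand-in for the class in the example above.
case class SomeClass(v: List[Int], x: Int = 1)

// Hypothetical helper, not part of Fury.
object DefaultValues {
  /** Returns the default of the 1-based parameter `index`, or None if it has no default. */
  def defaultFor(cls: Class[_], index: Int): Option[Any] =
    Try {
      // The companion object compiles to a class named `<ClassName>$`
      // whose singleton instance is stored in the static field MODULE$.
      val companionClass = Class.forName(cls.getName + "$", true, cls.getClassLoader)
      val module = companionClass.getField("MODULE$").get(null)
      // A parameter with a default gets a no-arg method `apply$default$<index>`;
      // getMethod throws NoSuchMethodException otherwise, which Try turns into None.
      companionClass.getMethod("apply$default$" + index).invoke(module)
    }.toOption
}
```

Calling `DefaultValues.defaultFor(classOf[SomeClass], 2)` should yield the boxed default of `x`, while index 1 has no default and yields `None`. A real implementation would presumably cache the lookup per class rather than reflect on every read.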

@LoranceChen
Author

Hi, great to see this can be solved.
Glad to submit a PR if possible; I need some time to get familiar with the repository.

@chaokunyang
Collaborator

If a field doesn't exist during serialization but does exist during deserialization, we can invoke a method like SomeClass$.apply$default$2:()I to get the default value for that field and set it on the object.
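
A minimal sketch of that idea, outside of Fury's actual codegen: fields that were present in the old data are passed as `Some(value)`, missing ones as `None`, and each missing one is filled from `apply$default$N` when the companion provides it (falling back to null, which is today's behaviour). `CompatibleRead` and `construct` are hypothetical names, and the sketch assumes a case class with a single public constructor that is top-level or nested in an object.

```scala
// The new version of the class from the issue description.
case class Person(a: Int, b: Boolean, c: String, d: String = "default d")

// Hypothetical helper, not part of Fury.
object CompatibleRead {
  /** Builds `cls` from positional values; None marks a field absent from the old data. */
  def construct[T](cls: Class[T], values: Seq[Option[Any]]): T = {
    val companionClass = Class.forName(cls.getName + "$", true, cls.getClassLoader)
    val module = companionClass.getField("MODULE$").get(null)
    val args: Seq[AnyRef] = values.zipWithIndex.map {
      case (Some(v), _) => v.asInstanceOf[AnyRef] // value read from the binary data
      case (None, i) =>
        // Default methods are numbered from 1; no such method means no default, so use null.
        val name = "apply$default$" + (i + 1)
        companionClass.getMethods.find(_.getName == name)
          .map(_.invoke(module))
          .orNull
    }
    cls.getConstructors.head.newInstance(args: _*).asInstanceOf[T]
  }
}
```

For the `Person` example, `CompatibleRead.construct(classOf[Person], Seq(Some(1), Some(true), Some("some text"), None))` should produce a `Person` whose `d` is `"default d"` rather than null.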

@LoranceChen
Author

Hi @chaokunyang, do you have any advice on debugging the codegen init process?
It is not easy to trace where the logic lives in the generated code.

Thanks

@chaokunyang
Collaborator

You can set the FURY_CODE_DIR environment variable to choose the directory for the generated code. If you point it at a src directory, you can debug the generated code in your IDE when you rerun the program.

@chaokunyang
Collaborator

Hi @LoranceChen , are you still working on this issue?
