Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom Polymorphic Serialization #30

Closed
Vampire opened this issue Aug 23, 2020 · 17 comments
Closed

Custom Polymorphic Serialization #30

Vampire opened this issue Aug 23, 2020 · 17 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@Vampire
Copy link
Contributor

Vampire commented Aug 23, 2020

Didn't want to pollute the #29 feature request, so I'm opening a new question (or bug) issue. :-)

I tried to make a custom serializer as you suggested, following the official example https://github.com/Kotlin/kotlinx.serialization/blob/v0.20.0/runtime/commonTest/src/kotlinx/serialization/json/JsonTreeAndMapperTest.kt#L35 that is linked to from the docs about custom serializers.

In the GitHub action yml file you can have

outputs:
    foo:
        description: bar
        value: baz

runs:
    using: composite

or

outputs:
    foo:
        description: bar

runs:
    using: something else

Meaning the output has a mandatory value property if it is a composite action but no value property if it is a normal action.

Previously I didn't support composite, so I had

@Serializable
data class GitHubAction(
        /* ... */
        val outputs: Map<String, Output>? = null,
        /* ... */
) {
    /* ... */
    @Serializable
    data class Output(
            val description: String
    )
    /* ... */
}

I now changed it to

@Serializable
data class GitHubAction(
        /* ... */
        val outputs: Map<String, Output>? = null,
        /* ... */
) {
    /* ... */
    sealed class Output {
        abstract val description: String

        data class NormalOutput(
                override val description: String
        ) : Output()

        data class CompositeOutput(
                override val description: String,
                val value: String
        ) : Output()

        @Serializer(forClass = Output::class)
        companion object : KSerializer<Output> {
            override val descriptor: SerialDescriptor = SerialDescriptor("net.kautler.dao.GitHubAction.Output", SEALED) {
                element("normal", SerialDescriptor("net.kautler.dao.GitHubAction.Output.NormalOutput") {
                    element("description", PrimitiveDescriptor("net.kautler.dao.GitHubAction.Output.NormalOutput.description", STRING))
                })
                element("composite", SerialDescriptor("net.kautler.dao.GitHubAction.Output.CompositeOutput") {
                    element("description", PrimitiveDescriptor("net.kautler.dao.GitHubAction.Output.CompositeOutput.description", STRING))
                    element("value", PrimitiveDescriptor("net.kautler.dao.GitHubAction.Output.CompositeOutput.value", STRING))
                })
            }

            override fun deserialize(decoder: Decoder): Output {
                TODO("deserialize: Not yet implemented")
            }

            override fun serialize(encoder: Encoder, value: Output) {
                TODO("serialize: Not yet implemented")
            }
        }
    }
    /* ... */
}

The problem is, that it seems to not being used.
When I had a TODO() in the descriptor it failed accordingly.
But when I now try to deserialize an action file, I get complaint from kaml about missing type tag instead of the custom serializer being used.
Do I do something wrong or is there a bug in kaml?

@charleskorn
Copy link
Owner

Hey @Vampire, I suspect the issue is the use of SEALED in descriptor - try with CONTEXTUAL instead.

SEALED indicates a sealed class, and so kaml expects to find a type tag so it can do polymorphic deserialization.

@charleskorn charleskorn added the question Further information is requested label Aug 23, 2020
@Vampire
Copy link
Contributor Author

Vampire commented Aug 23, 2020

Hm, yeah, that works indeed.
What are the implications of using CONTEXTUAL instead of SEALED?
That SEALED indicates a sealed class is correct imho, it is a sealed class, just with a custom serializer.
Even forcing the serializer with

@Serializable
data class GitHubAction(
        /* ... */
        val outputs: Map<String, @Serializable(with = Output.Companion::class) Output>? = null,
        /* ... */
) {
    /* ... */
}

does not work, so I'd say this is definitely a kaml bug.

@charleskorn
Copy link
Owner

This is what I understand them to mean:

SEALED means "this is a sealed class using polymorphic serialization", so kaml expects to find a type tag to be able to feed that back into kotlinx.serialization so it can determine which polymorphic serializer to use.

CONTEXTUAL means "this type uses a contextual serializer that determines how to deserialize the class", so kaml doesn't do any special checking until the serializer tells it what type it really is.

I believe kaml is behaving correctly, but if someone from the kotlinx.serialization team disagrees with that interpretation, I'd be happy to change it.

@Vampire
Copy link
Contributor Author

Vampire commented Aug 24, 2020

I'm new to this, so my interpretation might be wrong, but as far as I understood it, CONTEXTUAL means that you don't know anything about the type until runtime and SEALED means it is a sealed class where all possible subclasses are known at compile time already.

Indications by the kotlinx.serialization team that kaml is misbehaving:

@charleskorn
Copy link
Owner

OK, it seems like there are two issues here.

kaml does not respect even a manually configured serializer using @serializable(with = ...), if I choose a specific serializer manually then it should be used

Do you have a code snippet that shows this behaviour? kaml isn't involved in deciding which serializer to use (it just reports information to kotlinx.serialization which then picks the right one), so that sounds like a bug to me.

This class: https://kotlin.github.io/kotlinx.serialization/kotlinx-serialization-core/kotlinx-serialization-core/kotlinx.serialization.json/-json-content-polymorphic-serializer/index.html linked to from the current docs also shows that it is intended like that.

Interesting, based on looking at the JSON format code I'd expect it to be behaving in a similar way to kaml and throwing an exception. If you use your custom serializer with SEALED and the built-in JSON serializer, does it work?

@Vampire
Copy link
Contributor Author

Vampire commented Aug 24, 2020

OK, it seems like there are two issues here.

kaml does not respect even a manually configured serializer using @serializable(with = ...), if I choose a specific serializer manually then it should be used

Do you have a code snippet that shows this behaviour? kaml isn't involved in deciding which serializer to use (it just reports information to kotlinx.serialization which then picks the right one), so that sounds like a bug to me.

I showed you here: #30 (comment)
Instead of using that serializer, kaml throws an error because it does not find the type tag which is irrelevant though as a custom serializer should be used that doesn't need it.

This class: https://kotlin.github.io/kotlinx.serialization/kotlinx-serialization-core/kotlinx-serialization-core/kotlinx.serialization.json/-json-content-polymorphic-serializer/index.html linked to from the current docs also shows that it is intended like that.

Interesting, based on looking at the JSON format code I'd expect it to be behaving in a similar way to kaml and throwing an exception. If you use your custom serializer with SEALED and the built-in JSON serializer, does it work?

Yes, I just tried it, works fine.
Would be very strange otherwise as they use it like that themselves. :-)

@charleskorn
Copy link
Owner

Digging into this further, I think this is actually one issue...

kotlinx.serialization's JSON decoder does some special handling when polymorphic types are involved, and invokes a completely different code path depending on the serializer type - if the serializer is a AbstractPolymorphicSerializer (or a derived class), it invokes some special handling, otherwise it completely ignores the polymorphism and just invokes the serializer as if there was nothing special going on.

This feels weird - I feel that the encoding shouldn't care what serializer is being used exactly, but I'll open an issue with kotlinx.serialization to see what their opinion is.

@charleskorn
Copy link
Owner

Issue created: Kotlin/kotlinx.serialization#1009

@charleskorn charleskorn added the state:blocked Waiting for something else label Aug 29, 2020
@charleskorn charleskorn added help wanted Extra attention is needed and removed state:blocked Waiting for something else labels Sep 26, 2020
@QuinnBast
Copy link

QuinnBast commented Aug 16, 2021

Hello!! I found this issue and I don't know where else to ask about this but this seems like the most relevant place.

I have a YAML data type that may be defined as two different types but depend on the value of another field in the type. For example:

someList:
    - type: Something
      fieldForSomething: 1
    - type: OtherThing
      fieldForOtherThing: 2

I am attempting to create a custom KSerializer for this but have had no luck so far. Any guidance would be greatly appreciated :)

So far I have this:
object CustomSerializer : KSerializer<List<CustomDefinition>> {
  @OptIn(InternalSerializationApi::class)
  private val typeDescriptor: SerialDescriptor by lazy {
      buildSerialDescriptor("type", PrimitiveKind.STRING)
  }
  @OptIn(InternalSerializationApi::class)
  override val descriptor: SerialDescriptor by lazy {
      buildSerialDescriptor("CustomDefinition", SerialKind.CONTEXTUAL) {
          val oneDescriptor: SerialDescriptor by lazy {
              buildClassSerialDescriptor("TypeOneDescriptor")
          }
          val twoDescriptor: SerialDescriptor by lazy {
              buildClassSerialDescriptor("TypeTwoDescriptor")
          }

          element("oneDefinition", oneDescriptor)
          element("twoDefinition", twoDescriptor)
      }
  }

  override fun deserialize(decoder: Decoder): List<CustomDefinition> {
      if (decoder !is YamlInput) {
          throw UnsupportedOperationException("Can only deserialize from YAML source.")
      }

      val input = decoder.beginStructure(descriptor) as YamlInput

      if(input.node is YamlScalar && input.node.path.endLocation.toString() == "customDefinitionList") {
          // We want to read the "type" key and parse into the correct type.
          decoder.beginStructure(descriptor) as YamlInput
          
          // No idea on the index here?
          when(Type.valueOf(decoder.decodeStringElement(typeDescriptor, 1))) {
              Type.Something -> {
                  decoder.beginStructure(descriptor)
              }
              Type.OtherThing -> {
                   decoder.beginStructure(descriptor)
              }
          }
      }

      input.endStructure(descriptor)
  }
}

I honestly have no idea what I am doing :)

@charleskorn
Copy link
Owner

@QuinnBast, if the type field determines what Kotlin type to use, you don't need to use a custom serializer, you can use built-in features of kotlinx.serialization and kaml.

https://github.com/Kotlin/kotlinx.serialization/blob/master/docs/polymorphism.md describes the polymorphism support available in kotlinx.serialization, and uses the built-in JSON format for its examples.

For kaml, you can configure the YAML parser to use a property for polymorphism by setting polymorphismStyle to PolymorphismStyle.Property, and setting polymorphismPropertyName to the name of the field you'd like to use to store type information. In your case, this is type, which is the default value of polymorphismPropertyName.

@chachako
Copy link

chachako commented Sep 30, 2021

@charleskorn I have the same problem. kaml does not use @Serializable(with = CustomSerializer::class) annotations on properties, but JSON doesn't have this problem. I also tried debug and found that it would not even run the deserialize method of CustomSerializer.

For JSON:

{
  "name": "",
  "age": 1
}
// Successfully printed ->> 
//     JsonData(name=[], age=1)
prinln(Json.decodeFromString<Data>(json))

For kaml:

name: ""
age: 1
// Errors printed ->> 
//     Value for 'name' is invalid: Expected a list, but got a scalar value
prinln(Yaml.default.decodeFromString<Data>(json))
@Serializable
data class Data(
  @Serializable(with = StringOrListSerializer::class)
  val name: List<String>,
  val age: Int
)

object StringOrListSerializer : KSerializer<List<String>> {
  private val listSerializer: KSerializer<List<String>> = ListSerializer(String.serializer())

  override val descriptor: SerialDescriptor = listSerialDescriptor<String>()

  override fun serialize(encoder: Encoder, value: List<String>) = listSerializer.serialize(encoder, value)

  override fun deserialize(decoder: Decoder): List<String> = try {
    listOf(decoder.decodeString())
  } catch (e: Throwable) {
    decoder.decodeSerializableValue(listSerializer)
  }
}

@charleskorn
Copy link
Owner

@rinorz, I suspect the issue is that the descriptor in StringOrObjectSerializer says that the serializer only supports lists. Could you try modifying descriptor to report that the serializer supports either strings or lists?

Something like this should work:

val descriptor: SerialDescriptor by lazy {
    buildSerialDescriptor(serialName, SerialKind.CONTEXTUAL) {
        element("list", listSerialDescriptor<String>())
        element("string", PrimitiveSerialDescriptor("value", PrimitiveKind.STRING))
    }
}

If not, could you please create a sample project that reproduces the issue so I can investigate further?

@chachako
Copy link

chachako commented Oct 2, 2021

@charleskorn Thank you. It successfully called deserialize, but sent an error:

java.lang.IllegalStateException: Must call beginStructure() and use returned Decoder
	at com.charleskorn.kaml.YamlContextualInput.decodeValue(YamlInput.kt:235)
	at kotlinx.serialization.encoding.AbstractDecoder.decodeString(AbstractDecoder.kt:34)
object StringOrListSerializer : KSerializer<List<String>> {
  private val listSerializer: KSerializer<List<String>> = ListSerializer(String.serializer())

  override val descriptor: SerialDescriptor = buildSerialDescriptor(
    StringOrListSerializer::class.java.name,
    SerialKind.CONTEXTUAL
  ) {
    element("list", listSerialDescriptor<String>())
    element<String>("string")
  }

  override fun serialize(encoder: Encoder, value: List<String>) = listSerializer.serialize(encoder, value)

  override fun deserialize(d: Decoder): List<String> {
    val decoder = d.beginStructure(descriptor) as Decoder
    return try {
      listOf(decoder.decodeString())
    } catch (e: Throwable) {
      decoder.decodeSerializableValue(listSerializer)
    }
  }
}

@charleskorn
Copy link
Owner

It's very difficult to diagnose the problem without a full sample project and a complete stack trace - without these, I can't see where the problem is coming from. Could you please create a sample project that demonstrates the issue and share a link to it?

@chachako
Copy link

chachako commented Oct 3, 2021

@charleskorn In fact, the above code is a complete example, but for convenience, I created a minimal project to reproduce the problem: https://github.com/RinOrz/kaml-bug30/blob/master/src/test/kotlin/Tests.kt

@charleskorn
Copy link
Owner

The issue is that you need to call beginStructure twice: once to start decoding the contextual structure, and then again to decode the value itself. kaml is quite strict about this, while it seems that the built-in JSON format is far more lenient.

For example, the following appears to work in the sample project you provided:

 @OptIn(InternalSerializationApi::class, ExperimentalSerializationApi::class)
 object StringOrListSerializer : KSerializer<List<String>> {
   private val listSerializer: KSerializer<List<String>> = ListSerializer(String.serializer())
+  private val listDescriptor = listSerializer.descriptor
+  private val stringDescriptor = String.serializer().descriptor

   override val descriptor: SerialDescriptor = buildSerialDescriptor(
     StringOrListSerializer::class.java.name,
     SerialKind.CONTEXTUAL
   ) {
-    element("list", listSerialDescriptor<String>())
+    element("list", listDescriptor)
     element<String>("string")
   }
 
   override fun serialize(encoder: Encoder, value: List<String>) = listSerializer.serialize(encoder, value)

   override fun deserialize(decoder: Decoder): List<String> = decoder.fromJson() ?: decoder.fromYaml()

   private fun Decoder.fromJson(): List<String>? {
     val decoder = this as? JsonDecoder ?: return null
     return try {
       listOf(decoder.decodeString())
     } catch (e: Throwable) {
       decoder.decodeSerializableValue(listSerializer)
     }
   }

   private fun Decoder.fromYaml(): List<String> {
-    val decoder = (this as YamlInput).beginStructure(descriptor) as Decoder
-    return try {
-      listOf(decoder.decodeString())
-    } catch (e: IncorrectTypeException) {
-      decoder.decodeSerializableValue(listSerializer)
+    val decoder = this.beginStructure(descriptor) as YamlInput
+
+    return when (decoder.node) {
+      is YamlScalar -> decodeString(decoder)
+      is YamlList -> decodeList(decoder)
+      else -> throw IllegalArgumentException("Value is neither a list nor a string.")
     }
   }
-}
+
+  private fun decodeString(input: YamlInput): List<String> {
+    val decoder = input.beginStructure(stringDescriptor) as YamlInput
+
+    return listOf(decoder.decodeString()).also { decoder.endStructure(stringDescriptor) }
+  }
+
+  private fun decodeList(input: YamlInput): List<String> {
+    return input.decodeSerializableValue(listSerializer)
+  }
+}

@QuinnBast
Copy link

Just a comment on this since I've seen it has some activity recently. Thank you for the information about the type as well as being able to configure the Polymorphism Name. Those solved my issue. This can be closed :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants