Replace Klaxon with kotlinx-serialization #603

devcrocod · 2024-02-23T18:55:35Z

This is the first step in migrating from klaxon to kotlinx-serialization #312

Removed klaxon dependency
Added kotlinx-serialization
Klaxon objects replaced with JsonElement from serialization
All other behavior is preserved. kotlinx-serialization handles primitives better than Klaxon. For example, Klaxon does not work with Float, only with Double, so I had to handle the Float case specially. Also, nulls are wrapped in kotlinx-serialization.

This PR should be accepted after #574, after which I will resolve the conflicts.

build.gradle.kts

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/json.kt

Jolanrensen · 2024-02-26T11:07:43Z

What do you mean with "specially handle the float case". Looking at your code I see you skip it and convert it to a Double sometimes. I don't really understand why; DataFrame understands Floats just fine, so you can keep them as Floats

zaleslaw · 2024-02-26T12:42:40Z

@devcrocod you wrote that this is a first step, what could be the next step?

devcrocod · 2024-02-26T13:01:24Z

> @devcrocod you wrote that this is a first step, what could be the next step?

Check the related issue, #312.

devcrocod · 2024-02-27T12:38:53Z

> What do you mean with "specially handle the float case". Looking at your code I see you skip it and convert it to a Double sometimes. I don't really understand why; DataFrame understands Floats just fine, so you can keep them as Floats

By default, klaxon translates all floating-point numbers into Double. kotlinx-serialization can work with float. So, the question here is not what DataFrame can do, but what klaxon could do. It was possible to support Float in JSON parsing, but then it would have been necessary to additionally change the logic in JSON parsing. I decided not to do this because these subsequent parsers should ultimately change, and because it maximally preserves the previous behavior of the library.

Jolanrensen · 2024-02-27T12:51:52Z

> > What do you mean with "specially handle the float case". Looking at your code I see you skip it and convert it to a Double sometimes. I don't really understand why; DataFrame understands Floats just fine, so you can keep them as Floats
>
> By default, klaxon translates all floating-point numbers into Double. kotlinx-serialization can work with float. So, the question here is not what DataFrame can do, but what klaxon could do. It was possible to support Float in JSON parsing, but then it would have been necessary to additionally change the logic in JSON parsing. I decided not to do this because these subsequent parsers should ultimately change, and because it maximally preserves the previous behavior of the library.

I think the previous behavior of the library was not intended, but rather a limit imposed on the library by Klaxon. It seems odd to keep this broken behavior of converting Floats to Doubles if both kotlinx-serialization and DataFrame support Floats. So yes, this might require updating some tests and logic, but I believe it's ultimately for the best.

devcrocod · 2024-02-27T13:05:10Z

I agree that such behavior is not entirely correct, as it also carries computational errors when converting float to double. But I will repeat that this behavior will be changed in the future when serializers for dataframe objects are written.

Moreover, in the current version, there is a question:
During serialization, it's obvious how it should look:

1.1f -> 1.1 (float)
1.1 -> 1.100000 (double)

But during deserialization, we look at each element, and the logic is as follows: if the number fits within 32 bits, it's a Float; if it needs more than 32 bits, it's a Double. As a result, our list can easily contain both Float and Double values, and instead of the output type being Double, we get Number.
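The two concerns raised here (rounding error when a Float is widened to a Double, and a column ending up typed as Number) can be reproduced with the Kotlin standard library alone. The sketch below is illustrative only; the helper names `widenToDouble` and `kindOf` are hypothetical and are not part of DataFrame or kotlinx-serialization.

```kotlin
// Hypothetical helper: widens a Float the same way a Float -> Double conversion would.
fun widenToDouble(f: Float): Double = f.toDouble()

// Hypothetical helper: reports the runtime type of a parsed number.
fun kindOf(n: Number): String = when (n) {
    is Float -> "Float"
    is Double -> "Double"
    else -> "other"
}

fun main() {
    // The Float's rounding error becomes visible once it is widened.
    println(widenToDouble(1.1f))          // prints 1.100000023841858
    println(widenToDouble(1.1f) == 1.1)   // prints false

    // A list mixing Float and Double values can only be typed as List<Number>.
    val mixed: List<Number> = listOf(1.1f, 1.1)
    println(mixed.map(::kindOf))          // prints [Float, Double]
}
```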

# Conflicts: # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/json.kt # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/jupyter/JupyterHtmlRenderer.kt # core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/json.kt # core/src/test/kotlin/org/jetbrains/kotlinx/dataframe/jupyter/RenderingTests.kt

Jolanrensen · 2024-06-10T11:30:51Z

@devcrocod Looks like you introduced 4 failing tests in org.jetbrains.dataframe.ksp.DataFrameSymbolProcessorTest

Jolanrensen

Overall a trivial conversion, thanks :) just some small notes

Jolanrensen · 2024-06-10T11:36:02Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/io/readJson.kt

-                    "NaN" -> {
-                        nanIndices.add(i)
-                        collector.add(null)
+                    is JsonPrimitive -> {


While this xOrNull conversion works, it's quite heavy as it works by throwing/catching exceptions. Ilya actually recently replaced similar logic from the notebook kernel because it was so heavy. It was replaced by Jackson. Now I know that's not an option here, but maybe we could try another approach if that's possible.

devcrocod · 2024-06-10

I might be missing something. Could you please explain in more detail or provide an example from the notebook?

Jolanrensen · 2024-06-10

Sure, let's say we want to parse an array of nulls. In this case, for each element it would check intOrNull, longOrNull, doubleOrNull, and floatOrNull before trying v.jsonPrimitive is JsonNull. All of these work with mapExceptionsToNull {}, meaning that for each element in the array an exception is thrown and caught 4 times. This is very heavy, especially when it happens very often.

devcrocod · 2024-06-13

Hm, indeed, this is a weak spot that should be fixed when supporting Serializable for DataFrame and DataRow. I also think that part of this might be optimized at runtime, since our value is already wrapped in JsonPrimitive, although I doubt that JDK 8 handles such optimizations well.

devcrocod · 2024-06-13

I moved the null check higher up. Given the current capabilities of kotlinx-serialization, there are two options: either leave it as it is now, or write a custom serializer. However, the logic in the custom serializer would be quite similar. We can always take a String and then manually attempt to convert it to the desired type. This might be slightly better in terms of performance, but we would need to account for all edge cases for different types.
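The idea of checking for null first and then converting the raw String manually could look roughly like the sketch below. It uses only Kotlin standard-library conversions; `parseNumberOrNull` is a hypothetical helper, and `raw` stands in for the text content of a JSON primitive (null for a JSON null). It is not the code from this PR.

```kotlin
// Hypothetical helper, not DataFrame code: `raw` is the text of a JSON primitive, or null for JSON null.
fun parseNumberOrNull(raw: String?): Number? {
    // Null check first, so an array of nulls never reaches the numeric parsers.
    if (raw == null) return null
    // Try the narrowest type first and fall through to wider ones.
    return raw.toIntOrNull()
        ?: raw.toLongOrNull()
        ?: raw.toDoubleOrNull()
}

fun main() {
    println(parseNumberOrNull(null))          // prints null
    println(parseNumberOrNull("42"))          // prints 42
    println(parseNumberOrNull("4200000000"))  // prints 4200000000
    println(parseNumberOrNull("1.5"))         // prints 1.5
    println(parseNumberOrNull("abc"))         // prints null
}
```

Edge cases such as Float versus Double selection or special values would still need to be handled separately, as noted above.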

Jolanrensen · 2024-06-10T11:37:33Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/io/readJson.kt

-                                    nanIndices.add(i)
-                                    collector.add(null)
+                                is JsonArray -> collector.add(null)
+                                is JsonPrimitive -> {


Same note as before with xOrNull checks

Jolanrensen · 2024-06-10T11:38:16Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/io/writeJson.kt

-// serialization versions and format.
-internal const val SERIALIZATION_VERSION = "2.1.0"
+private val valueTypes =
+    setOf(Boolean::class, Double::class, Int::class, Float::class, Long::class, Short::class, Byte::class)


Where's Char?

devcrocod · 2024-06-10

I just copied what was there and moved it to the top. I believe that all constants and configurations should be at the top, so you don't have to search for them throughout the file. Therefore, all the types that were there are now at the top.

Jolanrensen · 2024-06-10

Ah, but would Char make sense here? Or would that just become a String from JSON?

devcrocod · 2024-06-12

I think the second option.

devcrocod · 2024-06-13

By default, everything is read as a string; kotlinx-serialization doesn't have a char option for JsonPrimitive.

Jolanrensen · 2024-06-10T11:43:42Z

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/io/writeJson.kt

-                    if (arraysAreFrames) encodeFrame(it as AnyFrame) else null
-                }
-            ?: encodeRow(frame, rowIndex)
+        valueColumn?.get(rowIndex) ?: arrayColumn?.get(rowIndex)?.let {


This ?: chain is very hard to understand. I know you just reformatted it, but maybe you know a better way to write it? :)

devcrocod · 2024-06-10

Initially, I also found it difficult to understand what was happening here, so I just rewrote it using objects. I'll see how this can be improved.

devcrocod · 2024-06-13

I rewrote it using the when construct. It seems more readable now. Could you please take a look?
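As a rough illustration of the kind of rewrite discussed in this thread, the sketch below shows an `?:` chain and an equivalent `when` expression side by side. All names and types here (`pickElvis`, `pickWhen`, the `String` and `List<Int>` stand-ins) are hypothetical simplifications and do not reproduce the actual `writeJson.kt` code.

```kotlin
// Hypothetical stand-ins: `value` plays the role of a value-column cell,
// `array` the role of an array-column cell, `fallback` the role of encoding the row.
fun pickElvis(value: String?, array: List<Int>?, arraysAreFrames: Boolean, fallback: () -> String): String =
    value
        ?: array?.let { if (arraysAreFrames) "frame$it" else null }
        ?: fallback()

// The same decision written as a `when`, with one branch per case.
fun pickWhen(value: String?, array: List<Int>?, arraysAreFrames: Boolean, fallback: () -> String): String =
    when {
        value != null -> value
        array != null && arraysAreFrames -> "frame$array"
        else -> fallback()
    }

fun main() {
    println(pickWhen("v", listOf(1, 2), true) { "row" })    // prints v
    println(pickWhen(null, listOf(1, 2), true) { "row" })   // prints frame[1, 2]
    println(pickWhen(null, listOf(1, 2), false) { "row" })  // prints row
}
```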

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/json.kt

devcrocod · 2024-06-13T13:08:03Z

I made some minor corrections and a bit of refactoring. I also fixed the issues with the plugin tests. The problem was that we set a rateLimit (10000) on the incoming input stream. When reading, we check whether the selected format is appropriate; if not, we reset the input stream. JSON was among the first formats to be tried, and kotlinx-serialization reads all bytes from the incoming stream, ignoring our limit. Consequently, the reset always failed, because more than the limit had been read.
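The failure mode described here (reading past a mark limit and then calling reset) can be reproduced with `java.io.BufferedInputStream`. This is a minimal sketch under assumptions: the helper `resetFailsAfterReading`, the 64-byte input, and the small limits are made up for illustration, and the buffer size is set equal to the mark limit so the stream cannot retain more than the limit. It does not show DataFrame's actual stream handling.

```kotlin
import java.io.BufferedInputStream
import java.io.ByteArrayInputStream
import java.io.IOException

// Hypothetical helper: returns true if reset() fails after `bytesToRead` bytes
// were consumed past a mark set with `markLimit`.
fun resetFailsAfterReading(bytesToRead: Int, markLimit: Int): Boolean {
    val data = ByteArray(64) { it.toByte() }
    // Buffer size equals the mark limit, so nothing beyond the limit is retained.
    val stream = BufferedInputStream(ByteArrayInputStream(data), markLimit)
    stream.mark(markLimit)
    repeat(bytesToRead) { stream.read() }
    return try {
        stream.reset()
        false
    } catch (e: IOException) {
        // The mark was invalidated because more than the limit was read.
        true
    }
}

fun main() {
    println(resetFailsAfterReading(bytesToRead = 4, markLimit = 8))   // prints false
    println(resetFailsAfterReading(bytesToRead = 32, markLimit = 8))  // prints true
}
```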

# Conflicts: # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/guess.kt

devcrocod · 2024-06-13T14:58:25Z

This is strange, I got errors with DataFrameBlackBoxCodegenTestGenerated after today's merge

Jolanrensen · 2024-06-13T15:19:21Z

> This is strange, I got errors with DataFrameBlackBoxCodegenTestGenerated after today's merge

It's because #729 got merged. Looks like the test failed because your API change changed the FIR representation in the plugin. This will probably happen a lot in the future but is actually solvable and just a warning really :)

Simply:

1. Delete the txt files in this folder: https://github.com/Kotlin/dataframe/tree/master/plugins/kotlin-dataframe/testData/box
2. Run the tests in https://github.com/Kotlin/dataframe/blob/master/plugins/kotlin-dataframe/tests-gen/org/jetbrains/kotlin/fir/dataframe/DataFrameBlackBoxCodegenTestGenerated.java two times!
3. All tests should be green; add all newly generated txt files in the folder previously mentioned to git and commit :)
4. Done

devcrocod · 2024-06-14T11:09:53Z

I did as you said, and locally everything worked for me. But as you can see, the tests on TC still failed:

org.opentest4j.AssertionFailedError: Actual data differs from file content: toDataFrame_dsl.fir.ir.txt

Could these files be platform or environment dependent? In any case, this approach of regenerating files for tests doesn't look like an ideal solution.

Jolanrensen · 2024-06-14T12:22:28Z

> I did as you said, and locally everything worked for me. But as you can see, the tests on TC still failed:
> org.opentest4j.AssertionFailedError: Actual data differs from file content: toDataFrame_dsl.fir.ir.txt
> Could these files be platform or environment dependent? In any case, this approach of regenerating files for tests doesn't look as perfect decision

Try setting your project sdk + gradle's jvm to JDK 11, while we figure out how to make that the default for all contributors :)

Edit: actually, I fixed it in #736, I'll update your branch from master and we should be good to go :)

Replace Klaxon with kotlinx-serialization in JSON operations

63d6f8b

Jolanrensen added the enhancement New feature or request label Feb 26, 2024

Jolanrensen added this to the 0.14.0 milestone Feb 26, 2024

Jolanrensen reviewed Feb 26, 2024

build.gradle.kts (outdated)

Jolanrensen reviewed Feb 26, 2024

core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/json.kt (outdated)

zaleslaw self-requested a review February 26, 2024 12:38

devcrocod added 4 commits March 4, 2024 13:12

Remove star imports and fix testing by reading json file

83ba578

Add float support in JSON de/serialization

78441dd

Resolve conflicts after #573 pr

d061d13

devcrocod requested a review from Jolanrensen June 7, 2024 13:53

Jolanrensen requested changes Jun 10, 2024

devcrocod added 4 commits June 12, 2024 12:36

Merge branch 'master' into devcrocod/serialization-patch0

6d954f8

Update serialization library version and improve comments

22bb293

Refactor stream handling in guess read

79c294e

Refactor json element matching and little refactor encodeFrame

cf43401

Merge branch 'refs/heads/master' into devcrocod/serialization-patch0

59c33b1

# Conflicts: # core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/guess.kt

devcrocod requested a review from Jolanrensen June 13, 2024 13:18

Jolanrensen approved these changes Jun 13, 2024

Generate new test files in a plugin

eda1884

Add generated sources

f3770cf

Jolanrensen mentioned this pull request Jun 14, 2024

Enforce the project to be built with Java 11 #736

Merged

Jolanrensen added 2 commits June 14, 2024 18:08

Merge branch 'master' into devcrocod/serialization-patch0

b385a62

Fixed compiler plugin tests for java 11

c0ba411

Jolanrensen merged commit 76091c7 into master Jun 14, 2024
3 of 4 checks passed

devcrocod deleted the devcrocod/serialization-patch0 branch June 18, 2024 09:22
