Is there a faster way to go from json bytes to formatted output? #1199
Comments
@mjsmith707 Do you mean pretty-printing on the fly, without parsing and validating the parsed values? Could you please add tests with some possible inputs and expected outputs?
@plokhotnyuk Thanks for the quick reply. I don't have any non-work-related sample data, but any old class data will suffice, I suppose. The LazyJson class has a pair of constructors: one for the actual class and another for just the JSON byte array representation of it (which is assumed to be valid). When the codec's Here's a (hopefully) more fully fleshed out example. Note that the CharArrayJsoniterWriter is in Java; the rest is Scala: Basically, the example output is the same as if you were to just use the regular codec: a formatted JSON string. The difference here was I didn't need to deserialize it to
The provided code snippet can behave differently depending on how it was called and what input was provided. You can try running your benchmarks with async-profiler and building flame graphs for CPU cycles and allocations to see what is happening under the hood. A better option would be to convert your benchmarks to run under the sbt-jmh plugin. The README of this project has many sample commands for running JMH benchmarks with different profilers. My bet is that in your case jsoniter-scala spends much less time on allocations during serialization from case classes.
Right, this is more of a proof of concept/experiment than anything, and it is not very optimized. From some simple testing (i.e. not using JMH), I found the jsoniter-generated codec to be roughly 2x faster at serialization than my experiment. For the purposes of this issue, I guess it is more of a feature request: a way, as part of the JsonWriter API, to quickly write a JSON byte array (or even a raw JSON string) as a properly formatted JSON string.
I have an interesting use case where I'm deserializing a JSONB column from Postgres into a class object. In most cases, however, I'm just sending these out as CRUD API output over HTTP, so it would be faster if I could skip the deserialization step and go straight to writing the bytes instead. As the bytes are already known to be valid JSON (via Postgres), reparsing them isn't really needed; the important part is just having the output formatted correctly. I've also experimented with deserializing to Circe's JSON and reserializing that, and it was slower than just using the class's codec.
I wrote a lazy codec (it contains either a value of type T or an Array[Byte]), and in it I wrote a utility that scans across the bytes and calls JsonWriter's methods such as out.writeVal or out.writeObjectStart(). That all works, but it's pretty slow compared to the serialization generated by jsoniter (14s for mine vs 7s for the generated codec, over 10m iterations on my machine). It is, however, still faster than deserializing to the class and then reserializing again (23s vs 14s). The CharArrayJsoniterWriter was kind of hacked up, so there are probably bugs and room for improvement.
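To illustrate the kind of byte scan described above, here is a minimal, self-contained sketch in plain Java. It is not the code from the gist and it uses no jsoniter-scala API; the class name `JsonReindent` and the method `reformat` are hypothetical names chosen for this example. It assumes the input is already valid UTF-8 JSON (as it would be coming from a JSONB column), so it only tracks string boundaries and nesting depth and never parses values.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration only; not part of jsoniter-scala.
final class JsonReindent {
    private JsonReindent() {}

    /**
     * Re-indents bytes that are ASSUMED to be valid UTF-8 JSON.
     * No validation is performed; invalid input gives unspecified output.
     * indent <= 0 produces compact output with no insignificant whitespace.
     */
    static byte[] reformat(byte[] json, int indent) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(json.length * 2);
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < json.length; i++) {
            byte b = json[i];
            if (inString) {
                out.write(b);
                if (b == '\\') {
                    out.write(json[++i]); // copy the escaped byte untouched
                } else if (b == '"') {
                    inString = false;
                }
                continue;
            }
            int c = b; // non-ASCII bytes are negative and fall through to default
            switch (c) {
                case ' ': case '\t': case '\n': case '\r':
                    break; // drop whitespace outside of strings
                case '"':
                    inString = true;
                    out.write(b);
                    break;
                case '{': case '[': {
                    out.write(b);
                    int next = skipWhitespace(json, i + 1);
                    if (next < json.length && (json[next] == '}' || json[next] == ']')) {
                        out.write(json[next]); // keep empty containers on one line
                        i = next;
                    } else {
                        depth++;
                        newline(out, depth, indent);
                    }
                    break;
                }
                case '}': case ']':
                    depth--;
                    newline(out, depth, indent);
                    out.write(b);
                    break;
                case ',':
                    out.write(b);
                    newline(out, depth, indent);
                    break;
                case ':':
                    out.write(b);
                    if (indent > 0) out.write(' ');
                    break;
                default:
                    out.write(b); // digits, literals, signs, exponents
            }
        }
        return out.toByteArray();
    }

    // Emits a line break plus indentation, or nothing in compact mode.
    private static void newline(ByteArrayOutputStream out, int depth, int indent) {
        if (indent <= 0) return;
        out.write('\n');
        for (int i = 0; i < depth * indent; i++) out.write(' ');
    }

    // Returns the index of the first non-whitespace byte at or after 'from'.
    private static int skipWhitespace(byte[] json, int from) {
        int i = from;
        while (i < json.length
                && (json[i] == ' ' || json[i] == '\t' || json[i] == '\n' || json[i] == '\r')) {
            i++;
        }
        return i;
    }
}
```

Calling `JsonReindent.reformat(bytes, 2)` yields two-space indented output, and `JsonReindent.reformat(bytes, 0)` yields compact output. Because this is a single pass that copies bytes without decoding strings or numbers, it avoids per-value writer calls, but whether it would actually beat a generated codec would need to be measured.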
Any thoughts on this? It would be great if JsonWriter had an out.writeFormattedBytes method 😅
Here's my char writer and LazyJson class: https://gist.github.com/mjsmith707/9bdf76091da4bd324308b70e9638e5a8