Is there a faster way to go from json bytes to formatted output? #1199

Open
mjsmith707 opened this issue Sep 23, 2024 · 4 comments
@mjsmith707

mjsmith707 commented Sep 23, 2024

I have an interesting use case where I'm deserializing a JSONB column from Postgres into a class object. In most cases, however, I'm just sending these out as CRUD API responses over HTTP, so it would be faster if I could skip the deserialization step and go straight to writing the bytes. Since the bytes are already known to be valid JSON (via Postgres), reparsing them isn't really needed; the important part is just having the output formatted correctly. I've also experimented with deserializing to Circe's JSON and reserializing that, and it was slower than just using the class's codec.

I wrote a lazy codec (it contains either a value of type T or an Array[Byte]), and in it I wrote a utility that scans across the bytes and calls JsonWriter's methods like out.writeVal or out.writeObjectStart() etc. That all works, but it's pretty slow compared with the serialization generated by jsoniter (7s for the generated codec vs 14s for mine, on my machine, in 10m iterations). It is, however, still faster than deserializing to the class and then reserializing (23s vs 14s). The CharArrayJsoniterWriter was kind of hacked up, so there are probably bugs and room for improvement.

Any thoughts on this? It would be great if JsonWriter had an out.writeFormattedBytes method 😅

Here's my char writer and LazyJson class https://gist.github.com/mjsmith707/9bdf76091da4bd324308b70e9638e5a8
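For readers without access to the gist, the byte-scanning idea described in this comment can be sketched roughly as follows. This is not the code from the gist and it does not use jsoniter-scala's JsonWriter; it is a standalone Java illustration (the class name `RawJsonIndenter` and its `indent` method are hypothetical) that assumes the input is already valid UTF-8 JSON, as it would be when read from a JSONB column, and re-indents it in a single pass without building an object tree or validating anything.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, for illustration only: re-indents JSON bytes that are
// ASSUMED to be valid. No validation is performed and no tree is built.
final class RawJsonIndenter {
    private RawJsonIndenter() {}

    static byte[] indent(byte[] json, int width) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(json.length * 2);
        int depth = 0;
        boolean inString = false; // currently inside a "..." literal
        boolean escaped = false;  // previous byte was a backslash inside a string
        for (int i = 0; i < json.length; i++) {
            byte b = json[i];
            if (inString) {
                // Copy string contents verbatim. UTF-8 continuation bytes are
                // all >= 0x80, so they can never be mistaken for '"' or '\\'.
                out.write(b);
                if (escaped) escaped = false;
                else if (b == '\\') escaped = true;
                else if (b == '"') inString = false;
                continue;
            }
            switch (b) {
                case ' ': case '\t': case '\n': case '\r':
                    break; // drop insignificant whitespace from the input
                case '"':
                    inString = true;
                    out.write(b);
                    break;
                case '{': case '[': {
                    out.write(b);
                    int next = skipWhitespace(json, i + 1);
                    // '}' is '{' + 2 and ']' is '[' + 2 in ASCII
                    if (next < json.length && json[next] == b + 2) {
                        out.write(json[next]); // keep empty {} or [] on one line
                        i = next;
                    } else {
                        depth++;
                        newline(out, depth, width);
                    }
                    break;
                }
                case '}': case ']':
                    depth--;
                    newline(out, depth, width);
                    out.write(b);
                    break;
                case ',':
                    out.write(b);
                    newline(out, depth, width);
                    break;
                case ':':
                    out.write(b);
                    out.write(' ');
                    break;
                default:
                    out.write(b); // digits, signs, and true/false/null letters
            }
        }
        return out.toByteArray();
    }

    private static int skipWhitespace(byte[] json, int from) {
        int i = from;
        while (i < json.length
                && (json[i] == ' ' || json[i] == '\t' || json[i] == '\n' || json[i] == '\r')) {
            i++;
        }
        return i;
    }

    private static void newline(ByteArrayOutputStream out, int depth, int width) {
        out.write('\n');
        for (int i = 0; i < depth * width; i++) out.write(' ');
    }
}
```

Because the scanner only tracks nesting depth and whether it is inside a string, it allocates a single output buffer per call. Whether that is actually faster than a generated codec would need to be measured for the payloads in question.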

@plokhotnyuk
Owner

plokhotnyuk commented Sep 24, 2024

@mjsmith707 Do you mean pretty-printing on the fly, without parsing and validating the parsed values?

Could you please add tests with some possible inputs and expected outputs?

@mjsmith707
Author

@plokhotnyuk Thanks for the quick reply. I don't have any non-work-related sample data, but any old class data will suffice, I suppose. The LazyJson class has a pair of constructors: one for the actual class and another for just the JSON string byte array representation of it (which is assumed to be valid). When the codec's encodeValue is invoked, it calls the CharArrayJsoniterWriter, which loops through the bytes and calls the various methods on JsonWriter, like writeObjectStart() etc., to format it properly.

Here's a (hopefully) more fully fleshed out example. Note that the CharArrayJsoniterWriter is in Java, the rest is Scala:
https://gist.github.com/mjsmith707/81908f7523b380a00697f0dd81b75ca8

Basically the example output is the same as if you were to just use the regular codec, a formatted JSON string. The difference here was I didn't need to deserialize it to MyTestClass first. Instead I just carried in the byte array (which in my case would come from Postgres as a JSONB column).
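The "either the class or its bytes" holder described above might look roughly like the sketch below. This is an assumption-laden illustration, not the LazyJson from the gist: the factory names `ofValue` and `ofBytes` and the `encode`/`decode` methods are hypothetical, the codec is represented by plain `Function` arguments rather than a jsoniter-scala codec, and the raw bytes are passed through unchanged instead of being reformatted through JsonWriter.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch: holds EITHER a decoded value OR pre-serialized JSON
// bytes that are assumed to be valid (e.g. read from a Postgres JSONB column).
final class LazyJson<T> {
    private final T value;    // non-null when built from a decoded object
    private final byte[] raw; // non-null when built from raw JSON bytes

    private LazyJson(T value, byte[] raw) {
        this.value = value;
        this.raw = raw;
    }

    static <T> LazyJson<T> ofValue(T value) {
        return new LazyJson<>(Objects.requireNonNull(value), null);
    }

    static <T> LazyJson<T> ofBytes(byte[] rawJson) {
        return new LazyJson<>(null, Objects.requireNonNull(rawJson));
    }

    boolean isRaw() {
        return raw != null;
    }

    // Serialization path: raw bytes skip the encoder entirely, so nothing is
    // deserialized just to be written back out.
    byte[] encode(Function<T, byte[]> encoder) {
        return raw != null ? raw : encoder.apply(value);
    }

    // Decoding only happens when a caller actually needs the typed value.
    T decode(Function<byte[], T> decoder) {
        return raw != null ? decoder.apply(raw) : value;
    }
}
```

In the common CRUD read path only `encode` would be called, so the cost of `decode` is paid solely by the code paths that inspect the typed value.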

@plokhotnyuk
Owner

The provided code snippet can behave differently depending on how it was called and what input was provided.

You can try to run your benchmarks with async-profiler and build flame graphs for CPU cycles and allocations to see what is happening under the hood.

A better option would be to convert your benchmarks to run under the sbt-jmh plugin. On the README page of this project you can find a lot of sample commands for running JMH benchmarks with different profilers.

My bet is that in your case jsoniter-scala spends much less time on allocations during serialization from case classes.

@mjsmith707
Author

mjsmith707 commented Sep 25, 2024

Right, this is more of a proof of concept/experiment than anything, and it's not very optimized. From some simple testing (i.e. not using JMH), I found the jsoniter-generated codec to be roughly 2x faster at serialization than my experiment. I guess for the purposes of this issue, it is more of a feature request: a way, as part of the JsonWriter API, to quickly write a JSON byte array (or even a raw JSON string) as a properly formatted JSON string.
