Generate faster reflection-free Jackson serializers #41063

Merged: 1 commit, Aug 2, 2024

mariofusco
Contributor

This is a PoC demonstrating a possible improvement in the performance of Jackson serialization, which out of the box relies heavily on reflection. The idea is to replace this behavior with automatically generated serializers and register them on Jackson's ObjectMapper.

To demonstrate how this works, I added the following trivial REST endpoint to the benchmark suites that @franz1981 and I used for our workshop on profiling.

@Path("/customer")
public class CustomerLookupResource {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    @NonBlocking
    public Customer hello() {
        Customer customer = new Customer();
        customer.setFirstName("Mario");
        customer.setLastName("Fusco");
        customer.setAge(50);
        customer.setIncome(1000.0);
        return customer;
    }
}

Running this benchmark on my machine I got the following result:

Profiling for 20 seconds
Done
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   100.08μs   85.29μs  20.71ms   94.35%
    Req/Sec   87228.29  17019.32  132217.00     92.68
  3576360 requests in 40.002s, 658.26MB read
Requests/sec: 89404.53
Transfer/sec:  16.46MB

In particular, the part of the execution related to the serialization of the Customer object looks like the following, where Jackson's BeanPropertyWriters retrieve the value of each and every field to be serialized via reflection.

image

With the change that I'm proposing, I automatically find via Jandex the list of classes that require Jackson serialization, and for each of them I generate with Gizmo a custom serializer like this:

package profiling.workshop.json;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;
import java.io.IOException;

// $VF: synthetic class
public class Customer$quarkusjacksonserializer extends StdSerializer {
   public Customer$quarkusjacksonserializer() {
      super(Customer.class);
   }

   public void serialize(Object var1, JsonGenerator var2, SerializerProvider var3) throws IOException {
      Customer var4 = (Customer)var1;
      var2.writeStartObject();
      int var5 = var4.getAge();
      var2.writeNumberField("age", var5);
      String var6 = var4.getFirstName();
      var2.writeStringField("firstName", var6);
      double var7 = var4.getIncome();
      var2.writeNumberField("income", var7);
      String var9 = var4.getLastName();
      var2.writeStringField("lastName", var9);
      var2.writeEndObject();
   }
}

and configure the ObjectMapper to use it. Rerunning the former benchmark with this improvement I got this result:

Profiling for 20 seconds
Done
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    87.81μs   66.24μs  13.17ms   93.30%
    Req/Sec   101432.88  14548.62  136171.00     70.00
  4057315 requests in 40.001s, 746.79MB read
Requests/sec: 101430.34
Transfer/sec:  18.67MB

with an increase of more than 10% in the number of requests served per second, even for such a trivial use case. In this case the serialization of the Customer class is executed entirely by the generated serializer, which avoids any use of reflection and thus requires less than half the number of samples it did before.

image
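
To make the contrast concrete, here is a minimal, self-contained sketch in plain Java (no Jackson involved) of the two strategies: one method discovers and invokes getters reflectively on every call, the other hard-codes the getter calls the way the generated serializer does. All names are made up for illustration, the Customer stand-in keeps only two properties, and string escaping is ignored.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class SerializationSketch {

    // Hypothetical stand-in for the benchmark's Customer class (two properties only).
    static class Customer {
        private final String firstName;
        private final int age;

        Customer(String firstName, int age) {
            this.firstName = firstName;
            this.age = age;
        }

        public String getFirstName() { return firstName; }
        public int getAge() { return age; }
    }

    // Reflection-based: getters are looked up and invoked reflectively at runtime.
    static String reflective(Object bean) {
        Map<String, Object> props = new TreeMap<>(); // sorted by property name
        try {
            for (Method m : bean.getClass().getDeclaredMethods()) {
                String n = m.getName();
                if (n.startsWith("get") && n.length() > 3 && m.getParameterCount() == 0) {
                    String prop = Character.toLowerCase(n.charAt(3)) + n.substring(4);
                    props.put(prop, m.invoke(bean));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read properties reflectively", e);
        }
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : props.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof String) {
                sb.append('"').append(v).append('"'); // no escaping: sketch only
            } else {
                sb.append(v);
            }
        }
        return sb.append('}').toString();
    }

    // Generated-style: direct getter calls, no reflection at all.
    static String generatedStyle(Customer c) {
        return "{\"age\":" + c.getAge() + ",\"firstName\":\"" + c.getFirstName() + "\"}";
    }
}
```

Both methods produce the same text for the same Customer; the difference is only in how the property values are obtained, which is where the profiler samples in the reflective case go.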

I'm planning to keep working on this PoC in the next few days, mostly to improve the generated serializer, making it cover more complex use cases and be consistent with Jackson's semantics, but any early feedback on this is welcome.

/cc @franz1981 @geoand

@mariofusco mariofusco marked this pull request as draft June 7, 2024 14:32
@quarkus-bot quarkus-bot bot added the area/rest label Jun 7, 2024
quarkus-bot bot commented Jun 7, 2024

Thanks for your pull request!

The title of your pull request does not follow our editorial rules. Could you have a look?

  • title should preferably start with an uppercase character (if it makes sense!)

This message is automatically generated by a bot.

@geoand
Contributor

geoand commented Jun 7, 2024

This is very neat!

I did something similar a few years ago, and the problem I encountered was figuring out the proper conditions for falling back to Jackson's default behavior.

@geoand
Contributor

geoand commented Jun 7, 2024

I also wonder if it makes sense to collaborate with upstream as there may be a way to generalize Afterburner to use this approach

@franz1981
Contributor

franz1981 commented Jun 7, 2024

This is indeed a very nice experiment! Just thinking out loud, but... what if you let Jackson create its own reflection-based classes and use those to create yours?

@mariofusco
Contributor Author

I also wonder if it makes sense to collaborate with upstream as there may be a way to generalize Afterburner to use this approach

I'm taking a look at Afterburner (I didn't know about it), and it seems that it is now considered somewhat legacy, with Blackbird currently under development as a more modern alternative.

That said, Blackbird is based on the LambdaMetafactory, while I still think that we can do better with our Gizmo bytecode generation. However it is interesting to check if we could at least reuse part of it to avoid reimplementing part of the Jackson logic and annotation semantics on our side as I think that @franz1981 was also suggesting. I will further investigate this.

@geoand
Contributor

geoand commented Jun 10, 2024

That said, Blackbird is based on the LambdaMetafactory, while I still think that we can do better with our Gizmo bytecode generation

Definitely!

However it is interesting to check if we could at least reuse part of it to avoid reimplementing part of the Jackson logic and annotation semantics on our side as I think that @franz1981 was also suggesting. I will further investigate this

Thanks!

@mariofusco
Contributor Author

However it is interesting to check if we could at least reuse part of it to avoid reimplementing part of the Jackson logic and annotation semantics on our side as I think that @franz1981 was also suggesting. I will further investigate this

A quick update on this: I was hoping that Blackbird had some discovery mechanism that finds and introspects upfront the POJO classes to be serialized, and that I could reuse exactly the same mechanism in our extension in order to discover the classes for which we need to generate the serializers via Gizmo without having to also reimplement Jackson's semantics on our side.

Unfortunately this is not the case. In this regard Blackbird is much simpler than I thought, and in practice it's only a Jackson add-on that works lazily at the level of the single field to be serialized. In particular, what happens is the following:

  1. When a POJO is serialized for the first time, Jackson lazily introspects the class and discovers the fields to be serialized.
  2. Let's say it finds that it has to serialize the firstName of a Person: it checks in a cache whether it already has a BeanPropertyWriter to access the value of that specific field from a Person instance and write it into the output stream, and if it cannot find one it creates it.
  3. At this point the Blackbird add-on finally kicks in: if its module has been registered on the ObjectMapper in use, instead of reading the firstName of the Person via reflection it creates a lambda that does the same and registers it in that cache to be reused for subsequent accesses.

In essence everything is done lazily at runtime and there isn't any discovery mechanism to be reused. In my opinion this not only goes against the Quarkus philosophy but it is also quite sub-optimal compared with the solution that I sketched here, and some preliminary benchmark results already confirm this impression.
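
The lazy, per-property behavior described above can be sketched roughly as follows. This is not Blackbird's code: LazyAccessorCache, Person and read are hypothetical names, and the only real APIs used are the JDK's java.lang.invoke classes (MethodHandles and LambdaMetafactory). The point is only to show that the accessor is created on first use at runtime and then cached.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class LazyAccessorCache {

    // Hypothetical POJO used only for this illustration.
    static class Person {
        private final String firstName;

        Person(String firstName) { this.firstName = firstName; }

        public String getFirstName() { return firstName; }
    }

    // One accessor per (class, getter) pair, created the first time it is needed.
    private final Map<String, Function<Object, Object>> cache = new ConcurrentHashMap<>();

    Object read(Object target, String getterName) {
        Class<?> type = target.getClass();
        String key = type.getName() + "#" + getterName;
        return cache.computeIfAbsent(key, k -> createAccessor(type, getterName)).apply(target);
    }

    int size() {
        return cache.size();
    }

    // Turns the getter into a Function via LambdaMetafactory instead of calling
    // Method.invoke on every access.
    @SuppressWarnings("unchecked")
    private static Function<Object, Object> createAccessor(Class<?> type, String getterName) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle getter = lookup.unreflect(type.getMethod(getterName));
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",                                           // Function's single abstract method
                    MethodType.methodType(Function.class),             // the factory takes no captured arguments
                    MethodType.methodType(Object.class, Object.class), // erased signature of apply
                    getter,                                            // implementation: the getter itself
                    getter.type());                                    // specialized signature, e.g. (Person)String
            return (Function<Object, Object>) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException("Cannot create accessor for " + getterName, t);
        }
    }
}
```

Nothing here is known before the first call to read, which is the contrast with the build-time approach of this pull request, where the getter calls are written directly into generated bytecode.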

If we want to continue implementing this solution I don't see any alternative other than also duplicating the Jackson logic and annotation semantics on our side, and of course I see the potential risks and maintenance burden of this approach. Nevertheless I would like to keep experimenting with this and check how far I can push it, unless you already see a showstopper. Please @geoand and @franz1981 let me know your opinion and how you suggest to eventually proceed on this.

@franz1981
Contributor

What about what I suggested in my previous comment? That is, inspecting the serializers used by Jackson, which contain Field and Method references, and using them to create the custom serializers as you are already doing, but based on Jackson's own way of inspecting the classes to serialize. This will likely require modifying Jackson to make these cached serializers accessible.

@geoand
Contributor

geoand commented Jun 10, 2024

In essence everything is done lazily at runtime and there isn't any discovery mechanism to be reused. In my opinion this not only goes against the Quarkus philosophy but it is also quite sub-optimal compared with the solution that I sketched here, and some preliminary benchmark results already confirm this impression.

I completely agree.

Nevertheless I would like to keep experimenting with this and check how far I can push it, unless you already see a showstopper

+1.

Please @geoand and @franz1981 let me know your opinion and how you suggest to eventually proceed on this.

When you have an understanding of how things work in Jackson around this, I propose reaching out to Tatu (Jackson's maintainer) and chatting with him to see if there is something that can be done (without requiring a massive rework of Jackson) so we can better integrate at build time. From a Jackson point of view, it could be noted that Quarkus would not be the only beneficiary of such an approach: other tools and frameworks could utilize something similar.

private String writeMethodName(String typeName) {
return switch (typeName) {
case "java.lang.String" -> "writeStringField";
case "short", "java.lang.Short", "int", "java.lang.Integer", "long", "java.lang.long", "float",
Contributor

java.lang.Long uppercase L

Contributor Author

Thanks!
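
For context, a type-name lookup of this kind, with the `java.lang.Long` spelling corrected, could look like the sketch below. Only the String case and the beginning of the numeric case are visible in the excerpt under review; the remaining cases, the method names returned for them, and the default branch are assumptions made for illustration (writeNumberField and writePOJOField do appear in the generated serializers shown in this conversation).

```java
class WriteMethodNames {

    // Maps the type name of a field to the name of the JsonGenerator method used to write it.
    // Everything beyond the String case and the first numeric names is hypothetical.
    static String writeMethodName(String typeName) {
        return switch (typeName) {
            case "java.lang.String" -> "writeStringField";
            case "short", "java.lang.Short", "int", "java.lang.Integer", "long", "java.lang.Long",
                    "float", "java.lang.Float", "double", "java.lang.Double" -> "writeNumberField";
            case "boolean", "java.lang.Boolean" -> "writeBooleanField";
            default -> "writePOJOField"; // assumed fallback for any other type
        };
    }
}
```

Because the match is on exact strings, a misspelled name such as "java.lang.long" never matches a real type name, so in this sketch a Long field would silently take the default branch instead of the numeric one.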

@mariofusco mariofusco force-pushed the gen_json_ser branch 2 times, most recently from d5e2b99 to 1275e7b Compare July 26, 2024 16:27
@mariofusco mariofusco changed the title [POC] Generate jackson serializers Generate jackson serializers Jul 30, 2024
@mariofusco mariofusco marked this pull request as ready for review July 30, 2024 14:20
@mariofusco
Contributor Author

I made some good progress with this PoC, to the point that I believe it is at least ready to be reviewed now, if not to be considered for merging. As already discussed in a former comment and also here, I didn't find any reasonable way to reuse, even partially, the Java object introspection performed by Jackson, mostly because it all happens at runtime on the actual object instances, while we want to do all the code generation at build time, relying only on what we can infer (via Jandex) from the classes to be serialized. For this reason I implemented a mechanism to skip the generation of the StdSerializer for classes containing Jackson annotations that we cannot process at build time, and for those classes we fall back to the original reflection-based Jackson serialization implementation.
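
A minimal sketch of such a skip decision, assuming the annotation names of a class have already been collected at build time (for example from the Jandex index). The class name, the method name, and above all the contents of the supported set are hypothetical; the annotations this pull request actually handles may differ.

```java
import java.util.Collection;
import java.util.Set;

class SerializerGenerationPolicy {

    static final String JACKSON_PREFIX = "com.fasterxml.jackson.";

    // Hypothetical list of Jackson annotations the generator knows how to honor.
    static final Set<String> SUPPORTED = Set.of(
            "com.fasterxml.jackson.annotation.JsonProperty",
            "com.fasterxml.jackson.annotation.JsonIgnore");

    // Returns false as soon as the class carries a Jackson annotation that the generator
    // cannot process, so that serialization of that class stays reflection-based.
    static boolean canGenerate(Collection<String> annotationNames) {
        for (String name : annotationNames) {
            if (name.startsWith(JACKSON_PREFIX) && !SUPPORTED.contains(name)) {
                return false;
            }
        }
        return true;
    }
}
```

The design choice is deliberately conservative: an unknown Jackson annotation disables generation for that class rather than risking output that differs from what Jackson would produce.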

The fact that we no longer need reflection to perform JSON serialization (at least when we don't have to use that fallback mechanism) also implies a good performance improvement. To measure this I added a new REST endpoint to the benchmark suites that we used for our workshop on profiling; it simply performs a JSON serialization of an instance of a Customer class. In essence the REST endpoint produces JSON like the following:

{
  "address": {
    "street": "viale Michelangelo",
    "town": "Mondragone"
  },
  "children": [
    {
      "age": 12,
      "firstName": "Sofia",
      "lastName": "Fusco"
    },
    {
      "age": 9,
      "firstName": "Marilena",
      "lastName": "Fusco"
    }
  ],
  "creditCards": [
    {
      "limit": 100,
      "name": "Visa"
    },
    {
      "limit": 150,
      "name": "Amex"
    }
  ],
  "age": 50,
  "firstName": "Mario",
  "lastName": "Fusco"
}

What this pull request does is generate at build time, and register at compile time, a serializer for that Customer class like the following:

public class Customer$quarkusjacksonserializer extends StdSerializer {
   public Customer$quarkusjacksonserializer() {
      super(Customer.class);
   }

   public void serialize(Object var1, JsonGenerator var2, SerializerProvider var3) throws IOException {
      Customer var4 = (Customer)var1;
      var2.writeStartObject();
      Address var5 = var4.getAddress();
      var2.writePOJOField("address", var5);
      List var6 = var4.getChildren();
      var2.writePOJOField("children", var6);
      CreditCard[] var7 = var4.getCreditCards();
      var2.writePOJOField("creditCards", var7);
      double var9 = var4.getIncome();
      String[] var8 = new String[]{"admin"};
      if (JacksonMapperUtil.includeSecureField(var8)) {
         var2.writeNumberField("income", var9);
      }

      int var11 = var4.getAge();
      var2.writeNumberField("age", var11);
      String var12 = ((Person)var4).getFirstName();
      var2.writeStringField("firstName", var12);
      String var13 = ((Person)var4).getLastName();
      var2.writeStringField("lastName", var13);
      var2.writeEndObject();
   }
}
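
The role check guarding the income field can be illustrated with the plain-Java sketch below. It is not the real JacksonMapperUtil: here the caller's roles are passed in explicitly, whereas the actual helper presumably obtains them from the current security context, and all names are hypothetical.

```java
import java.util.Set;

class SecureFieldSketch {

    // Hypothetical stand-in for the role check: true if the caller has at least one allowed role.
    static boolean includeSecureField(Set<String> callerRoles, String[] rolesAllowed) {
        for (String role : rolesAllowed) {
            if (callerRoles.contains(role)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the shape of the generated code: the secured field is written only if the check passes.
    static String serializeCustomer(int age, double income, Set<String> callerRoles) {
        StringBuilder json = new StringBuilder("{");
        if (includeSecureField(callerRoles, new String[] { "admin" })) {
            json.append("\"income\":").append(income).append(',');
        }
        return json.append("\"age\":").append(age).append('}').toString();
    }
}
```

A caller without the admin role gets a document with no income entry at all, which is consistent with the sample JSON above not containing that field.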

Running this benchmark against the current Quarkus main branch I obtained the following result:

Profiling for 20 seconds
Done
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   146.71μs   88.88μs   7.01ms   90.34%
    Req/Sec   64358.20  11838.56  81035.00     87.80
  2638686 requests in 40.002s,   1.06GB read
Requests/sec: 65963.85
Transfer/sec:  27.11MB

while doing the same against this pull request I got:

Profiling for 20 seconds
Done
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    99.53μs   68.13μs   6.78ms   92.66%
    Req/Sec   90953.20  15362.45  104565.00     97.56
  3729081 requests in 40.001s,   1.50GB read
Requests/sec: 93224.69
Transfer/sec:  38.32MB

Be aware that this rather large speedup is also partially due to some minor improvements that I implemented in the FullyFeaturedServerJacksonMessageBodyWriter.

@geoand @franz1981 If possible I'd like to run this pull request on the Quarkus CI and check if it breaks anything. Also I'd like to further discuss on how to move forward with the adoption of this work. For instance we could (at least temporarily?) consider this feature as an opt-in improvement and allow users to give it a try and enable it on demand through a Quarkus config property. Any other idea or feedback on this is welcome.

@geoand
Contributor

geoand commented Jul 30, 2024

Very very cool results!

Also I'd like to further discuss on how to move forward with the adoption of this work. For instance we could (at least temporarily?) consider this feature as an opt-in improvement and allow users to give it a try and enable it on demand through a Quarkus config property

I believe this is a good idea

@geoand
Contributor

geoand commented Jul 30, 2024

Do you have a similar measurement for the case where BasicJacksonMessageBodyWriter is used?

@franz1981
Contributor

very nice work @mariofusco cannot wait to do a full review for this 👍

return true;
}
if (fieldType instanceof ParameterizedType pType) {
if (pType.arguments().size() == 1 && (typeName.equals("java.util.List") ||
Contributor

Adding common concrete types can be helpful here (but not key, I think; I wouldn't expect using ArrayList to be the common case, TBH)

@mariofusco
Contributor Author

Do you have a similar measurement for the case where BasicJacksonMessageBodyWriter is used?

I commented out this annotation so it uses the BasicJacksonMessageBodyWriter and made another run.

On main I got:

Profiling for 20 seconds
Done
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   107.19μs   82.93μs   8.22ms   94.05%
    Req/Sec   82472.29  14301.76  92471.00     97.56
  3381364 requests in 40.002s,   1.41GB read
Requests/sec: 84529.87
Transfer/sec:  36.03MB

while with this pull request:

Profiling for 20 seconds
Done
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   103.60μs   69.02μs  11.34ms   92.73%
    Req/Sec   88104.37  15416.00  104050.00     95.12
  3692279 requests in 40.002s,   1.50GB read
Requests/sec: 92302.46
Transfer/sec:  38.50MB

so, as I expected, the difference is much smaller (because now I'm not taking advantage of the other optimization that I mentioned before), but still quite significant.
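
For reference, the relative gains implied by the Requests/sec figures reported in this conversation can be computed with a one-line helper. The class and method names are made up; the numbers passed in are the ones from the benchmark outputs above.

```java
class ThroughputGain {

    // Relative increase of `after` over `before`, as a percentage.
    static double gainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Trivial Customer endpoint from the first comment
        System.out.printf("PoC: %.1f%%%n", gainPercent(89404.53, 101430.34));
        // Richer Customer, FullyFeaturedServerJacksonMessageBodyWriter
        System.out.printf("Fully featured writer: %.1f%%%n", gainPercent(65963.85, 93224.69));
        // Richer Customer, BasicJacksonMessageBodyWriter
        System.out.printf("Basic writer: %.1f%%%n", gainPercent(84529.87, 92302.46));
    }
}
```

This works out to roughly 13.5%, 41.3% and 9.2% respectively, so the gain attributable to the generated serializers alone in the last measurement is a bit over 9%.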

I will add the Quarkus property that I suggested to allow opting in to this feature. Any suggestions on the naming? Do we have other similar opt-in features?

@geoand
Contributor

geoand commented Jul 31, 2024

Our naming has been inconsistent to say the least 😂.

I would do something like quarkus.jackson.optimization.enable-build-time-serializers or something like that.

@FroMage
Member

FroMage commented Jul 31, 2024

I'm only learning of this now :(

I've been doing the same in Renarde: I generate Jackson serialisers and deserialisers at build time, but for JPA entities, which are a slightly more complex use case than POJOs in that IDs and lazy loading have to be respected, because JPA entities are cyclic graphs and Jackson doesn't deal with graphs.

So, it's a custom serialiser that only serialises IDs for relations (instead of serialising recursively).

The code is all there: https://github.com/quarkiverse/quarkus-renarde/blob/main/transporter/deployment/src/main/java/io/quarkiverse/renarde/transporter/deployment/RenardeTransporterProcessor.java

We should definitely discuss joining code. It's great to generate recursive serialisers for POJOs, but it can't work for JPA objects, and I feel it's just a toggle between the two modes. JPA entities need special id-based non-recursive serialisers, while every other POJO can use the recursive one, but generating them is mostly the same code.
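
The difference between the two modes can be illustrated with a small sketch in which relations are written as IDs rather than serialized recursively. This is not Renarde's code: the Author and Book entities and the hand-written serialize methods are hypothetical and use plain string building instead of Jackson.

```java
import java.util.ArrayList;
import java.util.List;

class IdBasedSerializationSketch {

    // Two hypothetical entities that reference each other, forming a cycle.
    static class Author {
        final long id;
        final String name;
        final List<Book> books = new ArrayList<>();

        Author(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static class Book {
        final long id;
        final String title;
        Author author;

        Book(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Relations are emitted as IDs only, so the Author -> Book -> Author cycle is never followed.
    static String serialize(Author a) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":").append(a.id).append(",\"name\":\"").append(a.name).append("\",\"books\":[");
        for (int i = 0; i < a.books.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(a.books.get(i).id);
        }
        return sb.append("]}").toString();
    }

    static String serialize(Book b) {
        return "{\"id\":" + b.id + ",\"title\":\"" + b.title + "\",\"author\":" + b.author.id + "}";
    }
}
```

A recursive serializer applied to the same two objects would descend from the author into the book and back into the author without terminating, which is why the ID-based variant is needed for entity graphs.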

WDYT?

@FroMage
Member

FroMage commented Jul 31, 2024

Since we're talking about profiling JSON serialisers, did you also take a look and measure with https://github.com/quarkusio/qson ?

@mariofusco
Contributor Author

@FroMage I tried to compare your implementation with mine, and my impression is that, even if at first sight they try to achieve something very similar, they have quite different premises and outcomes, and at this point I'm afraid that the parts in which they differ are far more relevant than the ones they have in common.

As you wrote, your effort is very specific to JPA entities, while mine is much closer to what Jackson serialization performs out of the box, with the only difference that I try to do the same as Jackson does but without using any reflection. This difference is reflected, for instance, in how we process the single fields, which for me are basically the FieldInfo provided by Jandex, while you have a richer representation that is JPA-specific. In essence we also don't have the same point of view on which fields of a POJO (or entity in your case) should be serialized, not to mention on how.

I started to sketch something similar to what @franz1981 proposed:

API for user-defined data types to be used by every extension which is interested to visit and do something while walking it, at build time

but I immediately realized that I was simply implementing something as generic as a visitor pattern for a jandex ClassInfo, which is probably a nice to have, but for sure something that belongs in Jandex and that is quite out of scope for both our current efforts.

Maybe I'm missing something and I'm very open to discuss this, but at the moment I don't see any lower level common ground between our 2 implementations. Any feedback?

@franz1981
Contributor

I immediately realized that I was simply implementing something as generic as a visitor pattern for a jandex ClassInfo, which is probably a nice to have

exactly @mariofusco that was indeed what I was thinking - maybe @Ladicek can tell if he thinks this could be useful elsewhere?

@Ladicek
Contributor

Ladicek commented Jul 31, 2024

I'm honestly not sure if a visitor over Jandex objects makes sense. Everything is in memory already, so you can just access the objects directly. I'm probably missing a lot of context, so if you have anything specific you could show, I'd be happy to take a look at it.

@FroMage
Member

FroMage commented Jul 31, 2024

at the moment I don't see any lower level common ground between our 2 implementations. Any feedback?

Well, you're right that they don't have much in common as to how they behave. Can't argue with that. And also, we're telling people to not serialise entities to JSON, and use DTOs instead. So my use-case is pretty specific and not intended for REST APIs, which your use-case is probably targeting.

It's alright if this doesn't end up merged.

@@ -52,6 +52,7 @@ public Response handleParseException(WebApplicationException e) {

@GET
@Path("/person")
@Produces(MediaType.APPLICATION_JSON)
Contributor

Why were these @Produces added? They are not necessary in Quarkus and if the test now fails without them, that is a genuine issue

Contributor Author

There isn't any test failure without that annotation, but as you can see here, I'm adding the type returned among the ones for which I generate the serializers only if that annotation is present. Should I ignore it and generate the serializer regardless?

Contributor

That approach is going to be a problem because in Quarkus REST we use JSON by default. See JsonDefaultProducersHandler

Contributor Author

No problem, I will generate the serializer regardless of whether @Produces is present, and also remove it from the tests.

Contributor

👍🏼

fix object mapper generation for all primitive types

implement nested types in generated object mapper

implement SecureField annotation support

avoid generating mapper for pojo with unknown jackson annotations

implement collections serialization

fix all tests in rest-jackson module

wip

wip

refactor and simplification

performance tuning

make reflection-free serializers generation opt-in

add @produces annotation to SimpleJsonResource rest endpoints where appropriate

generate serializers for returned types from rest methods regardless if the @produces annotation is present or not

wip

add javadoc
@mariofusco
Contributor Author

I believe that this is working reasonably well now and, also considering that it is disabled by default and it has to be explicitly enabled through the property quarkus.jackson.optimization.enable-reflection-free-serializers=true, I think that this is ready to be fully reviewed and eventually merged. Note that at the moment I only implemented the serialization part while the deserialization is still outstanding. Anyway one thing is already useful and makes sense without the other, so I'd prefer to implement the second in another pull request instead of overloading this one.

Since I learned a lot about jandex, gizmo and more in general on how to write a Quarkus extension, while developing this, I'm also thinking that it could be used as a practical example to write a blog post on these topics. /cc @holly-cummins @maxandersen

@geoand
Contributor

geoand commented Aug 1, 2024

Very nice!

I'll make one final pass tomorrow and then if all is well I'll merge it

@holly-cummins
Contributor

A blog would be amazing, @mariofusco.

@maxandersen
Member

awesome stuff - and congrats on what I think is the longest ever quarkus feature flag name ;)

blog definitely +1000!

quarkus-bot bot commented Aug 2, 2024

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 65712ce.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

@mariofusco
Contributor Author

congrats on what I think is the longest ever quarkus feature flag name ;)

Achievement unlocked!!!

}

public Optional<String> create(ClassInfo classInfo) {
String beanClassName = classInfo.name().toString();
Contributor

Tiny nitpick: this isn't really a bean

@geoand geoand merged commit 3724092 into quarkusio:main Aug 2, 2024
32 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.14 - main milestone Aug 2, 2024
@geoand
Contributor

geoand commented Aug 2, 2024

Thanks a ton for this, this is awesome!

I see two things that need to be done as a follow up:

  • In JacksonOptimizationConfig the root needs to be changed to: quarkus.rest.jackson.optimization
  • A note in the Quarkus REST documentation should be added

@mariofusco
Contributor Author

I see two things that need to be done as a follow up:

* In `JacksonOptimizationConfig` the root needs to be changed to: `quarkus.rest.jackson.optimization`

* A note in the Quarkus REST documentation should be added

Done with #42289

@mariofusco mariofusco deleted the gen_json_ser branch August 2, 2024 12:43
@gsmet gsmet changed the title Generate jackson serializers Generate faster reflection-free Jackson serializers Aug 26, 2024