Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Register fields for reflection in kubernetes-client #42028

Merged
merged 1 commit into from
Jul 25, 2024

Conversation

jorsol
Copy link
Contributor

@jorsol jorsol commented Jul 22, 2024

Copy link

quarkus-bot bot commented Jul 22, 2024

Thanks for your pull request!

Your pull request does not follow our editorial rules. Could you have a look?

  • description should not be empty, describe your intent or provide links to the issues this PR is fixing (using Fixes #NNNNN) or changelogs

This message is automatically generated by a bot.

@geoand
Copy link
Contributor

geoand commented Jul 22, 2024

I can't say I understand this, can you explain a little?

Thanks

@jorsol jorsol force-pushed the 39934-withFieldsRegistration branch 2 times, most recently from 7f9b0e4 to 2795649 Compare July 22, 2024 09:14
@jorsol
Copy link
Contributor Author

jorsol commented Jul 22, 2024

I can't say I understand this, can you explain a little?

Sure, long story short: Jackson uses reflection to search fields #39934 (comment):
https://github.com/FasterXML/jackson-databind/blob/jackson-databind-2.17.2/src/main/java/com/fasterxml/jackson/databind/introspect/AnnotatedFieldCollector.java

private Map<String,FieldBuilder> _findFields(...) {
    for (Field f : cls.getDeclaredFields()) { // <- reflection alert!
...

Kubernetes Client model has been annotated with JsonInclude on the fields but not on the methods triggering issue #39934

This PR registers all Kubernetes resources methods and fields for reflection potentially fixing the root issue.

@geoand
Copy link
Contributor

geoand commented Jul 22, 2024

Thanks.

I'll leave this to @manusa and @metacosm to review

@jorsol jorsol marked this pull request as ready for review July 22, 2024 12:12
@manusa
Copy link
Contributor

manusa commented Jul 22, 2024

This PR registers all Kubernetes resources methods and fields for reflection potentially fixing the root issue.

I'm not sure what the initial reasons were to restrict the model registration to only a given set and to exclude fields.

This seems like a good temporary solution, but still I think that what @galderz said originally would make more sense quarkus-wise:

The proper fix IMO would be to stop using reflection for discovering annotations. This would reduce binary executables, make native image builds faster and you would avoid issues like this because the jandex index has everything.

Related issues/PRs these changes affect:

@jorsol
Copy link
Contributor Author

jorsol commented Jul 22, 2024

The proper fix IMO would be to stop using reflection for discovering annotations. This would reduce binary executables, make native image builds faster and you would avoid issues like this because the jandex index has everything.

Yes, that would be the proper way to fix it, but the thing is that is not Quarkus who uses reflection, is Jackson from here:
https://github.com/FasterXML/jackson-databind/blob/091d968751ef00150d22a788fe7f45b7cdcb337a/src/main/java/com/fasterxml/jackson/databind/introspect/AnnotatedFieldCollector.java#L73

So, while the correct approach would be to use Jandex, that probably should be done on jackson-databind itself.

@gsmet
Copy link
Member

gsmet commented Jul 22, 2024

There has been a lot of discussion about the Kubernetes Client having a massive overhead when we were not careful about reflection registration so I think we need to be very careful about this (not saying it shouldn't go in but we need to measure the impact in terms of image size before we think about merging it).

@manusa
Copy link
Contributor

manusa commented Jul 22, 2024

So, while the correct approach would be to use Jandex, that probably should be done on jackson-databind itself.

Agreed, hence my "This seems like a good temporary solution" statement :) I just wanted to keep on the record that the other task/issue should not be forgotten.

@jorsol
Copy link
Contributor Author

jorsol commented Jul 22, 2024

There has been a lot of discussion about the Kubernetes Client having a massive overhead when we were not careful about reflection registration so I think we need to be very careful about this (not saying it shouldn't go in but we need to measure the impact in terms of image size before we think about merging it).

3.13.0.CR1, final file size 169M:

   29,093 reachable types   (86.3% of   33,710 total)
   45,578 reachable fields  (67.7% of   67,339 total)
  186,602 reachable methods (65.7% of  284,009 total)
   10,082 types, 2,974 fields, and 40,567 methods registered for reflection
       62 types,    67 fields, and    55 methods registered for JNI access
        4 native libraries: dl, pthread, rt, z

This patch, final file size 170M:

   29,093 reachable types   (86.3% of   33,710 total)
   45,583 reachable fields  (67.7% of   67,342 total)
  186,603 reachable methods (65.7% of  284,008 total)
   10,082 types, 8,531 fields, and 40,567 methods registered for reflection
       62 types,    67 fields, and    55 methods registered for JNI access
        4 native libraries: dl, pthread, rt, z

There is a 1 MB overhead which is not bad at all. But please don't take my word for granted, anyone could test this to ensure there are no drawbacks.

@geoand
Copy link
Contributor

geoand commented Jul 23, 2024

At some point we really really have to look at the Kubernetes binary size issue (due to Jackson) in a proper way.

cc @xstefank who is now going to working on the JOSDK (which is a heavy user of the client) and @mariofusco who has done a lot of work around Jackson

This comment has been minimized.

@gsmet
Copy link
Member

gsmet commented Jul 23, 2024

At some point we really really have to look at the Kubernetes binary size issue (due to Jackson) in a proper way.

Apart from being able to identify which areas of the client are used, I don't see a lot of options.
Because even if we end up generating code for JSON serialization, we would have to generate a massive amount of code if we have to support the whole client.

@geoand
Copy link
Contributor

geoand commented Jul 23, 2024

True as that may be, the fact that we are registering so much stuff for reflection AFAIU, is killing GraalVMs ability to do dead code elimination

@geoand
Copy link
Contributor

geoand commented Jul 23, 2024

We should also explore what more can be done on the client side to improve things

@mariofusco
Copy link
Contributor

@mariofusco who has done a lot of work around Jackson

I'm making good progresses with my PoC trying to generate ObjectMappers at build time via gizmo, thus having the advantage of avoiding any reflection and consequently also speeding up the marshalling process. I'm hoping to complete this task and have something ready for review in the next few days.

That said, I'm not sure if the work that I'm doing could be also applied to this specific issue, but my understanding is (please correct me if I'm wrong) that here we need to convert in json some specific pojos that are mapping the kubernetes configuration. If that's true couldn't we also develop some ad-hoc ObjectMappers (if the pojos are known upfront we don't need any code generation) for those classes instead of relying on out-of-the-box reflection-based Jackson marshalling?

@geoand
Copy link
Contributor

geoand commented Jul 23, 2024

but my understanding is (please correct me if I'm wrong) that here we need to convert in json some specific pojos that are mapping the kubernetes configuration.

I'm not the SME (@manusa is), but this is my understanding as well

@jorsol
Copy link
Contributor Author

jorsol commented Jul 23, 2024

To clarify the current issue:

kubernetes-client model contains classes with Jackson annotations, and on previous versions of KC the fields had an additional annotation (@JsonInclude) that the methods don't:

    @JsonProperty("labels")
    @JsonInclude(JsonInclude.Include.NON_EMPTY)
    private Map<String, String> labels = new LinkedHashMap<String, String>();
    
    @JsonProperty("labels")
    public Map<String, String> getLabels() {
        return labels;
    }

    @JsonProperty("labels")
    public void setLabels(Map<String, String> labels) {
        this.labels = labels;
    }

Since the fields were not registered, the @JsonInclude was not evaluated causing a different/wrong behavior in native-image.

If the reflection can be avoided that would be great, but for the time being, this change only registers correctly the fields of the KC model.

@manusa
Copy link
Contributor

manusa commented Jul 23, 2024

There are several places in the client where deserialization (json-2-Pojo) takes place:

  • Reading HTTP responses from API server
  • Loading resources from storage
  • Reading the Kubernetes configuration (.kube/config et al)

The KubernetesDeserializer takes care of deserializing JSON nodes to the required target types (via reflection) as long as there's a model artifact that contains the desired classes.

However, long time ago we introduced the GenericKubernetesResource which is the fallback type in case no specific class is registered or available to deserialize the JSON node.

This means that if you application is not doing any type-specific operation, it might just work by registering GenericKubernetesResource for reflection (+ those specific the .kube/config, etc.).
An example of this kind of application can be something that reads a bunch of YAMLs and deploys them to Kubernetes.

One of the problems is that the Kubernetes Client Quarkus extension is too broad in scope. Plus the io.fabric8:kubernetes-client is also too broad and includes transitive dependencies for all of the model modules.

I still have a pending task to create a blog post showcasing how to create Quarkus Kubernetes app with a limited set of models. But the thing is that users are the ones who know in advance what types their application will be needing.

One of the things I think might be hard to solve (but maybe I'm wrong and GraalVM/Quarkus can actually take care of this) is something like:

KubernetesResource resource = kubernetesClient.resource($inputStream).item();
if (resource instanceof Service) {
  System.out.println("service");
} else {
  System.out.println("not-service");
}

In this case, if the io.fabric8:kubernetes-model-core artifact is available at runtime and Service is registered for reflection, then the output will be service. Otherwise, the stream will be deserialized as a GenericKubernetesResource and the output will be not-service.

@jorsol jorsol force-pushed the 39934-withFieldsRegistration branch from 2795649 to 01afb82 Compare July 23, 2024 12:26
@metacosm
Copy link
Contributor

The proper fix IMO would be to stop using reflection for discovering annotations. This would reduce binary executables, make native image builds faster and you would avoid issues like this because the jandex index has everything.

Yes, that would be the proper way to fix it, but the thing is that is not Quarkus who uses reflection, is Jackson from here: https://github.com/FasterXML/jackson-databind/blob/091d968751ef00150d22a788fe7f45b7cdcb337a/src/main/java/com/fasterxml/jackson/databind/introspect/AnnotatedFieldCollector.java#L73

So, while the correct approach would be to use Jandex, that probably should be done on jackson-databind itself.

maybe it might be worth investigating whether we could have a Jandex-based collector in Jackson?

@metacosm
Copy link
Contributor

There are several places in the client where deserialization (json-2-Pojo) takes place:

* Reading HTTP responses from API server

* Loading resources from storage

* Reading the Kubernetes configuration (`.kube/config` et al)

The KubernetesDeserializer takes care of deserializing JSON nodes to the required target types (via reflection) as long as there's a model artifact that contains the desired classes.

However, long time ago we introduced the GenericKubernetesResource which is the fallback type in case no specific class is registered or available to deserialize the JSON node.

This means that if you application is not doing any type-specific operation, it might just work by registering GenericKubernetesResource for reflection (+ those specific the .kube/config, etc.). An example of this kind of application can be something that reads a bunch of YAMLs and deploys them to Kubernetes.

One of the problems is that the Kubernetes Client Quarkus extension is too broad in scope. Plus the io.fabric8:kubernetes-client is also too broad and includes transitive dependencies for all of the model modules.

I still have a pending task to create a blog post showcasing how to create Quarkus Kubernetes app with a limited set of models. But the thing is that users are the ones who know in advance what types their application will be needing.

One of the things I think might be hard to solve (but maybe I'm wrong and GraalVM/Quarkus can actually take care of this) is something like:

KubernetesResource resource = kubernetesClient.resource($inputStream).item();
if (resource instanceof Service) {
  System.out.println("service");
} else {
  System.out.println("not-service");
}

In this case, if the io.fabric8:kubernetes-model-core artifact is available at runtime and Service is registered for reflection, then the output will be service. Otherwise, the stream will be deserialized as a GenericKubernetesResource and the output will be not-service.

Wild idea (because this would change a lot of how the client work currently): could we keep the current model API but have the objects backed by a generic resource. Instead of all model objects being generated individually with appropriate fields, we would just getting the accessors, which would take care of accessing the appropriate key-value in the underlying map?

Maybe that would simplify the serialization aspects? That said, this might not help size in memory or performance (since you'd have to access a map instead of a direct field access)…

@jorsol
Copy link
Contributor Author

jorsol commented Jul 24, 2024

maybe it might be worth investigating whether we could have a Jandex-based collector in Jackson?

Jackson might need to allow both collectors, the Reflection-based and the Jandex-based collector, and be configurable since not everyone could use Jandex.


But, can we keep focus on the current issue here? I agree is important to improve and reduce reflection in most libraries used by Quarkus, but right now the issue might be solved by just registering the fields with an overhead of just 1MB in the final size of the binary.

@geoand
Copy link
Contributor

geoand commented Jul 24, 2024

I agree is important to improve and reduce reflection in most libraries used by Quarkus

For us, the current situation is pretty bad, that's why we are having the discussion about potential improvements. If it doesn't happen now, it will slip yet again...

In any case, the PR itself makes sense but we need a real solution soon

@geoand geoand requested a review from zakkak July 24, 2024 13:18
@manusa
Copy link
Contributor

manusa commented Jul 24, 2024

Wild idea (because this would change a lot of how the client work currently): could we keep the current model API but have the objects backed by a generic resource. Instead of all model objects being generated individually with appropriate fields, we would just getting the accessors, which would take care of accessing the appropriate key-value in the underlying map?

Maybe we can discuss this during a call. This might get complicated especially when trying to deal with deserialization of non-standard fields, etc. Also hard for the sundrio builder-generation integration.

@jorsol
Copy link
Contributor Author

jorsol commented Jul 24, 2024

To follow up the discussion: #42110

This comment has been minimized.

@jorsol jorsol force-pushed the 39934-withFieldsRegistration branch from 01afb82 to 72396b7 Compare July 24, 2024 14:40
@mariofusco
Copy link
Contributor

Wild idea (because this would change a lot of how the client work currently): could we keep the current model API but have the objects backed by a generic resource. Instead of all model objects being generated individually with appropriate fields, we would just getting the accessors, which would take care of accessing the appropriate key-value in the underlying map?

Maybe we can discuss this during a call. This might get complicated especially when trying to deal with deserialization of non-standard fields, etc. Also hard for the sundrio builder-generation integration.

I would love to also discuss this in a call, since it could even be related to the work that I'm doing for the PoC that I mentioned in one of my former comments.

@galderz
Copy link
Member

galderz commented Jul 25, 2024

To clarify the current issue:

kubernetes-client model contains classes with Jackson annotations, and on previous versions of KC the fields had an additional annotation (@JsonInclude) that the methods don't:

    @JsonProperty("labels")
    @JsonInclude(JsonInclude.Include.NON_EMPTY)
    private Map<String, String> labels = new LinkedHashMap<String, String>();
    
    @JsonProperty("labels")
    public Map<String, String> getLabels() {
        return labels;
    }

    @JsonProperty("labels")
    public void setLabels(Map<String, String> labels) {
        this.labels = labels;
    }

Since the fields were not registered, the @JsonInclude was not evaluated causing a different/wrong behavior in native-image.

If the reflection can be avoided that would be great, but for the time being, this change only registers correctly the fields of the KC model.

Note that there's an alternative solution here, which is that methods always have the same annotations as fields. Then you "just" have to register methods for reflection and you can ignore fields. But if there's functionality that relies on annotations that are only in fields then it would break.

@manusa
Copy link
Contributor

manusa commented Jul 25, 2024

Note that there's an alternative solution here, which is that methods always have the same annotations as fields. Then you "just" have to register methods for reflection and you can ignore fields. But if there's functionality that relies on annotations that are only in fields then it would break.

This will be definitely fixed when 6.13.2 is released (probably on 2024-08-05).

However, users who generate their own types and don't annotate them uniformly might still run into the issue.

I would love to also discuss this in a call, since it could even be related to the work that I'm doing for the PoC that I mentioned in one of my former comments.

I created a discussion on the Fabric8 Kubernetes Client repo to follow up on this and avoid adding more noise to the PR: fabric8io/kubernetes-client#6199

Copy link
Contributor

@zakkak zakkak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with what has been said so far. As far as I am concerned the PR looks OK as an immediate solution.

FWIW the size increase is indeed not huge, but still noticeable.

In kubernetes-client IT I see an increase of ~.5MB in the image size (or 0.3% of the total size and 0.8% of the heap size) which is the result of registering 10973 more fields for reflection.

In openshift-client IT I see a similar increase of ~.5MB in the image size (or 0.4% of the total size and 0.9% of the heap size) which is the result of registering 11635 more fields for reflection.

This comment has been minimized.

@jorsol jorsol force-pushed the 39934-withFieldsRegistration branch from 72396b7 to 969351a Compare July 25, 2024 08:29
@geoand geoand added the triage/waiting-for-ci Ready to merge when CI successfully finishes label Jul 25, 2024
@gsmet
Copy link
Member

gsmet commented Jul 25, 2024

Yeah same here, given the current situation, we can pay the price for it.

Copy link

quarkus-bot bot commented Jul 25, 2024

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 969351a.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

@zakkak zakkak merged commit d283bde into quarkusio:main Jul 25, 2024
22 checks passed
@quarkus-bot quarkus-bot bot added kind/bugfix and removed triage/waiting-for-ci Ready to merge when CI successfully finishes labels Jul 25, 2024
@quarkus-bot quarkus-bot bot added this to the 3.14 - main milestone Jul 25, 2024
@jorsol jorsol deleted the 39934-withFieldsRegistration branch July 25, 2024 11:50
@gsmet gsmet modified the milestones: 3.14 - main, 3.13.1 Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants