Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The java generator produces a new for each type occurrence #5619

Closed
iocanel opened this issue Nov 28, 2023 · 12 comments
Closed

The java generator produces a new for each type occurrence #5619

iocanel opened this issue Nov 28, 2023 · 12 comments

Comments

@iocanel
Copy link
Member

iocanel commented Nov 28, 2023

Describe the bug

It's not exactly a bug, it's more of a usability thing.

Currently, the java generator will produce a new class for each occurrence of a type in the CRD schema. This means that types that are frequently referenced, e.g. metadata, selector and more are added multiple times.

One obvious drawback is the size of the generated model (e.g. for kubevrit I got a model of around 3K classes).

Another issue, is that this breaks the visitor concept, since a single visitor is meant to visit all occurrences of type in the hierarchy.

What we could possibly do, is keep track of the name and structure of the objects we create and allow conditional reuse. I could see that working with a config fragment like:

<resusableObjects>
   <metadata>io.fabric8.kubernetes.model.ObjectMeta</metadata>
   <selector>io.fabric8.kubernetes.model.Selector</selector>
   <datavolume>io.quarkiverse.kubevirt.blah.blah</datavolume>
</reusableObjects>

This is just something that just come into my head, there are probably even better ways to approach this.

Fabric8 Kubernetes Client version

SNAPSHOT

Steps to reproduce

  1. checkout: https://github.com/iocanel/quarkus-kubevirt-client
  2. build with mvn clean install -Pgenerate-model
  3. check the generated model

Expected behavior

Types being reused

Runtime

Kubernetes (vanilla)

Kubernetes API Server version

1.25.3@latest

Environment

Linux

Fabric8 Kubernetes Client Logs

No response

Additional context

No response

@andreaTP
Copy link
Member

Hi @iocanel , and thanks for opening this super interesting discussion.
I have been tinkering around those things for a while, but couldn't get convinced about one possible implementation and ended up delaying it, thanks for bumping it up!

The big show-stopper for me has been forward compatibility, imagine the following scenario.
V1:

  • class1 -> class1.metadata1
  • class2 -> class2.metadata2
    the 2 metadata fields are identical and we squash them into one object, so it becomes:
  • class1 -> root.metadata
  • class2 -> root.metadata

This is nice from the deduplication POV, code size etc. etc.
But we don't have control over the shape of metadata, as it's defined by the user, and, for the next V1 release they can add a field to metadata2 and not to metadata1 which is a fair non-breaking change.
Unfortunately, we end up being unable to "refactor" the metadata classes, causing a breaking generated code change (package names etc. etc.).

Your proposal is interesting, as it means that the user is still in control of "what to reuse", although it potentially has the same drawback explained above.

For completeness, at some point, I have prototyped a Maven plugin to do the refactoring as a separate step:
https://github.com/andreaTP/poc-class-dedup
Not sure if it still works ... but a more "configurable" and generic version of it might provide the functionality.

wdyt?

@iocanel
Copy link
Member Author

iocanel commented Nov 28, 2023

More thoughts ...

I think that the are three kinds of duplicates:

  1. core kubernetes types
  2. cross apiVersion
  3. cross apiGroup (but not core).

The simplest case to deal with is 2 as in most cases we want no reuse.
What in comes to types coming from the core kubernetes model (case 1), we could possibly reuse them, as we already do with the buildable references (we use a curated list classes that we consider reusable).

Which brings us down to cross apiGroup (case 3) and beyond the known core model classes. I think that the safest thing to do is to do nothing out of the box but provide users with the ability to define what is meant to be reused via configuration.

@andreaTP
Copy link
Member

What in comes to types coming from the core kubernetes model (case 1), we could possibly reuse them, as we already do with the buildable references (we use a curated list classes that we consider reusable).

This is interesting, as it's valid reasoning, but based on assumptions.
More often than not the CRD is generated by something like kubebuilder and we don't have any guarantee that the object "backing" the Json schema is the "core Kuberentes type" or a slightly modified version of it, or even something completely different but with a matching shape.

I like the idea of leaving this decision to the user though, and this is probably the deal-breaker for implementing this feature.

@iocanel
Copy link
Member Author

iocanel commented Nov 28, 2023

I think that when it comes to core types, it safe enough to assume that if name (e.g. metadata) matches and shape matches then reuse. I really doubt that anyone would replicate them and not reuse them instead. Again, I am not talking about any type, but a handpicked set (like we do for BuildableReferences). As long as there is a way for the user to override our defaults I think it should be fine.

@andreaTP
Copy link
Member

I'm adding an example I'm aware of: https://github.com/Apicurio/apicurio-registry-operator/blob/94d81decf63b5692bde34a35d8a612caff43933f/api/v1/apicurioregistry_types.go#L291

But I agree with your reasoning @iocanel , I think that the question becomes, should we embed this functionality in the java-generator itself or try a different approach with an additional plugin?

@iocanel
Copy link
Member Author

iocanel commented Nov 28, 2023

But I agree with your reasoning @iocanel , I think that the question becomes, should we embed this functionality in the java-generator itself or try a different approach with an additional plugin?

I think it needs to be in the java-generator else we won't be able to use the contextual info.

@andreaTP
Copy link
Member

we won't be able to use the contextual info

I think that we need only class names and shapes, and configuring it with the "original" kubernetes yaml/json seems appealing. That said, are you gonna take a stab at it?

@iocanel
Copy link
Member Author

iocanel commented Dec 1, 2023

I think that we need only class names and shapes, and configuring it with the "original" kubernetes yaml/json seems appealing. That said, are you gonna take a stab at it?

Unfortunately, I don't have the time needed.

@iocanel
Copy link
Member Author

iocanel commented Dec 1, 2023

Something interesting I encountered while playing with kubevirt. I have 17 different implementations for Secret. Not sure if all of them could be replaced by upstream Secret, but my guess is that they could.

@andreaTP
Copy link
Member

andreaTP commented Dec 4, 2023

Hey @iocanel ! Sorry for the slow response, at the moment I'm pretty busy too 🙁
To help this issue move forward what I can commit to do is to try to apply the experimental Maven Plugin: https://github.com/andreaTP/poc-class-dedup
to your kubevirt project with an "allowList" for classes to be deduplicated.

This way, we can start experimenting with the feature on a real-world use-case.
Thoughts?

Copy link

stale bot commented Mar 4, 2024

This issue has been automatically marked as stale because it has not had any activity since 90 days. It will be closed if no further activity occurs within 7 days. Thank you for your contributions!

@andreaTP
Copy link
Member

andreaTP commented Apr 9, 2024

Fixed in #5844

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants