
Build refactoring (to address runtime decoupling) #3831

Closed
squakez opened this issue Nov 21, 2022 · 16 comments · Fixed by #4025

squakez (Contributor) commented Nov 21, 2022

One of the hot topics we've been discussing recently is the possibility of decoupling Camel K Runtime and having it bundled as a Camel Quarkus provider, in order to remove the dependency problem in the release cadence. One important thing we should address is how to remove that dependency from the Build: right now the Camel K Operator container image inherits from the Mandrel image expected by the latest runtime in order to be able to perform the build.

I thought we could review the way we're building the Integrations in the following way:

[diagram of the proposed build flow omitted]

The idea is to merge the two build strategies we currently have in place, pod and routine. In this new scenario we always schedule a Kubernetes Job to perform the build (which is nothing more than a mvn package), but we share a volume to keep the downloaded Maven dependencies cached. Each builder knows which container image to use for the build, based on the runtime version provided in the Integration.
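Purely as an illustration of the idea above, here is a minimal sketch of what such a builder Job could look like. The image name, PVC name and mount paths are assumptions made for the sake of the example, not an actual implementation:

```yaml
# Hypothetical builder Job: the operator would create one per build,
# picking the image tag that matches the Integration's runtime version.
apiVersion: batch/v1
kind: Job
metadata:
  name: camel-k-build-my-integration
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: builder
          # Image chosen per runtime version declared in the Integration (assumed naming).
          image: docker.io/apache/camel-k-builder:1.16.0
          workingDir: /workspace/maven-project
          command: ["mvn"]
          # Point Maven at the shared cache so dependencies are downloaded only once.
          args: ["package", "-Dmaven.repo.local=/workspace/m2/repository"]
          volumeMounts:
            - name: maven-cache
              mountPath: /workspace/m2
            - name: project
              mountPath: /workspace/maven-project
      volumes:
        # Shared volume keeping the Maven repository cache across builds and operators.
        - name: maven-cache
          persistentVolumeClaim:
            claimName: camel-k-maven-cache
        # The generated Maven project for this Integration (e.g. prepared by an init step).
        - name: project
          emptyDir: {}
```

With a shape like this, the operator would only need to know which builder image corresponds to the Integration's runtime version; everything else stays inside the Job.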

Advantages of adopting this model:

  • Implicit build scalability (we can have more than one build in parallel)
  • Possibility to run more than one runtime version with the same operator
  • More than one operator can share the same Maven repository cache
  • No loss of cache if the operator is killed/rescheduled to another node
  • The operator has only a loose dependency on the runtime (it must know which container image to use, but it does not inherit from it)

This is a draft idea, but I'd like to gather some opinions before going deep into the details, as there may be pitfalls I don't see at this stage yet.

cc @lburgazzoli @astefanutti @oscerd @claudio4j @christophd @zbendhiba @tadayosi

squakez added the "help wanted" label on Nov 21, 2022
claudio4j (Contributor) commented Nov 23, 2022

This seems like a good idea, and it would decouple the camel-k-operator from the integration build, which is good for operator scalability and availability.
Regarding the shared volume, which is going to grow indefinitely, it would be interesting to have some sort of monitoring. Also, when camel-k-operator is upgraded and the integrations are rebuilt, the old artifacts won't be used anymore; should they be deleted?

lburgazzoli (Contributor) commented:

It seems to be a good starting point, but please keep in mind that concurrent access to the local Maven repo was an issue with Maven in the past (https://issues.apache.org/jira/browse/MNG-2802)

squakez (Contributor, Author) commented Nov 24, 2022

> This seems like a good idea, and it would decouple the camel-k-operator from the integration build, which is good for operator scalability and availability. Regarding the shared volume, which is going to grow indefinitely, it would be interesting to have some sort of monitoring. Also, when camel-k-operator is upgraded and the integrations are rebuilt, the old artifacts won't be used anymore; should they be deleted?

I guess some sort of monitoring should be provided anyhow. We'll dig into details for sure. For now, I am interested in collecting feedback about possible downside I am not able to see in this draft analysis.
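Just as a possible direction for that cleanup (an assumption at this stage, not part of the proposal), the shared cache could be kept bounded by a periodic job that prunes artifacts that haven't been touched for a while, for example:

```yaml
# Hypothetical cache-pruning CronJob; PVC name, schedule and retention
# period are placeholders, not part of the proposal.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: camel-k-maven-cache-prune
spec:
  schedule: "0 3 * * 0"   # weekly
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: prune
              image: busybox:1.36
              command: ["sh", "-c"]
              # Remove cached artifacts older than 60 days; anything still
              # needed will simply be re-downloaded by the next build.
              args:
                - find /workspace/m2/repository -type f -mtime +60 -delete
              volumeMounts:
                - name: maven-cache
                  mountPath: /workspace/m2
          volumes:
            - name: maven-cache
              persistentVolumeClaim:
                claimName: camel-k-maven-cache
```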

squakez (Contributor, Author) commented Nov 24, 2022

> It seems to be a good starting point, but please keep in mind that concurrent access to the local Maven repo was an issue with Maven in the past (https://issues.apache.org/jira/browse/MNG-2802)

I wasn't aware of that; I'll definitely keep it in mind. It seems it was fixed starting from Maven 3.8.2, but that's not very clear from the comments on that issue. If we follow this design, we can always find a way to run builds sequentially, although we'd lose the ability to scale (or let the user choose parallel building without a shared volume for the Maven cache).
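For the "parallel building without a shared cache" option, one hypothetical way to express it would be to give each build its own local repository path on the volume, so two Maven processes never write to the same repository concurrently (the image name, build name and paths below are illustrative only):

```yaml
# Fragment of the (hypothetical) builder container spec: each build writes to
# its own repository directory on the shared volume, keyed by the build name.
containers:
  - name: builder
    image: docker.io/apache/camel-k-builder:1.16.0   # assumed naming
    command: ["mvn"]
    args:
      - package
      # isolated per-build repository: no concurrent writes, but no cache reuse
      - -Dmaven.repo.local=/workspace/m2/builds/my-integration-build-1/repository
      # shared-cache alternative (relies on Maven handling concurrent access):
      # - -Dmaven.repo.local=/workspace/m2/repository
    volumeMounts:
      - name: maven-cache
        mountPath: /workspace/m2
```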

lburgazzoli (Contributor) commented:

> > It seems to be a good starting point, but please keep in mind that concurrent access to the local Maven repo was an issue with Maven in the past (https://issues.apache.org/jira/browse/MNG-2802)
>
> I wasn't aware of that; I'll definitely keep it in mind. It seems it was fixed starting from Maven 3.8.2, but that's not very clear from the comments on that issue. If we follow this design, we can always find a way to run builds sequentially, although we'd lose the ability to scale (or let the user choose parallel building without a shared volume for the Maven cache).

Note that scaling is also potentially limited by the image reuse strategy we have in camel-k, i.e. see #592

lburgazzoli (Contributor) commented:

This may also relate to:

* [Improve container image layering #593](https://github.com/apache/camel-k/issues/593)
* [jib builder #1656](https://github.com/apache/camel-k/issues/1656)

astefanutti (Member) commented Nov 24, 2022

Thanks @squakez for the detailed proposition. If I may rephrase it, to better scan the solution space, the proposal aims to both:

  • Have the ability to dynamically choose the tooling dependencies that are used for the builds, specifically here the bits required to support Quarkus native builds
  • While still achieving the same level of performance provided by the current "static" solution (mainly the routine build strategy)

Before jumping into the possible implementations, here is my first feedback:

  • The current architecture has emerged as a way to minimise the impact on ease / flexibility of installation, so solutions relying on persistent volumes have somehow been left aside (Kaniko with warming is an example). Before deciding to leverage that mechanism, I think we should review what storage classes and dynamic provisioning mechanisms are provided by the Kubernetes platforms that are supported, and the impacts on the installation modes. For example, do Minikube or KinD support local persistent volumes, PVC / dynamic provisioning? How does provisioning of persistent volumes work when installing Camel K via OLM? What are the storage classes provided by the cloud providers?

  • To really achieve true decoupling, init containers will have to be used to implement the Job, so that other build steps relying on Camel K bits do not intersect with those of Quarkus.

The latter point really makes me think that what we are trying to achieve here is the ability to "inject" build tools dynamically. (Persistent) volumes are one way to achieve this, and are somehow already used to decouple publish strategies like Buildah and Kaniko, but there may be other ways of doing it. Currently we do it by "image inheritance", for the JDK, Maven, and Quarkus bits. Are we sure the only way is via persistent volumes? Just to make sure we've scanned the solution space entirely :)
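For reference, if the persistent-volume route were chosen, the object whose provisioning behaviour would need to be reviewed across the platforms listed above is essentially a PVC along these lines (size, access mode and storage class are placeholders):

```yaml
# Hypothetical PVC backing the shared Maven repository cache.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: camel-k-maven-cache
spec:
  accessModes:
    - ReadWriteMany   # ReadWriteOnce would force all builder pods onto one node
  resources:
    requests:
      storage: 10Gi
  # Omitted to use the cluster's default StorageClass; platforms without a
  # default, or without an RWX-capable provisioner, need explicit configuration.
  # storageClassName: standard
```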

astefanutti (Member) commented:

> > > It seems to be a good starting point, but please keep in mind that concurrent access to the local Maven repo was an issue with Maven in the past (https://issues.apache.org/jira/browse/MNG-2802)
> >
> > I wasn't aware of that; I'll definitely keep it in mind. It seems it was fixed starting from Maven 3.8.2, but that's not very clear from the comments on that issue. If we follow this design, we can always find a way to run builds sequentially, although we'd lose the ability to scale (or let the user choose parallel building without a shared volume for the Maven cache).
>
> Note that scaling is also potentially limited by the image reuse strategy we have in camel-k, i.e. see #592

Mind that we should review what concurrency safety guarantees MNG-2802 provides; here we are talking about accessing the same local Maven repository from different Maven instances, executed on different machines.

squakez (Contributor, Author) commented Nov 24, 2022

> This may also relate to:
>
> * [Improve container image layering #593](https://github.com/apache/camel-k/issues/593)
> * [jib builder #1656](https://github.com/apache/camel-k/issues/1656)

Yeah, I had this on my radar. I guess what we use for the Builder will be an implementation detail (ideally we can reuse what we have for a first POC and move iteratively to something new at a later stage).

lburgazzoli (Contributor) commented:

Another related issue:

* [Tekton build strategy #682](https://github.com/apache/camel-k/issues/682)

squakez (Contributor, Author) commented Nov 24, 2022

@astefanutti the main driver for this proposal is the fact we need to decouple the operator from the runtime.

> Currently we do it by "image inheritance", for the JDK, Maven, and Quarkus bits. Are we sure the only way is via persistent volumes? Just to make sure we've scanned the solution space entirely :)

This draft design is a first idea of how we could decouple. We can certainly brainstorm any further ideas and see if they're valid. Rather than a persistent volume, we should talk generically about a "Maven repo" to avoid reasoning about implementation details.

astefanutti (Member) commented:

> @astefanutti the main driver for this proposal is the fact we need to decouple the operator from the runtime.
>
> > Currently we do it by "image inheritance", for the JDK, Maven, and Quarkus bits. Are we sure the only way is via persistent volumes? Just to make sure we've scanned the solution space entirely :)
>
> This draft design is a first idea of how we could decouple. We can certainly brainstorm any further ideas and see if they're valid. Rather than a persistent volume, we should talk generically about a "Maven repo" to avoid reasoning about implementation details.

@squakez yes, it occurred to me that particular issue was about what you've stated in the description:

> One important thing we should address is how to remove that dependency from the Build: right now the Camel K Operator container image inherits from the Mandrel image expected by the latest runtime in order to be able to perform the build.

squakez (Contributor, Author) commented Nov 24, 2022

> Another related issue:
>
> * [Tekton build strategy #682](https://github.com/apache/camel-k/issues/682)

About this one, do you think it would be wise to rely on an additional operator for a core part of the project? If we go that path, aren't we creating a strong dependency on something else? I mean, we'd be going in the direction of removing one dependency (camel-k-runtime) only to marry another one.

lburgazzoli (Contributor) commented:

> > Another related issue:
> >
> > * [Tekton build strategy #682](https://github.com/apache/camel-k/issues/682)
>
> About this one, do you think it would be wise to rely on an additional operator for a core part of the project? If we go that path, aren't we creating a strong dependency on something else? I mean, we'd be going in the direction of removing one dependency (camel-k-runtime) only to marry another one.

I don't think we should require Tekton to be installed, hence we should have a strategy that works on vanilla Kubernetes, but it can be an optional strategy: it is not uncommon for builds to be required to go through a pipeline, e.g. for security checks.

squakez removed the "help wanted" label on Jan 2, 2023
squakez self-assigned this on Jan 2, 2023
squakez added this to the 2.0.0 milestone on Feb 2, 2023
squakez added a commit to squakez/camel-k that referenced this issue Feb 13, 2023
squakez added a commit to squakez/camel-k that referenced this issue Feb 13, 2023
squakez (Contributor, Author) commented Feb 13, 2023

I did some experiments around the possibility of running parallel pod builds: the results are quite encouraging, as there does not seem to be any kind of dependency corruption when running in parallel. Using a PVC in ReadWriteOnce or ReadWriteMany mode looks to be enough for several Maven processes in distinct Pods to work. It is clear that the more builders we have, the longer the builds take to finish, but we should be able to mitigate this by working on some queuing system as suggested in #592

squakez added a commit to squakez/camel-k that referenced this issue Feb 14, 2023
gansheer mentioned this issue Feb 15, 2023
squakez added a commit to squakez/camel-k that referenced this issue Feb 17, 2023
squakez added a commit to squakez/camel-k that referenced this issue Feb 20, 2023
squakez added a commit to squakez/camel-k that referenced this issue Feb 27, 2023
squakez added a commit to squakez/camel-k that referenced this issue Feb 28, 2023
squakez added a commit to squakez/camel-k that referenced this issue Feb 28, 2023
squakez added a commit to squakez/camel-k that referenced this issue Mar 2, 2023
squakez added a commit to squakez/camel-k that referenced this issue Mar 6, 2023
squakez (Contributor, Author) commented Mar 6, 2023

The only missing bit is to understand how to provide the PVC automatically for OLM installation mode: operator-framework/operator-lifecycle-manager#2828
