Research: Support automatic application discovery #517

Closed
2 tasks
Tracked by #514
odubajDT opened this issue Dec 14, 2022 · 2 comments

@odubajDT
Contributor

odubajDT commented Dec 14, 2022

Part of #514

Goal:

Implement a proof of concept for automatic application discovery, so that a multi-service application no longer requires a user-defined KeptnApp resource. Instead, this resource will be generated automatically (as it currently is for single-service applications). When and by which component it will be generated is not yet clear.

Most typical use-case:

The typical usage of this feature would be to take existing manifests and apply them to the cluster without modification (no Keptn annotations, no pre/post-deployment tasks/evaluations, no additional Keptn resources). We therefore assume that the Workloads have no pre-flight and post-flight checks, and that we are speaking about a raw deployment here. The reason the user wants automatic application discovery is the observability data, which in this case should be provided out of the box.

Open questions:

  • When is the best time to generate the KeptnApp resource?
  • Which component should take responsibility for the generation? Is it possible to follow the existing pattern and add this functionality to the mutating webhook, or do we need an additional resource and a controller to take this responsibility?
  • How do we deal with Workload (Deployment/Pod/RS/SS/DS) resources created over a longer time span? For example, when the manifests of a single application are not applied all at once, but one after another.
  • How do we need to adapt the Workload/WorkloadInstance controllers to support this feature? Do we need to adapt them at all?
  • Will generating the KeptnApp break the span structure? Are we able to keep it consistent?
  • Would it be possible to support restartability of an application in this case?
  • What version should the KeptnApp have? Should it be generated from the versions of the Workloads, or from revisions?
  • What happens if you apply a workload with v1, and then v2 before the timeout? And what if v2 is applied after the timeout?

Initial thoughts:

Adding the responsibility for generating the KeptnApp to the webhook currently seems like the easiest and most straightforward solution. On the other hand, this solution will force us to update the KeptnApp multiple times, iteratively adding all workloads that are part of the application. The problem here is that the manifests (Deployment/Pod/RS/SS/DS) can be created over a longer time span, which means updating not only the KeptnApp, but also the KeptnAppVersion and possibly the Workload and WorkloadInstance resources as well. This may require adding additional logic to the already existing components.
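To make the churn of the webhook-only approach concrete, here is a minimal simulation. It is only a sketch: `webhook_revisions` is a hypothetical helper, not part of the toolkit, and it assumes that every change to the KeptnApp's workload list results in a new KeptnAppVersion.

```python
def webhook_revisions(arrivals):
    """Simulate the webhook adding each newly seen workload to the KeptnApp.

    Assumption: every change to the KeptnApp's workload list results in a
    new KeptnAppVersion, so each returned tuple stands for one revision.
    """
    workloads = []
    revisions = []
    for name in arrivals:
        if name in workloads:
            continue  # already part of the app, no update needed
        workloads.append(name)
        revisions.append(tuple(workloads))
    return revisions


# Pods of one application arriving one after another (names are made up).
for revision in webhook_revisions(["frontend", "backend", "frontend", "database"]):
    print(revision)
```

Under this assumption, an application with N workloads applied one after another goes through N intermediate revisions before the KeptnApp is complete.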

Another possible approach would be to introduce a new CRD called KeptnAppCreationRequest with its own controller, which takes care of generating the KeptnApp without interfering with the other components. The KeptnAppCreationRequest will be created by the webhook when the first pod of the application is created. It will contain information about the latest modification of this resource, the namespace, and the KeptnApp name (available from the annotations on the Pods). Responsibilities of the controller:

  • The KeptnAppCreationRequest controller will actively search for incoming pods of the same application. If it finds a new one, it will update the KeptnAppCreationRequest and restart the timeout for waiting for new resources.
  • If the controller finds an already existing KeptnApp for the application (in the scenario where the KeptnApp was applied after the manifests), it will remove the KeptnAppCreationRequest and do nothing else.
  • If the resource is not modified (no more manifests of the application were applied) for a certain amount of time (let's say 30-60s), the controller will generate a KeptnApp resource, which will launch the already existing deployment process.
  • Additional manifests (Workloads) created after the timeout won't be considered part of this application and will be ignored (known limitation).

With this approach, we are able to keep the existing functionality of the other components and keep automatic discovery isolated in a separate controller, with only minor changes to the mutating webhook.
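The controller responsibilities listed above can be sketched as a single reconcile pass. This is a rough in-memory model, not the real controller: `CreationRequest`, `reconcile`, and the 30s value of `DISCOVERY_TIMEOUT_S` are assumptions made for illustration, and the cluster state is passed in as plain sets.

```python
from dataclasses import dataclass, field

DISCOVERY_TIMEOUT_S = 30.0  # assumption: the issue suggests roughly 30-60s


@dataclass
class CreationRequest:
    """Hypothetical in-memory stand-in for a KeptnAppCreationRequest."""
    app_name: str
    namespace: str
    last_modified: float  # seconds; time of the latest modification
    workloads: set = field(default_factory=set)


def reconcile(req, seen_workloads, existing_apps, now):
    """Run one reconcile pass and return (action, payload).

    action is "delete" (a user-defined KeptnApp exists), "wait" (the
    timeout is still running) or "create" (generate the KeptnApp).
    """
    # A KeptnApp applied by the user takes precedence over discovery.
    if (req.namespace, req.app_name) in existing_apps:
        return "delete", None

    # A newly discovered workload updates the request and restarts the timeout.
    new = set(seen_workloads) - req.workloads
    if new:
        req.workloads |= new
        req.last_modified = now
        return "wait", None

    # No modification for the whole timeout: generate the KeptnApp.
    if now - req.last_modified >= DISCOVERY_TIMEOUT_S:
        app = {
            "name": req.app_name,
            "namespace": req.namespace,
            "workloads": sorted(req.workloads),
        }
        return "create", app

    return "wait", None
```

Once a pass returns "create", the request would be removed, so workloads that show up later are not picked up. That matches the known limitation from the list.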

Other possible solutions are welcome, and it would be awesome to have them documented as part of this ticket.

DoD:

  • PoC to support automatic application discovery
  • Answer the questions
@odubajDT odubajDT added the enhancement New feature or request label Dec 14, 2022
@odubajDT odubajDT added this to the 0.5.x milestone Dec 14, 2022
@odubajDT odubajDT moved this to 🏗 Shaping in Keptn Lifecycle Toolkit Dec 14, 2022
@thisthat thisthat moved this from 🏗 Shaping to 🎟️ Refined in Keptn Lifecycle Toolkit Dec 20, 2022
@bacherfl bacherfl self-assigned this Jan 3, 2023
@bacherfl bacherfl moved this from 🎟️ Refined to 🏃 In progress in Keptn Lifecycle Toolkit Jan 4, 2023
@bacherfl
Member

bacherfl commented Jan 4, 2023

A first draft of this concept is implemented in #559

Here are my findings so far:

  • When is the best time to generate a KeptnApp resource?
    • In the KeptnAppCreationRequest, which is created by the webhook
    • No distinction between single-service and multi-service deployments is required in the webhook anymore
  • How to handle resources not being applied all at once?
    • DiscoveryDeadline (~30s) in the KeptnAppCreationRequest. The App will be created automatically only if no user-defined app is found within that timeframe
    • After this timeframe expires, the KeptnApp will be created automatically, with all Workloads present at that point in time being part of the created KeptnApp revision
  • Do we need to adapt KeptnWorkloadInstances?
    • No
  • Will generating the KeptnApp result in breaking span structure?
    • No. The root span is created by the KeptnApp controller, when creating a new version/revision of a KeptnAppVersion - that stays the same with this approach
  • Would it be possible to support restartability of an application in this case?
    • Yes
  • What version should the KeptnApp have? Should it be generated from the versions of the Workloads, or revisions?
    • The most straightforward way would be to start with 1.0.0 for auto-created KeptnApps
  • What happens if you apply a workload with v1, and before the timeout with v2? What happens if v2 is applied after the timeout?
    • Before the timeout: After the discovery deadline has expired, each of the currently present workloads will be added to the app. If there are multiple versions of the same workload, the latest one will be added.
    • After the timeout: A change in the version of a Workload could be interpreted as a change in the overall AppVersion. In this case, a new KeptnAppVersion will be created. Noteworthy: currently, only the new workload will be displayed in the OTel trace of the new KeptnApp revision. All other workloads will, if unchanged, still be part of the previous appVersion revision
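
As an illustration of the "before the timeout" case and the proposed starting version, the selection of workloads at the discovery deadline could look roughly like this. It is a sketch, not the code from #559: `Workload` and `build_app` are hypothetical names, and "latest" is assumed to mean the most recently created workload of a given name.

```python
from dataclasses import dataclass

AUTO_APP_VERSION = "1.0.0"  # starting version proposed for auto-created KeptnApps


@dataclass(frozen=True)
class Workload:
    """Hypothetical, simplified view of a discovered workload."""
    name: str
    version: str
    created_at: float  # seconds; assumption: "latest" means most recently created


def build_app(app_name, workloads):
    """Build the KeptnApp content from the workloads present at the deadline."""
    latest = {}
    for w in workloads:
        current = latest.get(w.name)
        # Keep only the most recently created version of each workload.
        if current is None or w.created_at > current.created_at:
            latest[w.name] = w
    return {
        "name": app_name,
        "version": AUTO_APP_VERSION,
        "workloads": [
            {"name": w.name, "version": w.version}
            for w in sorted(latest.values(), key=lambda w: w.name)
        ],
    }
```

If "web" was applied as v1 and then as v2 before the deadline, only v2 ends up in the generated app.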

@bacherfl bacherfl moved this from 🏃 In progress to ✅ Done in Keptn Lifecycle Toolkit Jan 5, 2023
@thisthat thisthat modified the milestones: 0.5.x, 0.6 Jan 12, 2023
@odubajDT
Contributor Author

Research has been done, closing this issue
