Application dependencies #7437
Comments
I'm glad to see this gaining traction again. From previous discussions, we thought that the sync retry feature would solve this problem in a more declarative way (e.g. reconcile as long as necessary, hoping for dependencies to have finished reconciling within a certain time frame). I think we could build upon the existing PoC code; however, I think we should consider a few more things than are currently implemented in the PoC:
And probably some more things I have somewhere in the back of my mind from when I came up with the PoC.
Yes, what I now realize is that retries don't help because in the problematic scenario (mutating webhooks), nothing actually "fails" per se and so there is nothing to retry. The dependent application silently succeeds even though it didn't get injected properly.
I love your ideas on making this even more powerful with labels and force sync. But for MVP, we can keep this quite simple, not very far removed from your PoC. The way I think this feature should work is:
I took a look at your work, and I believe you implemented it just like how I described it.
I think this is more than we need; a simple message in the operation would be sufficient to understand what's going on.
This is a blocker for us and forces us to put a lot of effort into coordinating the dependent applications. Can we get an update on this?
Hi team, is there a way to use dependencies between YAML files within the same Application?
bump; same issues.
Just adding my "bump" here. This is mainly because I would also like this with ApplicationSets, as I stated in issue #221.
I've opened a PR showing a possible implementation path (which needs some work). If the dependency work is close to completion, I believe it could replace the user-defined rollout stages in my PR.
Same here.
We would love to see this feature as well! 👍🏻
Adding my "bump".
Adding my bump.
Thanks for the […]
We also have this requirement of apps depending on apps, and so on... same use case: Application B depends on A.
My use case is cluster bootstrap: we have istiod and istio-ingressgateway deployed as independent applications, but the latter fails to sync because the mutating webhook of the former was not ready when it was deployed.
My use cases are: I can use sync waves and App-of-Apps hierarchies to get everything to deploy in the right order when I bootstrap a cluster, but just having a property on the Application that says it is dependent on one or more other Applications seems MUCH easier to manage. Let Argo CD figure out the order based on the dependency info!
It looks like this is being tracked on the roadmap in this issue. Please go upvote!
I would really like to see this feature added! We are using jobs with sync-waves/hooks to get this functionality. While it works, it can be cumbersome to implement/debug, especially when you're putting these hooks in across 10+ applications. Having the ability to clearly define the dependencies between the applications would be awesome! Just as an example of our deployment scenario (there are other components to this, but the flow is similar):
I agree that addressing clear-cut use cases with retries, app-of-apps, and progressive syncs is practical. Providing clear and documented examples summarizing the many Slack threads, discussions, and issues would be very user-friendly and possibly help justify the need for a
Agreed, maybe the best way to demonstrate the need for
I've done a refresh of issues under the sync-waves label. https://github.com/argoproj/argo-cd/issues?q=is%3Aopen+is%3Aissue+label%3Async-waves+sort%3Areactions-%2B1-desc
I'm not 100% sure that's true. I did see apps in a specific wave go into a failed state and Argo would switch to the next wave anyway.
@kotyara85 that sounds like a bug, not a fundamental issue with sync waves. If you can reproduce it, it would be great to have a new issue opened for that.
Hi @crenshaw-dev, within the context of "retries solve this", my only feedback is that many of my apps don't need retries. And introducing retries might make it more difficult for me to spot problems with a deployment. I can see myself not catching an issue that is a show-stopper vs. just assuming that a bunch of apps are retrying indefinitely, until a line of other apps finish their own retries and eventually turn green.
@nathan-bowman I would assume that retries would be configurable per application or even per resource, because Argo CD can't know what an acceptable retry is.
To be specific, retries solve the case of "thing A must exist before thing B can be successfully applied to the cluster." Retries don't solve all the cases described above.
+1 to what @rouke-broersma said: retries are configurable on a per-app basis.
That's a retry config / alerting issue. You can tune your alert to fire on the first failure, after N failures, after some timeout, etc. You'll definitely want to configure a max retry count or timeout. Both these problems also apply to a
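For reference, per-app retries live under `syncPolicy` in the Application spec. A minimal sketch (the app name and repo URL are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app                 # placeholder name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/repo.git   # placeholder repo
    path: my-app
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated: {}
    retry:
      limit: 5                 # give up after 5 attempts
      backoff:
        duration: 5s           # initial wait between attempts
        factor: 2              # exponential backoff multiplier
        maxDuration: 3m        # cap on the wait between attempts
```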
@rouke-broersma This is sort of my heartburn with going the "retries" route. I have apps that should never have to retry. And if I go down this path, I'm now saddled with trying to always hit a moving target of "how many retries and for how long" for each app.
I'm just not sure how the task of configuring appropriate retries is significantly more challenging than figuring out which apps to configure with
For me personally, telling an app "wait until these other app(s) are healthy before you deploy" is vastly different from "now I need to make a bunch of apps retry for 5 minutes longer since I added this other app in the chain of requirements". Hopefully that makes sense...
Sure, I can understand how the experience would suffer if you need to configure, say, 50 apps with retries (which may not even be useful after the initial deployment). In that case, I think an app-of-apps with sync waves could be preferable. It's almost identical overhead. Just instead of setting
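For illustration, ordering in an app-of-apps is expressed with the `argocd.argoproj.io/sync-wave` annotation on the child Application manifests rendered by the parent app. A rough sketch (app names and repos are made up):

```yaml
# Child Applications rendered by the parent "app of apps"
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: linkerd
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # synced first by the parent app
spec:
  project: default
  source:
    repoURL: https://example.com/infra.git   # placeholder
    path: linkerd
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: linkerd
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # synced only after wave 0 apps report Healthy
spec:
  project: default
  source:
    repoURL: https://example.com/apps.git    # placeholder
    path: my-service
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
```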
I think I agree with you here; I'll have to do some testing with sync waves.
One question: I have a case where I have a few ApplicationSets and they have dependencies. For example, before starting to deploy the main app we need to deploy a setup app, which will create secrets, buckets, etc. Waves are not working there at all. Or do I need to set quite large wave values in the apps?
In that case, why are retry and eventual consistency not sufficient? If a secret is missing, Kubernetes will not schedule your pod. Once it is there, it will be scheduled. This is how Kubernetes is designed, and you should preferably architect your solutions to take full advantage of this. Hard dependencies with wait times are not the optimal solution.
The problem is that they run in parallel. The wave settings are just not working. Even when I set wave: -1 on the setup app and wave: 10 on the main app, they still run at the same time.
I'd like to give my two cents.
I totally understand your position being the lead and steward of Argo CD, and I understand not wanting to add a bunch of ad-hoc features to the codebase. I don't think this is one of them though.
IMO I don't think "bad" suffices. Apart from what you have mentioned, there are now two ways of doing dependency management, neither of which is compatible with the other, depending on whether you use an ApplicationSet or a plain old Application. The way I like to think of ApplicationSets is that they are a superset of a normal Application: everything that a normal Application can do, an ApplicationSet should be able to do. As far as I know this is true for all Argo CD features, except for dependency management.
In addition to this, Progressive Syncs are currently an alpha feature, which means they have to be enabled by an Argo CD admin before they can be used (not a major blocker, but something to keep in mind). Progressive Syncs being alpha does give a bit of an opening, though... for it to be replaced with a proper dependency management feature.
I think rather than having two features with bad UX, we would have one (or really two, since we can't get rid of sync waves) feature with a better UX. IMO Progressive Syncs should be removed in favor of this one. Sadly the ship has sailed on sync waves, but we could at least try to deprecate them, and in a future far, far away we can maybe hope to remove them someday.
@simonoff if the waves simply aren't being enforced, then I think we need a new issue to investigate that problem.
The cost of ordered dependencies is T_a + T_b, where those are the times of the prerequisite and of the dependent resources, respectively. The cost with retries is T_a + T_r + T_b, where T_r is the time spent after the prerequisite is satisfied waiting for the next retry to occur. If T_a is pretty short and if your retries are configured with a not-too-aggressive backoff, then the additional time cost shouldn't be high.
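To put made-up numbers on that: if the prerequisite takes T_a = 120s and the dependent app's retry backoff is capped at 60s, then T_r ≤ 60s, so the retry approach costs at most 180s + T_b versus 120s + T_b with explicit ordering — roughly a one-backoff-interval penalty.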
Sync waves and progressive syncs are both context-specific (the context being either a parent app or an AppSet). So I do see why having a unified, global ordering mechanism is desirable, especially if it can cover all the use cases of sync waves and possibly completely replace Progressive Syncs. But it's not obvious to me that we have to solve these limitations of the existing contextual ordering mechanisms in order to solve the concrete use cases described above (CRDs, webhooks, and finalizers). If there are other use cases which do require a global ordering mechanism, then those use cases need to be described in detail so we can evaluate the […].
And if it ultimately is solely about providing a better UX and not about solving some otherwise unsolved use case, then we have to weigh the costs and UX benefits of the new feature against the costs and UX benefits of improving the existing ones. I think we'd need a document that says:
Yeah, you're covering an "ideal" world where all apps are developed according to standards, best practices, and so on. But the real world is never 100% like that.
I don't think I'm assuming developers adhere to any particular best practices. I'm just trying to understand "In what concrete scenarios do retries, sync waves, and progressive syncs all fail to solve the problem?" whether that scenario represents best practices or not. I could be missing something, but so far I haven't found a detailed articulation of a use case where a global ordering is necessary or even why one of the existing contextual ordering mechanisms is unacceptably difficult to use. I accept that there could be such use cases. But I think in order to justify a large and potentially-redundant feature, those use cases ought to be explained in some depth.
This is already possible. Both sync waves and progressive syncs will block on an app being out of sync (Kubernetes resource not applied) or on some resource's field not being what it's expected to be (health check failed).
There are already CRDs that represent things like DNS records, Certificate requests, and S3 buckets. With a properly-configured health check, Argo CD will block sync waves and progressive syncs waiting on those resources to resolve. Argo CD has built-in health checks for all three of these examples: DNS, Certs, and Buckets.
Since they are talking about setting a large gap between sync waves, I don't think they fully understand sync waves. Most likely their problem is that the resource is already synced and doesn't have a proper health check for their scenario, so the next wave starts too soon for their use case. @simonoff Sync waves are an ordering mechanism only; there is no time difference depending on the numbers you select. Choosing wave -1 and wave 10 adds no more delay than choosing 0 and 1.
To be fair, these are only a tiny number of the potential methods for configuring these resources; the number of needed health checks is potentially near infinite. We, for example, use externaldns using ingress-shim. Ingress does not receive a status update from externaldns on whether or not DNS is configured, so I would potentially have to change the tools I use if my use case required that DNS be in place before the next wave starts. Now, I don't personally agree that this is a way of working that should be supported in a Kubernetes ecosystem, and it is in my opinion antithetical to GitOps and to using Argo CD in the first place. If you use tools like Argo CD (and imo Kubernetes), then you should always strive for reconciliation towards desired state over configuration. I personally prefer that this is enforced as much as possible and am very much against dependency configuration.
One of the biggest blockers for us in using AppSets with Progressive Sync is the following: we currently use App of Apps to install cluster components. There are dependencies between apps on each cluster, largely as articulated above, where we need CRDs installed and/or resources from earlier apps before we can sync later apps. This can be somewhat solved with sync waves, but there are UX issues with sync waves that could be tackled to make this better, or better examples of how to tackle it could be documented.
We want to move to AppSets so that we can deploy each cluster component as a unit using the AppSet.
We had further been waiting for Progressive Sync so that we could order the deployments and catch any issues in lower environments before we get to production. This I believe is possible now, but again the feature is alpha and requires us to completely re-work our applications, which is a lot to ask with no ability to map the dependency ordering within the cluster.
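For anyone evaluating that route, the alpha Progressive Syncs feature orders an ApplicationSet's generated apps roughly like this. A sketch only: the generator, label key/values, and repo are made up, and the alpha feature must be enabled by an Argo CD admin before `strategy` is honored:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-addons
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: dev
          - cluster: prod
  strategy:
    type: RollingSync
    rollingSync:
      steps:
        - matchExpressions:          # step 1: dev apps sync first
            - key: env
              operator: In
              values: [dev]
        - matchExpressions:          # step 2: prod apps sync after the dev step is done
            - key: env
              operator: In
              values: [prod]
  template:
    metadata:
      name: 'addons-{{cluster}}'
      labels:
        env: '{{cluster}}'           # label that the steps above match on
    spec:
      project: default
      source:
        repoURL: https://example.com/addons.git   # placeholder
        path: addons
        targetRevision: HEAD
      destination:
        server: https://kubernetes.default.svc
        namespace: addons
```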
Yep! The examples are just to show that these specific use cases aren't especially difficult to solve today. And of course,
This is a common class of issue with app-of-apps + sync waves: the app health check doesn't handle X scenario. There's also reason to believe that the recommended default App health check should be improved. For your use case, I'd explore building a custom App health check that can detect the stuck DaemonSet and, if appropriate, ignore it. There may or may not be enough information on the Application status to make that determination, but I think it's worth looking into. As you noted, the problems mentioned are things that need to be improved in sync waves / progressive syncs. They're problems that would apply equally to a
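For anyone exploring the custom App health check route, the health assessment that waves and progressive syncs rely on can be overridden in `argocd-cm`. A sketch based on the documented Application health override; the escape-hatch comment is hypothetical, and whether `obj.status` actually carries enough detail to detect a stuck DaemonSet would need checking:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.customizations.health.argoproj.io_Application: |
    hs = {}
    hs.status = "Progressing"
    hs.message = ""
    if obj.status ~= nil and obj.status.health ~= nil then
      hs.status = obj.status.health.status
      if obj.status.health.message ~= nil then
        hs.message = obj.status.health.message
      end
      -- Hypothetical escape hatch: inspect obj.status.resources here and
      -- report Healthy despite a known, acceptable Degraded child resource.
    end
    return hs
```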
Environment promotion in GitOps is a nascent space. I find Progressive Syncs unsatisfying for most promotion use cases, because they rely on intentional drift (apps remain out of sync with git until the promotion is complete). A lot of tools are emerging for promotion (Telefonistka, Kargo, Codefresh Products, gitops-promoter), and I think it'll be a while before we have a clear idea of the best approach(es) to environment promotion in GitOps. If sync waves or progressive syncs work for some use cases, great! (We use progressive syncs to deploy the Argo CD metrics extension through environments at Intuit.) But we shouldn't consider it a core use case for those features yet.
My problem with app-of-apps + sync waves for application synchronization is more a UX problem: having a […]. When I use a sync wave, I write a comment above the annotation to explain why the resource (in this case, the […]) […]. I'm not sure a […]
Just throwing an idea out here: what about a sync-wave CRD you could supply with an array of app names?
I'm thinking of a few edge cases. Suppose app B depends on app A. What if B's manifests get updated but A's are not? Should we just sync B? What if A's manifests are updated but B's are not? Should we re-sync B in case there are some non-deterministic side effects? What if users forget to properly update manifests to honor dependencies? What if there's a race condition where the manifests of A and B are updated not at exactly the same time, but while one of the syncs is in progress?
Summary
I was speaking with @JasonMorgan from Buoyant today about a missing feature in Argo CD for blocking application syncs based on required dependencies on other applications. The use case is:
This is especially important for the bootstrapping use case where you're recreating a cluster from git and need to create many apps after a bunch of system-level add-ons are fully available. e.g. linkerd must be in place before any applications come up, because linkerd's mutating webhook needs to inject sidecars into application pods as they start up.
The use case is very compelling and I'm convinced we should prioritize this. I think this feature, combined with ApplicationSets, will really start to complete our bootstrapping story.
Motivation
Please give examples of your use case, e.g. when would you use this.
During cluster bootstrapping, cluster addons (especially ones with mutating webhooks) need to be in place before application pods can come up.
Proposal
How do you think this should be implemented?
It turns out @jannfis already started some work on this, and the spec changes are close to what we need: #3892
Given the age of the original PR, I'm filing an issue in case we abandon #3892 for a new attempt, and targeting this for the tentative next milestone in case someone wants to pick this up.
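Purely as an illustration of the shape such a dependency field could take (hypothetical field names, not the actual spec from #3892):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  # Hypothetical field: block syncing until the listed Applications
  # are Synced and Healthy.
  dependsOn:
    - linkerd
    - cert-manager
  source:
    repoURL: https://example.com/apps.git   # placeholder
    path: my-service
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
```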