Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Waiting for interdependent resources to become ready #427

Closed
scothis opened this issue Jan 27, 2022 · 10 comments · Fixed by #499
Closed

Waiting for interdependent resources to become ready #427

scothis opened this issue Jan 27, 2022 · 10 comments · Fixed by #499
Assignees
Labels
carvel accepted This issue should be considered for future work and that the triage process has been completed discussion This issue is not a bug or feature and a conversation is needed to find an appropriate resolution enhancement This issue is a feature request

Comments

@scothis
Copy link
Contributor

scothis commented Jan 27, 2022

Describe the problem/challenge you have

I have two resources that are part of an app. Resource A will not be ready without resource B, and resource B will not become ready without resource A. I'd like to know for kapp deploy when both resources are ready.

If I define no wait rules or change rules then everything will come up, but I don't know when that happens via kapp.

If I define wait rules, then I know when both resources are up, but there's a race condition where one of the resource could be not ready and fail the kapp deploy.

If I define a change rule, then I can ensure that resource A is created before resource B. But because resource A will never become ready without resource B, I can't also specify a wait rule, as resource B will never be created as resource A is never ready.

More specific details below.

Describe the solution you'd like

I'd like a way to order the upsert of resources via a change rule to wait for the resource to be "seen" without needing to wait for all of the wait rule conditions. Waiting for the resource to be created/updated on the api may be sufficient, or if a wait rule defines supportsObservedGeneration: true wait for the generations to match.

Anything else you would like to add:

Slack thread

My specific use case involves trying to deploy a Knative Service (S) and a ServiceBinding (SB) resource. The S will crash loop until the SB is ready. The SB will not become ready until the S exists.

Prereqs:

  1. Install Knative Serving
  2. Install Service Bindings
    kapp deploy -a service-bindings -f https://github.com/vmware-labs/service-bindings/releases/download/v0.6.0/service-bindings-0.6.0.yaml
    
  3. Create a MySQL service
    kapp deploy -a mysql -f https://gist.github.com/scothis/f65bd3ec45bd42ca89b1a554442c7cdc/raw/467d84237f582c6615ae1f7543d30d3a5015f2c6/mysql.yaml
    

Gist with config I'm attempting to use. Has both wait rules and a change rule defined. It will fail as is.

kapp deploy -a workload -f https://gist.github.com/scothis/f65bd3ec45bd42ca89b1a554442c7cdc/raw/467d84237f582c6615ae1f7543d30d3a5015f2c6/app.yaml

Vote on this request

This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.

👍 "I would like to see this addressed as soon as possible"
👎 "There are other more important things to focus on right now"

We are also happy to receive and review Pull Requests if you want to help working on this issue.

@scothis scothis added carvel triage This issue has not yet been reviewed for validity enhancement This issue is a feature request labels Jan 27, 2022
@renuy renuy assigned renuy and praveenrewar and unassigned renuy Jan 28, 2022
@praveenrewar
Copy link
Member

My specific use case involves trying to deploy a Knative Service (S) and a ServiceBinding (SB) resource. The S will crash loop until the SB is ready. The SB will not become ready until the S exists.

So we would need a different version of change group and change rules, and rather than waiting for the dependee to be ready, we just wait for it to exists.
I will think more about it while going through the examples and gist in detail.

@scothis
Copy link
Contributor Author

scothis commented Jan 28, 2022

exists can get a bit sticky, especially for updates. I was thinking something like seen might be better, so if a resource has supportsObservedGeneration set to true, then we can wait for the observed generation to match the generation. That we we can know the resource has at least been reconciled even if the resource is not ready.

It might also be useful to have the ability to order the condition checks, so even if there's a wait rule for a resource, the check could be deferred until another resource is ready.

Is "ready" the right term within kapp?

@cppforlife
Copy link
Contributor

i see this as whether waiting phase is blocking (currently) vs non-blocking (wanted in this case).

Waiting for the resource to be created/updated on the api may be sufficient, or if a wait rule defines supportsObservedGeneration: true wait for the generations to match.

this is an interesting bit. in your example is it required for "The SB will not become ready until the S exists." to work out. im trying to think of examples where "partial waiting" would be necessary (which does shift idea of blocking-vs-non-blocking a bit).

@praveenrewar
Copy link
Member

praveenrewar commented Jan 30, 2022

@scothis I followed the steps mentioned by you to reproduce the issue and I noticed that an external Service and an Ingress are created once the S and SB are in the desired state. So, I was wondering if you would be able to use the exists annotation on the external Service and wait for it's existence instead of waiting for S and SB. I did try this out and it seems to be working but I am not sure if the creation of the external Service is something that you would be able to always rely on.
Gist that I used for the deployment.

@renuy renuy added the discussion This issue is not a bug or feature and a conversation is needed to find an appropriate resolution label Jan 31, 2022
@scothis
Copy link
Contributor Author

scothis commented Feb 2, 2022

@praveenrewar I haven't used the exists annotation before, but I don't see how that would solve the race condition that exists between the S and SB resources becoming ready. The existence of a child resource it not an indication that the parent resource is completely ready, but is certainly part of that resource becoming ready.

@praveenrewar
Copy link
Member

but I don't see how that would solve the race condition that exists between the S and SB resources becoming ready

Does the race condition occur even when S and SB are deployed together (without any change rules)?

I am exploring the non-blocking behaviour suggested by @cppforlife, and I saw this error in the SB description

Warning UpdateFailed 34s (x14 over 75s) servicebinding-controller Failed to update status for "spring-petclinic-db": admission webhook "defaulting.webhook.bindings.labs.vmware.com" denied the request: mutation failed: cannot decode incoming new object: json: unknown field "subresource"

I will take a look at it again in some time but let me know if you have seen it before and able to resolve it.

@scothis
Copy link
Contributor Author

scothis commented Feb 3, 2022

Does the race condition occur even when S and SB are deployed together (without any change rules)?

The race surfaces when defining wait rules for either resource, as either resource may flip to Ready=False before the system stabilizes on Ready=True. You should be able to simulate this by removing either resource from the app being deployed.

I saw this error in the SB description

I haven't seen this particular error before. Do you have any other context for how it surfaced?

@praveenrewar
Copy link
Member

The race surfaces when defining wait rules for either resource, as either resource may flip to Ready=False before the system stabilizes on Ready=True. You should be able to simulate this by removing either resource from the app being deployed.

I see. In my case, the SB is not getting updated, so I am not able to simulate the exact behaviour, but if try to deploy the S alone, it does indeed wait indefinitely for the SB.

Do you have any other context for how it surfaced?

I actually followed the steps mentioned by you on a minikube cluster. I will try to deploy it on a fresh cluster again and see if the problem persists.

@praveenrewar
Copy link
Member

Warning UpdateFailed 34s (x14 over 75s) servicebinding-controller Failed to update status for "spring-petclinic-db": admission webhook "defaulting.webhook.bindings.labs.vmware.com" denied the request: mutation failed: cannot decode incoming new object: json: unknown field "subresource"

I was able to get rid of this ^ issue on a gke cluster.

I have one question related to the ordering of (S) and (SB) though.

My specific use case involves trying to deploy a Knative Service (S) and a ServiceBinding (SB) resource. The S will crash loop until the SB is ready. The SB will not become ready until the S exists.

Based on this, my assumption was that we would want to create the (S) first, wait for it to be seen, create the (SB) and then wait for both the resources to be ready. But based on the change rule defined defined in the gist, we are trying to create the (SB) first, which I think would lead to an immediate failure.

Is it just that the you were experimenting with the change rules and it's not exactly how you want it to be, or is there something that I have missed.

@praveenrewar praveenrewar linked a pull request Mar 9, 2022 that will close this issue
@praveenrewar praveenrewar added carvel accepted This issue should be considered for future work and that the triage process has been completed and removed carvel triage This issue has not yet been reviewed for validity labels Mar 16, 2022
@scothis
Copy link
Contributor Author

scothis commented Apr 8, 2022

Proposing the addition of a new condition for the ServiceBinding resource in servicebinding/spec#218

If we can wait for this status condition to become true in a change-group, we can side step the race condition between resource for their health wait rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
carvel accepted This issue should be considered for future work and that the triage process has been completed discussion This issue is not a bug or feature and a conversation is needed to find an appropriate resolution enhancement This issue is a feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants