Rolling deployment: cancel+restart can lead to broken app #3019
While the use case of a "deployment train" (i.e. having multiple running deployments in parallel) is totally valid, allowing a […]. From my point of view we should simply disallow this, i.e. in case there is a […]. This is changed in PR #3026.
@Gerg - Would this (the comment above) make sense from your point of view? Or do you think that developers expect that a […]?
I'll try to spend some time thinking about what the desired behavior should be here. Ideally, I'd like to be in a place where […].
This feels like a bug to me:
Would this be resolved by also checking that the prior web process is not newer than the deploying web process? Something like:

```ruby
prior_web_process = web_processes.
  reject { |p| p.guid == deploying_web_process.guid }.
  reject { |p| p.created_at > deploying_web_process.created_at }.
  max_by(&:created_at)
```

For context, here is the story that initially implemented this behavior: https://www.pivotaltracker.com/story/show/160208638
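A minimal, self-contained sketch of that selection logic (the `WebProcess` struct and the concrete timestamps are hypothetical stand-ins for the actual CC models, purely to illustrate the proposed filter):

```ruby
require 'time'

# Hypothetical stand-in for a CC web process record (guid + creation time).
WebProcess = Struct.new(:guid, :created_at)

# Pick the newest web process that is not the deploying one and is not
# newer than it -- the proposed "prior" process.
def prior_web_process(web_processes, deploying_web_process)
  web_processes.
    reject { |p| p.guid == deploying_web_process.guid }.
    reject { |p| p.created_at > deploying_web_process.created_at }.
    max_by(&:created_at)
end

original  = WebProcess.new('web-1', Time.utc(2023, 1, 1, 10))  # healthy original process
deploying = WebProcess.new('web-2', Time.utc(2023, 1, 1, 11))  # process of the canceling deployment1
newest    = WebProcess.new('web-3', Time.utc(2023, 1, 1, 12))  # process of the new deployment2

# Without the created_at filter, processing deployment1 would pick web-3
# (the newest remaining process) as "prior"; with it, web-1 is picked and
# the healthy original process survives.
puts prior_web_process([original, deploying, newest], deploying).guid  # => web-1
```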
Fixing the logic to get the prior web process is one thing; the other would be to keep the newer web processes - currently everything except the prior one is destroyed. But what should happen when both 'active' (canceling + deploying) deployments are processed at the same time, i.e. in a single run of the […]?

What about the following idea: if there are two 'active' (canceling + deploying) deployments for the same app being processed at the same time (single run of the […]), […].
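One way that idea could be sketched: per app, only the newest active deployment is processed and older ones are treated as superseded. This is purely illustrative; `Deployment`, `deployments_to_process`, and the integer timestamps are hypothetical, not the actual CC scheduler code:

```ruby
# Hypothetical deployment record: owning app, state, creation order.
Deployment = Struct.new(:app_guid, :state, :created_at)

# If an app has several 'active' (DEPLOYING or CANCELING) deployments in
# the same scheduler run, keep only the newest one per app; older ones
# are considered superseded and skipped in this run.
def deployments_to_process(active_deployments)
  active_deployments.
    group_by(&:app_guid).
    map { |_app_guid, deployments| deployments.max_by(&:created_at) }
end

d1 = Deployment.new('app-1', 'CANCELING', 1)  # failed deployment being canceled
d2 = Deployment.new('app-1', 'DEPLOYING', 2)  # new rolling deployment
d3 = Deployment.new('app-2', 'DEPLOYING', 1)  # unrelated app, unaffected

puts deployments_to_process([d1, d2, d3]).map(&:app_guid).inspect
```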
Draft PR: #3072
Issue
If a failing rolling deployment is canceled and immediately a new rolling deployment is triggered (see example below), then the wrong application web processes are terminated, which leads to a broken/unavailable app even though it was healthy before.
Rolling deployments should not lead to an unavailable application in case the new app version can't be started.
Context
cf-deployment v21.11.0
capi-release
This is probably also the root cause for cli #2257.
Steps to Reproduce
Current result
App is unavailable because the healthy web process from step1 was stopped.
By analysing CC logs (of the api and scheduler VMs) and reading the CC code:
Expected result
App remains available even though all rolling deployment attempts fail.
Possible Fix
Just ideas, need more discussion:
`cf restart testapp --strategy rolling` of step 2 should fail fast without creating deployment2, since deployment1 is still in progress (CANCELING).
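A fail-fast guard along those lines could look like the sketch below. Everything here (`Deployment`, `DeploymentAlreadyCancelingError`, `create_rolling_deployment!`) is a hypothetical illustration of the idea, not the actual CC API code:

```ruby
# Hypothetical deployment record carrying only its state.
Deployment = Struct.new(:state)

class DeploymentAlreadyCancelingError < StandardError; end

# Refuse to create deployment2 while a deployment for the same app is
# still in the CANCELING state, instead of racing with the cancelation.
def create_rolling_deployment!(existing_deployments)
  if existing_deployments.any? { |d| d.state == 'CANCELING' }
    raise DeploymentAlreadyCancelingError,
          'previous deployment is still being canceled; retry once it is CANCELED'
  end
  Deployment.new('DEPLOYING')  # placeholder for the actual deployment creation
end

begin
  create_rolling_deployment!([Deployment.new('CANCELING')])
rescue DeploymentAlreadyCancelingError => e
  puts e.message
end
```

With such a guard, the second `cf restart` would return an error immediately and the healthy web process of the canceled deployment would never be touched.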