Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canceling deployment is superseded by new deployment #3072

Merged

Conversation

philippthun
Copy link
Member

@philippthun philippthun commented Nov 22, 2022

There are two active states for deployments (DEPLOYING, CANCELING) and two corresponding actions (DeploymentCreate, DeploymentCancel). This results in the following possible scenarios:

state + action = updated state (new deployment)
1 DEPLOYING create SUPERSEDED DEPLOYING
2 DEPLOYING cancel CANCELING -
3 CANCELING cancel CANCELING -
4 CANCELING create CANCELING DEPLOYING

Scenario 4 (CANCELING + create) currently leads to two active deployments for an app (CANCELING + DEPLOYING). As the Dispatcher and Updater (module DeploymentUpdater) are not designed to handle two active deployments concurrently, the result for this scenario has been changed:

state + action = updated state (new deployment)
4 CANCELING create SUPERSEDED DEPLOYING

This means that the create action now sets a CANCELING deployment to SUPERSEDED (CANCELED) and during the next run of the DeploymentUpdater, the new DEPLOYING deployment will be processed.

Furthermore the Updater (module DeploymentUpdater) now checks that the deployment being processed is still in the expected state.

Skip interim web processes from SUPERSEDED (CANCELED) deployments

When a deployment train is canceled, the latest interim web process is scaled up. The introduction of a SUPERSEDED (CANCELED) state requires a change to how this latest interim web process is determined.

Let's assume the following sequence of actions:

  • create
  • create
  • cancel
  • create
  • cancel

This would result in the following states (after each action):

  • DEPLOYING
  • SUPERSEDED (DEPLOYED) + DEPLOYING
  • SUPERSEDED (DEPLOYED) + CANCELING
  • SUPERSEDED (DEPLOYED) + SUPERSEDED (CANCELED) + DEPLOYING
  • SUPERSEDED (DEPLOYED) + SUPERSEDED (CANCELED) + CANCELING

And this should result in the web process from the SUPERSEDED (CANCELED) deployment to be skipped and the one from the SUPERSEDED (DEPLOYED) deployment to be scaled up.

Scale canceled web processes to zero

The introduction of a SUPERSEDED (CANCELED) state requires the Updater (module DeploymentUpdater) to handle the interim web process belonging to such a SUPERSEDED (CANCELED) deployment.

The cleanup of unused web processes happens at the end of a (successful or canceled) deployment. But as the canceled interim web process should not be used anymore, it is now immediately scaled to zero instances.

  • I have reviewed the contributing guide

  • I have viewed, signed, and submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using bundle exec rake

  • I have run CF Acceptance Tests

There are two active states for deployments (DEPLOYING, CANCELING) and
two corresponding actions (DeploymentCreate, DeploymentCancel). This
results in the following possible scenarios:

      |    state    +  action  =  updated state  /  (new deployment)
 ---------------------------------------------------------------------
   1  |  DEPLOYING  |  create  |    SUPERSEDED   |    DEPLOYING
   2  |  DEPLOYING  |  cancel  |    CANCELING    |    -
   3  |  CANCELING  |  cancel  |    CANCELING    |    -
   4  |  CANCELING  |  create  |    CANCELING    |    DEPLOYING

Scenario 4 (CANCELING + create) currently leads to two active
deployments for an app (CANCELING + DEPLOYING). As the Dispatcher and
Updater (module DeploymentUpdater) are not designed to handle two active
deployments concurrently, the result for this scenario has been changed:

      |    state    +  action  =  updated state  /  (new deployment)
 ---------------------------------------------------------------------
   4  |  CANCELING  |  create  |    SUPERSEDED   |    DEPLOYING

This means that the 'create' action now sets a CANCELING deployment to
SUPERSEDED (CANCELED) and during the next run of the DeploymentUpdater,
the new DEPLOYING deployment will be processed.

Furthermore the Updater (module DeploymentUpdater) now checks that the
deployment being processed is still in the expected state.
@beyhan
Copy link
Member

beyhan commented Nov 25, 2022

Hi @philippthun,

I was thinking whether our discussions in this Slack thread regarding how should a cancel-deployment behave could be considered with this pr. What I mean is to put also the new deployment from the table above to CANCELING state in scenario 2 and 3. WDYT? Or would this be a backwards incompatible change and we should better try to address this e.g. with a new CF CLI command like cancel-deployments?

When a deployment train is canceled, the latest interim web process is
scaled up. The introduction of a SUPERSEDED (CANCELED) state requires a
change to how this latest interim web process is determined.

Let's assume the following sequence of actions:
  create
  create
  cancel
  create
  cancel

This would result in the following states (after each action):
  DEPLOYING
  SUPERSEDED (DEPLOYED) + DEPLOYING
  SUPERSEDED (DEPLOYED) + CANCELING
  SUPERSEDED (DEPLOYED) + SUPERSEDED (CANCELED) + DEPLOYING
  SUPERSEDED (DEPLOYED) + SUPERSEDED (CANCELED) + CANCELING

And this should result in the web process from the SUPERSEDED (CANCELED)
deployment to be skipped and the one from the SUPERSEDED (DEPLOYED)
deployment to be scaled up.
The introduction of a SUPERSEDED (CANCELED) state requires the Updater
(module DeploymentUpdater) to handle the interim web process belonging
to such a SUPERSEDED (CANCELED) deployment.

The cleanup of unused web processes happens at the end of a (successful
or canceled) deployment. But as the canceled interim web process should
not be used anymore, it is now immediately scaled to zero instances.
@philippthun philippthun marked this pull request as ready for review November 29, 2022 14:57
@philippthun
Copy link
Member Author

Hi @philippthun,

I was thinking whether our discussions in this Slack thread regarding how should a cancel-deployment behave could be considered with this pr. What I mean is to put also the new deployment from the table above to CANCELING state in scenario 2 and 3. WDYT? Or would this be a backwards incompatible change and we should better try to address this e.g. with a new CF CLI command like cancel-deployments?

I would like to focus this PR on the problem described in issue #3019.

@philippthun
Copy link
Member Author

This is the output of cf app for an ongoing, "bad" 1 deployment with an interim deployment that got canceled, but superseded 2.

$ cf app my-app

Showing health and status for app my-app in org my-org / space my-space as my-user...

name:              my-app
requested state:   started
routes:            my-app.my-foundation.io
last uploaded:     Tue 29 Nov 10:20:05 UTC 2022
stack:
docker image:      cloudfoundry/test-app:latest

type:           web
sidecars:
instances:      2/2
memory usage:   1024M
     state     since                  cpu    memory        disk          logging              details
#0   running   2022-11-29T10:20:15Z   0.2%   14.8M of 1G   22.4M of 1G   33B/s of unlimited
#1   running   2022-11-29T10:20:26Z   0.1%   14.7M of 1G   22.4M of 1G   33B/s of unlimited

type:           web
sidecars:
instances:      0/0
memory usage:   1024M
There are no running instances of this process.

type:           web
sidecars:
instances:      0/1
memory usage:   1024M
     state     since                  cpu    memory    disk      logging      details
#0   crashed   2022-11-29T10:21:04Z   0.0%   0 of 1G   0 of 1G   0/s of 0/s

Footnotes

  1. environment variable too large, thus crashed

  2. cancel-deployment and restart executed without delay:

    # my-app running with two instances...
    
    cf set-env my-app too-large-env $(printf '%*s' 200000 | tr ' ' x)
    cf restart my-app --strategy rolling &
    sleep 10
    # deployment no. 1: DEPLOYING
    
    cf cancel-deployment my-app &
    # deployment no. 1: CANCELING
    
    cf restart my-app --strategy rolling &
    # deployment no. 1: SUPERSEDED (CANCELED)  /  deployment no. 2: DEPLOYING
    

@philippthun
Copy link
Member Author

This is the output of ...

Two additional notes to the comment above:

  1. The cancel-deployment command (running in background) fails with the following output:
Deployment has been superseded
FAILED
  1. When executing another cancel-deployment, the app is reverted back to its original state and the two additional web processes shown in the output above are cleaned up.

Copy link
Member

@moleske moleske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me. Want to also say I appreciated the commit messages in the three commits. It made it easy to see what your flow was and easier to review the smaller pieces that make up the fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rolling deployment: cancel+restart can lead to broken app
4 participants