Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MD RollingUpdate support to only add new node after one old node deleted completely #3417

Closed
jzhoucliqr opened this issue Jul 29, 2020 · 7 comments · Fixed by #3434
Closed
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature.
Milestone

Comments

@jzhoucliqr
Copy link
Contributor

User Story

As a operator I would like to support scale in for the MachineDeployment, to only launch new nodes after deleting existing nodes, due to resource constraints or other stateful resources (for eg disks) on the workers.

Detailed Description

The use case is to support hyper converged storage solutions like OpenEBS and Portworx. With disk autoprovision, Portworx will try to use existing disk to attach to worker nodes. If no existing disk found, will create new ones.

Then for upgrade, the behavior I'm trying to achieve is, only until one old node completely deleted then start create a new node, so the existing disk can attach to the new node.

I can achieve partially by setting maxSurge=0 and maxUnavailable=1. The issue is when MachineSet calculate replica count, it does not contain Machines in deleting state, so new node is launched before old node is deleted, even through maxsurge=0.

Anything else you would like to add:

I think there are two ways to achieve this:

  1. Add a new strategy, like RollingRecreate
  2. Update RollingUpdate strategy to add a new flag, IncludeDeletingReplicas, when this flag presents, MachineSet controller takes into consideration of deleting Machines for replica count, to make sure total machines will not exceed maxSurge.

Personally I prefer solution 2. Let me know if this makes sense.

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 29, 2020
@jzhoucliqr jzhoucliqr changed the title MD RollingUpdate strategy support to only add new node after one old node deleted completely MD RollingUpdate support to only add new node after one old node deleted completely Jul 29, 2020
@detiber
Copy link
Member

detiber commented Jul 29, 2020

I'm a +1 to the logic in 2, but I think I would like to argue that the current behavior should be considered a bug and we should fix it without introducing a new flag for the appropriate behavior.

@vincepri
Copy link
Member

/kind bug
/milestone v0.4.0
/help

@k8s-ci-robot
Copy link
Contributor

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/kind bug
/milestone v0.4.0
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added this to the v0.4.0 milestone Jul 29, 2020
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Jul 29, 2020
@jzhoucliqr
Copy link
Contributor Author

Thanks. Will have a PR for the fix.

@jzhoucliqr
Copy link
Contributor Author

/assign

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 27, 2020
@fabriziopandini
Copy link
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants