Introduce NodeDeleterBatcher to ScaleDown actuator #5060
Conversation
(force-pushed from f1bed8d to 3ae4ab9)
(force-pushed from 54efed5 to ebca4f9)
/hold
(force-pushed from ebca4f9 to 561658d)
/unhold
I updated the code a bit:
(force-pushed from 561658d to 747e8a3)
RecordFailedScaleDownEvent(node, drain, a.ctx.Recorder, "prepareNodeForDelition failed", status.Err)
_, _ = deletetaint.CleanToBeDeleted(node, a.ctx.ClientSet, a.ctx.CordonNodeBeforeTerminate)
nodeGroup, err := a.ctx.CloudProvider.NodeGroupForNode(node)
if err != nil {
What do you think about just passing the node group id to scheduleDeletion and prepareForDeletion? We have the id ready whenever we're calling scheduleDeletion anyway. This would mean we wouldn't have to call NodeGroupForNode here and in prepareForDeletion separately, and we would be able to handle the error better here.
Actually, we will call NodeGroupForNode() when the actual node deletion happens, so if NodeGroupForNode() returns nil I think we may get an error during deletion, and probably inside the batch as well.
For scheduleDeletion() I would pass nodeGroupId, but prepareNodeForDeletion doesn't need nodeGroupId, so I would leave it as it is to keep an intermediate check on NodeGroupForNode().
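For illustration, a minimal sketch of the shape being settled on here, with assumed signatures (the real methods in this PR take more parameters): scheduleDeletion receives the node group id its caller already has, while prepareNodeForDeletion keeps its own NodeGroupForNode lookup as an intermediate check.

```go
package sketch

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
)

// scheduleDeletion takes the node group id directly: the caller already
// resolved it, so no second NodeGroupForNode lookup is needed here.
func scheduleDeletion(node *apiv1.Node, nodeGroupId string, drain bool) {
	// ... hand the node to the deletion batcher, keyed by nodeGroupId ...
	fmt.Printf("scheduling %s for deletion in group %s (drain=%v)\n", node.Name, nodeGroupId, drain)
}

// prepareNodeForDeletion keeps its own lookup as an intermediate check, so a
// node whose group cannot be resolved never reaches the deletion batch.
func prepareNodeForDeletion(cp cloudprovider.CloudProvider, node *apiv1.Node) error {
	if _, err := cp.NodeGroupForNode(node); err != nil {
		return fmt.Errorf("failed to find node group for %s: %v", node.Name, err)
	}
	// ... taint/drain the node here ...
	return nil
}
```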
toBeDeletedTaint := apiv1.Taint{Key: deletetaint.ToBeDeletedTaint, Effect: apiv1.TaintEffectNoSchedule}
testNg := testprovider.NewTestNodeGroup("test-ng", 0, 100, 3, true, false, "n1-standard-2", nil, nil)
emptyNodeNodeGroup := generateNodesAndNodeGroupMap(4, "empty")
Why is it important to switch from 1 node group for all nodes to 1 node group per node? Just to test a more comprehensive scenario since node groups actually matter now, or is there another reason?
We have one error message for all nodes from a node group in the failed-deletion scenario for scale down with the batcher. Also, if one deletion fails in one node group, it may result in failed deletions for another node group.
However, here the batch has a 0-second batching interval, so the nodes should be deleted one by one. But this is not always true, since we lock the batcher in addNodeToBucket() and remove(), so we may actually add a few nodes before the remove() call.
This is what happens from time to time, and the current implementation makes the test flaky. I fixed the test so it is no longer flaky, but now I'm not sure it's the right approach. I would like to keep the old behaviour (i.e. no batching) when the batch interval is 0 seconds.
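For context, a toy sketch (not the code in this PR) of the guarantee being discussed: with a zero interval the batcher deletes synchronously and never accumulates a bucket, so lock timing in addNodeToBucket()/remove() cannot reintroduce batching. All names below are illustrative.

```go
package sketch

import (
	"sync"
	"time"
)

// toyBatcher stands in for NodeDeletionBatcher; fields and methods are illustrative.
type toyBatcher struct {
	sync.Mutex
	interval    time.Duration
	buckets     map[string][]string                            // nodes queued per node group
	deleteNodes func(nodeGroupId string, nodes []string) error // one cloud API call per flush
}

// AddNode deletes immediately when the interval is zero, restoring the old
// one-by-one behaviour regardless of how calls interleave under the lock.
func (b *toyBatcher) AddNode(node, nodeGroupId string) error {
	if b.interval <= 0 {
		return b.deleteNodes(nodeGroupId, []string{node})
	}
	b.Lock()
	defer b.Unlock()
	if b.buckets == nil {
		b.buckets = map[string][]string{}
	}
	if len(b.buckets[nodeGroupId]) == 0 {
		// The first node of a bucket schedules the flush for the whole group.
		time.AfterFunc(b.interval, func() { b.flush(nodeGroupId) })
	}
	b.buckets[nodeGroupId] = append(b.buckets[nodeGroupId], node)
	return nil
}

// flush deletes everything collected for a group in a single call.
func (b *toyBatcher) flush(nodeGroupId string) {
	b.Lock()
	nodes := b.buckets[nodeGroupId]
	delete(b.buckets, nodeGroupId)
	b.Unlock()
	_ = b.deleteNodes(nodeGroupId, nodes)
}
```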
break
}
}
if diff := cmp.Diff(test.wantSuccessfulDeletion, gotDeletedNodes); diff != "" {
What does this test verify that the test above doesn't? I'd imagine we want to test if the nodes actually get batched, but this test doesn't verify that at all, since the cloud provider hook processes nodes one-by-one. I'd try to assert if all nodes from one wave get deleted in the same API call. That will probably require some fiddling around with the cloud provider, but IMO it's needed.
The waves show the sets of nodes for which we call actuator.StartDeletion, but only nodes that belong to one nodeGroup are deleted in one API call.
The previous test exercises the batcher with a 0-second interval and this test exercises it with a >0-second interval, so this test covers the e2e deletion. The tests in Batches are aimed at verifying the correctness of the batching behaviour.
The whole reason we're doing batching is to delete multiple nodes in one API call. This test doesn't verify that, it just verifies that all nodes do get deleted; it would still pass without the batcher change. To test this properly, you'd have to modify the test cloud provider (or create a custom one) that would hook either on GkeMig.DeleteNodes(nodes []*apiv1.Node) or gkeManager.DeleteInstances(instances []gce.GceRef), which would allow us to verify whether the nodes got deleted in one call. It's a big change though, and this review has already gone on long enough, so let's discuss it in a follow-up.
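One possible shape for that follow-up, assuming a thin wrapper around whatever node group the test provider hands out (the wrapper type and its fields are hypothetical, not part of this PR):

```go
package sketch

import (
	"sync"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
)

// batchRecordingNodeGroup wraps any NodeGroup and records how many nodes
// arrive in each DeleteNodes call, so a test can assert that a whole wave
// was deleted in a single cloud-provider API call.
type batchRecordingNodeGroup struct {
	cloudprovider.NodeGroup
	mu         sync.Mutex
	batchSizes []int
}

func (ng *batchRecordingNodeGroup) DeleteNodes(nodes []*apiv1.Node) error {
	ng.mu.Lock()
	ng.batchSizes = append(ng.batchSizes, len(nodes))
	ng.mu.Unlock()
	return ng.NodeGroup.DeleteNodes(nodes)
}
```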
func TestCleanUp(t *testing.T) {
	provider := testprovider.NewTestCloudProvider(nil, func(id, nodeName string) error {
		return nil
Don't we also want to verify that the nodes actually get deleted, and in batch?
I added a gotDeletedNodes slice.
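For reference, a sketch of how such a hook can feed that slice through a channel; the test name and channel are illustrative, and the per-node hook signature is taken from the snippet above:

```go
package sketch

import (
	"testing"

	testprovider "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/test"
)

func TestNodesActuallyDeleted(t *testing.T) {
	// Every deletion reported by the test cloud provider lands on this
	// channel; the test drains it into a gotDeletedNodes slice afterwards.
	deletedNodes := make(chan string, 10)
	provider := testprovider.NewTestCloudProvider(nil, func(nodeGroupId, node string) error {
		deletedNodes <- node
		return nil
	})
	_ = provider // ... run the batcher / CleanUp against this provider ...
}
```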
}
batches := len(d.deletionsPerNodeGroup)

// CleanUp NodeGroup that is not present in bucket.
Why not make this a table-style test as well? I'd like to see a test case where we do addNode, addNode, remove, addNode, remove for the same node group in quick succession (or maybe all but the last remove started at the same time as goroutines, and then a synchronous remove). Then assert that all nodes get deleted, and that the state is as expected at the end (empty deletionsPerNodeGroup, empty drainedNodeDeletions). I know it should work with the current implementation, but I can see somebody changing something in the future that would make us skip some node in this scenario, or leak memory in the structures.
The test is now table-style. This is a unit test that verifies the remove() call only; the whole batching path, i.e. AddNode(), should be tested in actuator_test, which exercises the full functionality.
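For the record, a compressed sketch of the interleaving the reviewer describes (addNode, addNode, remove, addNode concurrently, then a synchronous remove); the fakeBatcher below is a toy stand-in whose field name mirrors deletionsPerNodeGroup from the comments, not the real type:

```go
package sketch

import (
	"sync"
	"testing"
)

// fakeBatcher is a toy stand-in for the real batcher.
type fakeBatcher struct {
	sync.Mutex
	deletionsPerNodeGroup map[string][]string
}

func (b *fakeBatcher) AddNode(node, nodeGroupId string) {
	b.Lock()
	defer b.Unlock()
	b.deletionsPerNodeGroup[nodeGroupId] = append(b.deletionsPerNodeGroup[nodeGroupId], node)
}

func (b *fakeBatcher) remove(nodeGroupId string) {
	b.Lock()
	defer b.Unlock()
	delete(b.deletionsPerNodeGroup, nodeGroupId)
}

// addNode, addNode, remove, addNode run as goroutines in quick succession,
// followed by a final synchronous remove; afterwards no per-group state may
// be left behind.
func TestConcurrentAddAndRemove(t *testing.T) {
	b := &fakeBatcher{deletionsPerNodeGroup: map[string][]string{}}
	ops := []func(){
		func() { b.AddNode("n1", "test-ng") },
		func() { b.AddNode("n2", "test-ng") },
		func() { b.remove("test-ng") },
		func() { b.AddNode("n3", "test-ng") },
	}
	var wg sync.WaitGroup
	for _, op := range ops {
		wg.Add(1)
		go func(op func()) {
			defer wg.Done()
			op()
		}(op)
	}
	wg.Wait()
	b.remove("test-ng")
	if len(b.deletionsPerNodeGroup) != 0 {
		t.Errorf("expected empty deletionsPerNodeGroup, got %v", b.deletionsPerNodeGroup)
	}
}
```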
(force-pushed from fec0a90 to d5fd28c)
}
}(drainNode)
nodeGroup, err := a.ctx.CloudProvider.NodeGroupForNode(drainNode)
if err != nil {
We should definitely check nodeGroup for nil here if we're calling .Id() on it right after this, otherwise we're risking a panic. And if that check is here already, I suppose it serves no additional purpose in prepareNodeForDeletion, right?
Right
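A minimal sketch of the guard being discussed (the helper name nodeGroupIdOf is hypothetical, not code from this PR):

```go
package sketch

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
)

// nodeGroupIdOf never calls Id() on a nil NodeGroup.
func nodeGroupIdOf(cp cloudprovider.CloudProvider, node *apiv1.Node) (string, error) {
	nodeGroup, err := cp.NodeGroupForNode(node)
	if err != nil {
		return "", fmt.Errorf("failed to find node group for %s: %v", node.Name, err)
	}
	if nodeGroup == nil {
		// NodeGroupForNode can return no group (and no error) for nodes that
		// do not belong to any autoscaled group; calling Id() here would panic.
		return "", fmt.Errorf("node %s is not in any node group", node.Name)
	}
	return nodeGroup.Id(), nil
}
```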
t.Errorf("%s: remove() return error, but shouldn't", test.name)
}
gotDeletedNodes := []string{}
for i := 0; i < test.numNodes; i++ {
Shouldn't this loop also select for nodDeletedNodes? E.g. in the Unsuccessful remove case test.NumNodes=5, but deletedNodes will only receive 4 confirmations, since 1 node is notifying a different channel. Wouldn't this run into the timeout case?
Right, the Unsuccessful remove case wasn't running because of a return statement; it's fixed now.
If a deletion fails, the following nodes are not deleted, so I check the deleted nodes only in the success case and otherwise check for at least one failure.
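For reference, a sketch of the kind of loop the reviewer describes, receiving from both the success and the failure channel (channel and function names are illustrative):

```go
package sketch

import (
	"testing"
	"time"
)

// collectResults waits for one confirmation per node, whichever channel it
// arrives on, and fails the test on timeout.
func collectResults(t *testing.T, numNodes int, deletedNodes, failedNodes <-chan string) (deleted, failed []string) {
	for i := 0; i < numNodes; i++ {
		select {
		case n := <-deletedNodes:
			deleted = append(deleted, n)
		case n := <-failedNodes:
			failed = append(failed, n)
		case <-time.After(4 * time.Second):
			t.Fatalf("timed out waiting for deletion results after %d/%d nodes", i, numNodes)
		}
	}
	return deleted, failed
}
```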
(force-pushed from d5fd28c to 65b0d78)
/lgtm Thanks for all the changes!! Just one more nit, feel free to unhold if you don't agree or prefer to address it in a follow-up.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: towca, yaroslava-serdiuk.
/unhold
Which component this PR applies to?
/cluster-autoscaler
What type of PR is this?
/kind feature
What this PR does / why we need it:
Optimize scale down by deleting nodes in batches. The --node-deletion-in-batch-interval flag sets the time during which NodeDeleterBatcher gathers nodes for one node group; after that time it calls nodeGroup.DeleteNodes(nodes). If the flag is unset, NodeDeleterBatcher won't wait for other nodes and starts node deletion immediately.
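As a rough illustration of that per-group flow (the types and function below are assumptions, not the PR's actual code), every node group gets a single DeleteNodes call covering all nodes gathered for it during the interval:

```go
package sketch

import (
	apiv1 "k8s.io/api/core/v1"
	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
)

// groupBatch holds a node group and the nodes gathered for it during the
// batching interval.
type groupBatch struct {
	group cloudprovider.NodeGroup
	nodes []*apiv1.Node
}

// deleteBatchedNodes issues exactly one DeleteNodes call per node group,
// rather than one call per node.
func deleteBatchedNodes(batches map[string]groupBatch) error {
	for _, b := range batches {
		if err := b.group.DeleteNodes(b.nodes); err != nil {
			return err
		}
	}
	return nil
}
```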
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: