update autoscaler max nodes test #241
Conversation
openshift/kubernetes-autoscaler#241 will want to know when this merges as well. also, i am amused that these both have the same pull number.
/test e2e-aws-operator
/retest-required
Force-pushed from 5c4a785 to 3fc6fc2.
updated to add a filter for unschedulable nodes as well
looking at the scale to/from zero tests, it appears that sometimes the autoscaler does not think it should remove the nodes even after the test workload has been deleted. my current hypothesis is that /something/ is being scheduled to those nodes (could be a core operator or similar), which prevents the autoscaler from deleting them. if we look at successful runs of this test, we see this in the output:
this output shows that the autoscaler emitted events to remove all the nodes. but, if we look at a failure, we see this:
we can see that the autoscaler only considers removing a single node, and it also thinks that these other pods should be scheduled somewhere. i'm not sure whether those other pods are causing part of the issue, as they might schedule onto the new machineset and prevent a scale down. we have had similar bugs related to OLM pods, but i had thought we solved those issues. it also seems like this problem is intermittent, as the test will pass sometimes.
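for anyone trying to reproduce this analysis, here is a minimal sketch (not part of this PR) of how the scale down events could be checked with a plain client-go clientset. the "ScaleDown" event reason and the kubeconfig handling are assumptions for illustration, not something this repo defines:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// build a client from the local kubeconfig (assumes running outside the cluster)
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// list events in all namespaces and keep the ones the autoscaler emits when
	// it decides to remove a node; "ScaleDown" is the assumed reason string.
	events, err := client.CoreV1().Events(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, ev := range events.Items {
		if ev.Reason == "ScaleDown" {
			fmt.Printf("%s %s/%s: %s\n", ev.InvolvedObject.Kind, ev.Namespace, ev.InvolvedObject.Name, ev.Message)
		}
	}
}
```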
/retest-required
/retest
i'm starting to think we need to bump the k8s version here as well. seeing a bunch of failures around PDBs having the wrong version.
updated the deps for k8s 1.25 and MAO/CAO; let's see how this works
/retest
Force-pushed from 31a07e5 to 62b270e.
i would like to catch this in the vendor update, openshift/cluster-autoscaler-operator#252, so i am putting a hold here temporarily.
/hold
Force-pushed from 62b270e to dc312c4.
updated with rebase of CAO
/retest
/test e2e-aws-operator
/retest
/retest-required
the last couple of failures on e2e-aws-operator appear to have been because the autoscaler scale to zero test was waiting for a node to finish scaling in, but i never see a corresponding event from the autoscaler to signal that it will remove the node. this makes me think that either something is on that node that is preventing it from scaling down, or the workload hasn't finished yet (which would be odd).
i'm hacking on a small change to enforce a taint on the machinesets we create that the test workload can tolerate. i have it working locally but need to add a few more tests.
/hold
Force-pushed from 2bb6213 to 194d804.
not sure what happened here, i tried to rebase my changes and now it's all messed up
i see what i did, i'll fix it shortly
this change makes it so that the max nodes test will observe only the nodes that are ready and schedulable in the cluster when determining if the maximum has been reached. the autoscaler only counts ready and schedulable nodes when performing its calculations; when this is combined with tests that might not clean up the cluster properly, or that may leave nodes in a non-schedulable state, the test needs to take into account only the ready and schedulable nodes when calculating the maximum size.
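a rough sketch of that filter, assuming a standard client-go clientset; the function name and package are made up for illustration and are not the actual implementation in this repo:

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// countReadySchedulableNodes counts only the nodes the autoscaler itself would
// count: Ready and not marked unschedulable.
func countReadySchedulableNodes(ctx context.Context, client kubernetes.Interface) (int, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}
	count := 0
	for _, node := range nodes.Items {
		// skip cordoned or otherwise unschedulable nodes
		if node.Spec.Unschedulable {
			continue
		}
		// only count nodes whose Ready condition is True
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
				count++
				break
			}
		}
	}
	return count, nil
}
```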
this change makes the job names reflect the test they are running and also makes them unique to each autoscaler test. although these workloads should not conflict with each other, they are all created in the same namespace with the same name; this change gives each one a distinct name.
this change adds taints to the machinesets that are used by the autoscaler tests to ensure that no other workloads land on the new machines that are created.
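a hedged sketch of both halves of that idea; the taint key here is hypothetical (the actual constant is ClusterAPIActuatorPkgTaint in the diff further down), and the machine API types are assumed to come from github.com/openshift/api/machine/v1beta1:

```go
package example

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"

	machinev1 "github.com/openshift/api/machine/v1beta1"
)

// hypothetical taint key, for illustration only
const exampleTaintKey = "cluster-api-actuator-pkg-testing"

// taintMachineSet adds the test-owned taint to the MachineSet's machine template
// so that only workloads which explicitly tolerate it land on the new machines.
func taintMachineSet(ms *machinev1.MachineSet) {
	ms.Spec.Template.Spec.Taints = append(ms.Spec.Template.Spec.Taints, corev1.Taint{
		Key:    exampleTaintKey,
		Effect: corev1.TaintEffectPreferNoSchedule,
	})
}

// tolerateTestTaint lets the test workload's pods schedule onto those machines.
func tolerateTestTaint(job *batchv1.Job) {
	job.Spec.Template.Spec.Tolerations = append(job.Spec.Template.Spec.Tolerations, corev1.Toleration{
		Key:      exampleTaintKey,
		Operator: corev1.TolerationOpExists,
		Effect:   corev1.TaintEffectPreferNoSchedule,
	})
}
```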
Force-pushed from 194d804 to 802d88a.
The taints and tolerations prevent other workloads from running on our hosts, but they don't guarantee that our workloads run on our hosts.
We should add a specific label to the Machine template (so it ends up on the node) and then add a nodeSelector to the workload's pods so that they only schedule on nodes with that label.
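a quick sketch of what that suggestion could look like; the label key is made up for illustration, and it assumes the machine template's spec.metadata labels end up on the resulting node, as described above:

```go
package example

import (
	batchv1 "k8s.io/api/batch/v1"

	machinev1 "github.com/openshift/api/machine/v1beta1"
)

// hypothetical label key, for illustration only
const testNodeLabel = "cluster-api-actuator-pkg/test-node"

// labelMachineSetNodes puts a label into the machine template's node metadata so
// every node created from this MachineSet carries it.
func labelMachineSetNodes(ms *machinev1.MachineSet) {
	if ms.Spec.Template.Spec.ObjectMeta.Labels == nil {
		ms.Spec.Template.Spec.ObjectMeta.Labels = map[string]string{}
	}
	ms.Spec.Template.Spec.ObjectMeta.Labels[testNodeLabel] = "true"
}

// pinWorkloadToLabeledNodes adds a nodeSelector so the test job's pods only
// schedule onto nodes carrying that label.
func pinWorkloadToLabeledNodes(job *batchv1.Job) {
	if job.Spec.Template.Spec.NodeSelector == nil {
		job.Spec.Template.Spec.NodeSelector = map[string]string{}
	}
	job.Spec.Template.Spec.NodeSelector[testNodeLabel] = "true"
}
```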
@@ -41,6 +41,10 @@ func NewWorkLoad(njobs int32, memoryRequest resource.Quantity, workloadJobName s
			Key: "kubemark",
			Operator: corev1.TolerationOpExists,
		},
		{
			Key: ClusterAPIActuatorPkgTaint,
			Effect: corev1.TaintEffectPreferNoSchedule,
I think we want a NoSchedule rather than prefer; prefer means we might still get random workloads on here, right? You said you had some issues with this IIRC? Maybe you can expand on those here.
i tried NoSchedule and it did not work as i expected, which is why i backed off to PreferNoSchedule. i have a feeling something else is not properly tolerating NoSchedule and causing the nodes to get deprovisioned. we should figure it out, but this current patch is an incremental improvement imo.
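for anyone following along, the distinction being debated here, sketched with a hypothetical key (illustrative only, not the repo's constant):

```go
package example

import corev1 "k8s.io/api/core/v1"

// hypothetical taint key, for illustration only
const exampleTaintKey = "cluster-api-actuator-pkg-testing"

// NoSchedule hard-blocks any pod that does not tolerate the taint, so every pod
// that must run on the node needs a matching toleration.
var hardTaint = corev1.Taint{Key: exampleTaintKey, Effect: corev1.TaintEffectNoSchedule}

// PreferNoSchedule only asks the scheduler to avoid the node when it has another
// option, which is why stray workloads can still land on it.
var softTaint = corev1.Taint{Key: exampleTaintKey, Effect: corev1.TaintEffectPreferNoSchedule}

// the toleration every required pod would need if the hard variant were used
var hardToleration = corev1.Toleration{
	Key:      exampleTaintKey,
	Operator: corev1.TolerationOpExists,
	Effect:   corev1.TaintEffectNoSchedule,
}
```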
@elmiko: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
we are doing that already, unless i am misunderstanding https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/framework/jobs.go#L52
/hold cancel
Positional arguments are hard! /lgtm
this change makes it so that the max nodes test will observe only the ready nodes in the cluster when determining if the maximum has been reached. the autoscaler only counts ready nodes when performing its calculations; when this is combined with tests that might not clean up the cluster properly, or that may leave nodes in a non-schedulable state, the test needs to take into account only the ready nodes when calculating the maximum size.
this PR also changes the workload job names that are run in the autoscaler tests to ensure that they are distinct from each other.