[chore] Fix E2E autoscale test for OpenShift #1365

iblancasa · 2023-01-12T09:12:22Z

Signed-off-by: Israel Blancas [email protected]

Fixes #1364
Also, this PR makes the test more reliable.

frzifus

How long will we support HPA v1?

iblancasa · 2023-01-25T09:30:38Z

@iblancasa as long as we support K8s 1.23, if I'm not mistaken.

hack/wait-until-hpa-ready/main.go

pavolloffay · 2023-01-25T15:37:52Z

tests/e2e/autoscale/04-delete.yaml

@@ -0,0 +1,7 @@
+apiVersion: kuttl.dev/v1beta1


Why is this needed? The tracegen resource defined in opentelemetry-operator/tests/e2e/autoscale/01-install.yaml (line 4 in 0232b80, name: tracegen) should be cleaned up automatically.

The problem is that, depending on the cluster where you run the test, the duration parameter previously passed to tracegen may not be long enough to trigger the HPA and scale the collector.

With this approach, we run tracegen until the number of replicas increases. Later, we stop tracegen to reduce the metrics and make the HPA scale down.
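The idea of waiting for a condition instead of relying on a fixed duration can be sketched as a small polling loop. This is only an illustration under stated assumptions: pollUntil and the simulated replica counter are hypothetical names, not code from this PR, and a real check would read the collector's replica count from the cluster.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pollUntil calls cond every interval until it returns true, returns an
// error, or the timeout elapses. Hypothetical helper, not the PR's code.
func pollUntil(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for condition")
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated replica count that grows while the load generator runs.
	replicas := 1
	err := pollUntil(10*time.Millisecond, time.Second, func() (bool, error) {
		replicas++ // placeholder for reading the real replica count
		return replicas >= 2, nil
	})
	fmt.Println("scaled up:", err == nil, "replicas:", replicas)
}
```

Only once the condition holds would the load generator be removed, which is what lets the later scale-down step start from a known state.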

tests/e2e/autoscale/01-check-simplest-collector-hpa.yaml

pavolloffay · 2023-01-25T15:41:12Z

From the issue

Later, it tries to match them with the CRs from tests/e2e/autoscale/00-assert.yaml. Those CRs check for HPAs whose apiVersion are autoscaling/v1 (simplest-collector) and autoscaling/v2beta2 (simplest-set-utilization-collector). In OpenShift (4.11) both are created with an apiVersion with the value autoscaling/v2

@iblancasa do we know the root cause of why only autoscaling/v2 was created? OCP 4.11 is based on k8s 1.24, and autoscaling/v2beta1 was only removed in 1.25.

iblancasa · 2023-01-25T16:39:15Z

@pavolloffay

I haven't researched this much, but I think it may be related to this comment in 00-install.yaml:

# TODO: these tests use .Spec.MaxReplicas and .Spec.MinReplicas. These fields are
# deprecated and moved to .Spec.Autoscaler. Fine to use these fields to test that old CRD is
# still supported but should eventually be updated.

If you want, I can create a new issue for that.

pavolloffay · 2023-01-26T15:53:45Z

tests/e2e/autoscale/wait-until-hpa-ready.go

+	"k8s.io/client-go/util/homedir"
+)
+
+func main() {


Please add \n as the last character of every Printf format string.
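A minimal sketch of what this request amounts to, assuming messages of roughly this shape (the statusLine helper and the message text are hypothetical, not taken from the script under review):

```go
package main

import "fmt"

// statusLine formats one progress message. The trailing \n keeps each
// message on its own line, since Printf and Sprintf do not append one.
func statusLine(name string) string {
	return fmt.Sprintf("HPA %s is not ready yet\n", name)
}

func main() {
	fmt.Print(statusLine("simplest-collector"))
	// The same rule applied directly to a Printf call.
	fmt.Printf("HPA %s found\n", "simplest-collector")
}
```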

pavolloffay · 2023-01-26T16:06:10Z

tests/e2e/autoscale/wait-until-hpa-ready.go

+	pollInterval := time.Second
+
+	// Search in v2 and v1 for an HPA with the given name
+	err = wait.Poll(pollInterval, 0, func() (done bool, err error) {


I am still not sure why both v1 and v2 HPAs are actually created in a test.

My understanding is that only a single HPA version should be used in a given cluster. Could you please explain why both are created?

I didn't write the test. I'm just trying to make it work on OpenShift. But this is what I found while looking into the purpose of this E2E test:

The test creates 2 OpenTelemetryCollector instances to test 2 ways of creating HPAs. From the 00-install.yaml file:

# This creates two different deployments:
#   * The first one will be used to see if we scale properly
#   * The second is to check the targetCPUUtilization option

This creates 2 HPAs (I wait for their creation and metrics reporting in steps 1 and 2)

We start tracegen in step 3 and wait for one of the OpenTelemetryCollector instances to scale up to 2 replicas

We remove the tracegen deployment to stop reporting traces in step 4

Wait until the OpenTelemetryCollector scales down in step 5

When the HPAs are created, they will be created using autoscaling/v1 or autoscaling/v2beta2. If you check how this test was written before my changes, you can see that, in 00-assert.yaml, the test tries to assert the simplest-collector HPA with autoscaling/v1 and the simplest-set-utilization-collector HPA with autoscaling/v2beta2.

When I ran this in OpenShift 4.11, both of them were created using autoscaling/v2beta2 (as I pointed out in this comment).

So, since in KUTTL there is no way to conditionally check for one resource or another, I created the wait-until-hpa-ready.go script to (given a name) dynamically:

Look for an HPA in the autoscaling/v2beta2 API. If found, check if the HPA status is different from unknown

If the HPA was not found in autoscaling/v2beta2, look for it in autoscaling/v1. If found, check if the HPA status is different from unknown
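The two-step lookup described above can be sketched roughly as follows. This is not the script from the PR: hpaGetter, fakeAPI, and the plain "unknown" status string are assumptions made so the example runs without a cluster, whereas the real script queries the Kubernetes API and inspects the HPA object's status.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound signals that no HPA with the given name exists in an API version.
var errNotFound = errors.New("hpa not found")

// hpaGetter is a hypothetical abstraction over one autoscaling API version.
// It returns the HPA's reported status, or errNotFound.
type hpaGetter func(name string) (status string, err error)

// hpaReady tries each API version in order and reports whether the first
// HPA found has a status other than "unknown" (modeled here as a string).
func hpaReady(name string, getters ...hpaGetter) (bool, error) {
	for _, get := range getters {
		status, err := get(name)
		if errors.Is(err, errNotFound) {
			continue // not in this API version, try the next one
		}
		if err != nil {
			return false, err
		}
		return status != "unknown", nil
	}
	return false, errNotFound
}

// fakeAPI builds an hpaGetter backed by an in-memory map (illustration only).
func fakeAPI(hpas map[string]string) hpaGetter {
	return func(name string) (string, error) {
		status, ok := hpas[name]
		if !ok {
			return "", errNotFound
		}
		return status, nil
	}
}

func main() {
	v2beta2 := fakeAPI(map[string]string{"simplest-set-utilization-collector": "unknown"})
	v1 := fakeAPI(map[string]string{"simplest-collector": "ok"})
	for _, name := range []string{"simplest-collector", "simplest-set-utilization-collector"} {
		ready, err := hpaReady(name, v2beta2, v1)
		fmt.Println(name, ready, err)
	}
}
```

Passing the getters in priority order keeps the fallback explicit: the caller decides which API version is tried first, and a name missing from every version surfaces as errNotFound so a polling loop can retry.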

Another question: why are the HPAs created using different autoscaling API versions (as we can see in 00-assert.yaml) in the Kubernetes versions tested during CI? I think this is because one sets the .spec.minReplicas and .spec.maxReplicas values and the other sets them in .spec.autoscaler (as the comment in 00-install.yaml explains). If you want, I can investigate more deeply why this happens, but in a separate issue, since it is not related to the current PR.

pavolloffay · 2023-01-30T18:47:30Z

@iblancasa could you please fix the CI?

iblancasa · 2023-01-31T09:38:53Z

I broke something in my branch. Fixing...

Signed-off-by: Israel Blancas <[email protected]>

iblancasa · 2023-01-31T12:04:47Z

Fixed!

Signed-off-by: Israel Blancas <[email protected]>

pavolloffay

Let's merge this and book a ticket to make sure only a single HPA version is used by the operator for a given k8s version.

* Improve the reliability of the autoscale E2E test
  Signed-off-by: Israel Blancas <[email protected]>
* Revert change
  Signed-off-by: Israel Blancas <[email protected]>
---------
Signed-off-by: Israel Blancas <[email protected]>

iblancasa requested a review from a team January 12, 2023 09:12

frzifus added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Jan 12, 2023

frzifus approved these changes Jan 25, 2023

pavolloffay reviewed Jan 25, 2023

pavolloffay reviewed Jan 26, 2023

iblancasa requested a review from pavolloffay January 26, 2023 18:06

iblancasa requested a review from a team January 31, 2023 09:37

iblancasa marked this pull request as draft January 31, 2023 09:39

Improve the reliability of the autoscale E2E test

b9e336a

Signed-off-by: Israel Blancas <[email protected]>

iblancasa force-pushed the fix/1364 branch from 753193b to b9e336a Compare January 31, 2023 12:04

iblancasa marked this pull request as ready for review January 31, 2023 12:04

Revert change

0cde3ad

Signed-off-by: Israel Blancas <[email protected]>

pavolloffay approved these changes Jan 31, 2023

pavolloffay merged commit 55c37bc into open-telemetry:main Jan 31, 2023

pavolloffay mentioned this pull request Jan 31, 2023

Make sure only a single HPA version is used for a given Kubernetes version #1416

Closed

iblancasa deleted the fix/1364 branch January 31, 2023 13:06

This was referenced Feb 2, 2023

REQUEST: New membership for @iblancasa open-telemetry/community#1352

Closed

REQUEST: New membership for @iblancasa open-telemetry/community#1358

Closed
