Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find current Kubernetes versions for e2e testing #2388

Merged
merged 1 commit into from
Jun 23, 2022

Conversation

mboersma
Copy link
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

After #2302 merged with related test-infra changes, our e2e code did not catch up. This change does that, so now e2e tests using these SKU-related helper functions will find newer Kubernetes versions published with the new SKU/version naming scheme.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 15, 2022
@mboersma mboersma added this to the v1.4 milestone Jun 15, 2022
@mboersma
Copy link
Contributor Author

/milestone v1.4

@mboersma
Copy link
Contributor Author

/retitle [WIP] Find current Kubernetes versions for e2e testing

@k8s-ci-robot k8s-ci-robot changed the title Find current Kubernetes versions for e2e testing [WIP] Find current Kubernetes versions for e2e testing Jun 15, 2022
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 15, 2022
@mboersma mboersma force-pushed the latest-k8s-in-e2e branch from 19fc9c6 to 31a7875 Compare June 15, 2022 21:52
@mboersma
Copy link
Contributor Author

/retest

@mboersma
Copy link
Contributor Author

This seems to have uncovered a bug or publishing gap:

E0615 22:21:45.271314 1 controller.go:317] controller/azuremachine "msg"="Reconciler error" "error"="failed to init machine scope cache: failed to get default image: no VM image found for publisher \"cncf-upstream\" offer \"capi-windows\" sku \"windows-2019-gen1\" with Kubernetes version \"v1.22.10\"" "name"="capz-e2e-jtwqbu-ha-md-win-pqv8x" "namespace"="capz-e2e-jtwqbu" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AzureMachine"

I'll investigate.

@mboersma
Copy link
Contributor Author

Looks like we didn't publish v1.22.10 for Windows-2019 (non-contained). I'm backfilling that now, and there should be new patch releases today as well so I'll try to publish all that together.

@CecileRobertMichon
Copy link
Contributor

How come the tests tried to use it if it wasn't published? Is it because it found the containerd image?

@mboersma
Copy link
Contributor Author

Yes, because it found v1.22.10 as a containerd image, but this code only returns the version string. Later code assumed that this would be available in non-contained as well, which it should have been.

@CecileRobertMichon
Copy link
Contributor

I thought we weren't testing dockershim anymore since #2292 ? cc @jsturtevant

Shouldn't these tests be using the containerd image?

@jsturtevant
Copy link
Contributor

Looks like we looked for containerd but ended up looking for the sku without containerd:

e2e-zrcjxv" "reconciler group"="infrastructure.cluster.x-k8s.io" "reconciler kind"="AzureMachine" "x-ms-correlation-request-id"="db0b16c6-78c3-45a8-a6c0-437506710e51"
 "machine"="capz-e2e-zrcjxv-ha-md-win-hr276" "runtime"="containerd" "windowsServerVersion"=""

I0616 03:06:07.569791       1 cache.go:117] controller/azuremachine/virtualmachineimages.Cache.Get
 "msg"="VM images cache miss" "name"="capz-e2e-zrcjxv-ha-md-win-hr276" "namespace"="capz-e2e-zrcjxv" "reconciler group"="infrastructure.cluster.x-k8s.io"
 "reconciler kind"="AzureMachine" "x-ms-correlation-request-id"="db0b16c6-78c3-45a8-a6c0-437506710e51" 
"location"="uksouth" "offer"="capi-windows" "publisher"="cncf-upstream" "sku"="windows-2019-gen1"

@mboersma mboersma force-pushed the latest-k8s-in-e2e branch from 31a7875 to f60a9d2 Compare June 16, 2022 20:01
@mboersma mboersma force-pushed the latest-k8s-in-e2e branch from f60a9d2 to 95ad291 Compare June 16, 2022 21:59
@mboersma
Copy link
Contributor Author

/retitle Find current Kubernetes versions for e2e testing

@k8s-ci-robot k8s-ci-robot changed the title [WIP] Find current Kubernetes versions for e2e testing Find current Kubernetes versions for e2e testing Jun 16, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2022
@mboersma
Copy link
Contributor Author

mboersma commented Jun 16, 2022

It appears we have a non-deterministic failure in a unit test:

% go test -timeout 30s -tags e2e -run '^TestAzureMachinePool_Validate$' sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1 -count=1
ok  	sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1	0.341s
% go test -timeout 30s -tags e2e -run '^TestAzureMachinePool_Validate$' sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1 -count=1
ok  	sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1	0.508s
% go test -timeout 30s -tags e2e -run '^TestAzureMachinePool_Validate$' sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1 -count=1
ok  	sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1	0.372s
% go test -timeout 30s -tags e2e -run '^TestAzureMachinePool_Validate$' sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1 -count=1
ok  	sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1	0.348s
% go test -timeout 30s -tags e2e -run '^TestAzureMachinePool_Validate$' sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1 -count=1
--- FAIL: TestAzureMachinePool_Validate (0.00s)
    --- FAIL: TestAzureMachinePool_Validate/HasInvalidMaximumTerminateNotificationTimeout (0.00s)
        azuremachinepool_test.go:113: 
            Expected
                <string>: spec: Forbidden: can be set only if the MachinePool feature flag is enabled
            to contain substring
                <string>: maximum timeout 15 is allowed for TerminateNotificationTimeout
    --- FAIL: TestAzureMachinePool_Validate/HasValidImage (0.00s)
        azuremachinepool_test.go:66: 
            Unexpected error:
                <*field.Error | 0x14000594040>: {
                    Type: "FieldValueForbidden",
                    Field: "spec",
                    BadValue: <string>"",
                    Detail: "can be set only if the MachinePool feature flag is enabled",
                }
                spec: Forbidden: can be set only if the MachinePool feature flag is enabled
            occurred
FAIL
FAIL	sigs.k8s.io/cluster-api-provider-azure/exp/api/v1beta1	0.374s
FAIL

Edit: see #2394

@Jont828
Copy link
Contributor

Jont828 commented Jun 16, 2022

@mboersma Overall the changes look good to me. If I'm understanding this correctly that test failure isn't related to this PR and you can rebase off of #2394 to make it work.

@mboersma mboersma force-pushed the latest-k8s-in-e2e branch from 95ad291 to a6dede5 Compare June 17, 2022 20:47
@mboersma
Copy link
Contributor Author

/retest

@mboersma mboersma force-pushed the latest-k8s-in-e2e branch from a6dede5 to 651b241 Compare June 21, 2022 18:49
@mboersma mboersma force-pushed the latest-k8s-in-e2e branch from 651b241 to 949f0d7 Compare June 21, 2022 18:53
@@ -89,18 +89,18 @@ func (s *Service) GetDefaultWindowsImage(ctx context.Context, location, k8sVersi
osAndVersion = azure.DefaultWindowsOsAndVersion
}

// Starting with 1.22 we default to containerd for Windows unless the runtime flag is set.
if v.GE(v122) && runtime != "dockershim" && !strings.HasSuffix(osAndVersion, "-containerd") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix needs to be backported.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's backport the whole PR, I think we also want to backport the test fixes

@mboersma
Copy link
Contributor Author

/retest

Failure in labelNodesWithMachinePoolName doesn't appear to be related to this PR.

@CecileRobertMichon
Copy link
Contributor

The last run used v1.22.11 which was just published so looks like this is working as expected 🎉

/lgtm
/assign @jsturtevant

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 21, 2022
@CecileRobertMichon
Copy link
Contributor

/retest

@jsturtevant
Copy link
Contributor

The last run used v1.22.11 which was just published so looks like this is working as expected 🎉

Where do you see this in the logs? I've been poking around having trouble finding the version selected. If found the sku selected in the capz manager logs. If its not logged should it be so we have a way to verify the image selected? I probably just looking in wrong place :-)

@CecileRobertMichon
Copy link
Contributor

Where do you see this in the logs?

INFO: Creating the workload cluster with name "capz-e2e-t2ccsz-ipv6" using the "ipv6" template (Kubernetes v1.22.11, 3 control-plane machines, 1 worker machines)

+1 to logging the selection version though to make it more obvious

@jsturtevant
Copy link
Contributor

thanks! Now that the version isn't in the sku, I think it would be useful to print the version along with

custom-vnet-control-plane-zdqrq" "namespace"="capz-e2e-ovf7lb" "reconciler group"="infrastructure.cluster.x-k8s.io" 
"reconciler kind"="AzureMachine" "x-ms-correlation-request-id"="d60c2e26-ad92-4ce4-aa88-3d245bea9aec" 
"location"="canadacentral" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1"

@mboersma
Copy link
Contributor Author

mboersma commented Jun 22, 2022

I think it would be useful to print the version

If you're referring to the "VM image cache [hit|miss]" logs, that's at a level above version, so it really can't be logged there. But we could add logging to getDefaultImageSKUIDAndVersion which would help.

Edit: see #2412

@mboersma mboersma mentioned this pull request Jun 22, 2022
3 tasks
@CecileRobertMichon
Copy link
Contributor

@jsturtevant any objection to merging this? Logging is being improved in a separate PR

@jsturtevant
Copy link
Contributor

sounds good
/lgtm

@CecileRobertMichon
Copy link
Contributor

/approve
/cherry-pick release-1.3

@k8s-infra-cherrypick-robot

@CecileRobertMichon: once the present PR merges, I will cherry-pick it on top of release-1.3 in a new PR and assign it to you.

In response to this:

/approve
/cherry-pick release-1.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 23, 2022
@k8s-ci-robot k8s-ci-robot merged commit ee51392 into kubernetes-sigs:main Jun 23, 2022
@k8s-infra-cherrypick-robot

@CecileRobertMichon: new pull request created: #2416

In response to this:

/approve
/cherry-pick release-1.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mboersma mboersma deleted the latest-k8s-in-e2e branch June 23, 2022 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants