capz-periodic-e2e-full-main CI job failing #2696

Closed
mboersma opened this issue Oct 3, 2022 · 16 comments · Fixed by #2698

@mboersma

mboersma commented Oct 3, 2022

/kind bug

What steps did you take and what happened:

In testgrid, the capz-periodic-e2e-full-main job began failing after the custom VM extensions PR got merged. Here is the log that seems to implicate new custom VM code somehow:

capz-e2e: Workload cluster creation Creating clusters using clusterclass [OPTIONAL] with a single control plane node, one linux worker node, and one windows worker node (21m9s)

{ Failure /home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_test.go:670
Unexpected error:
    <autorest.DetailedError>: {
        Original: <*azure.RequestError | 0xc000f6a6c0>{
            DetailedError: {
                Original: nil,
                PackageType: "",
                Method: "",
                StatusCode: <int>404,
                Message: "",
                ServiceError: nil,
                Response: {
                    Status: "404 Not Found",
                    StatusCode: 404,
                    Proto: "HTTP/2.0",
                    ProtoMajor: 2,
                    ProtoMinor: 0,
                    Header: {
                        "X-Ms-Failure-Cause": ["gateway"],
                        "X-Ms-Correlation-Request-Id": [
                            "fa92e7e4-30c2-4a30-9670-2aacda76b2e8",
                        ],
                        "Date": [
                            "Mon, 03 Oct 2022 17:27:42 GMT",
                        ],
                        "Content-Length": ["110"],
                        "Content-Type": [
                            "application/json; charset=utf-8",
                        ],
                        "Expires": ["-1"],
                        "X-Ms-Request-Id": [
                            "fa92e7e4-30c2-4a30-9670-2aacda76b2e8",
                        ],
                        "X-Ms-Routing-Request-Id": [
                            "CENTRALUS:20221003T172742Z:fa92e7e4-30c2-4a30-9670-2aacda76b2e8",
                        ],
                        "Strict-Transport-Security": [
                            "max-age=31536000; includeSubDomains",
                        ],
                        "X-Content-Type-Options": ["nosniff"],
                        "Cache-Control": ["no-cache"],
                        "Pragma": ["no-cache"],
                    },
                    Body: <io.nopCloser>{
                        Reader: <*bytes.Buffer | 0xc001d539e0>{
                            buf: "{\"error\":{\"code\":\"ResourceGroupNotFound\",\"message\":\"Resource group 'capz-e2e-2tk088-cc' could not be found.\"}}",
                            off: 0,
                            lastRead: 0,
                        },
                    },
                    ContentLength: 110,
                    TransferEncoding: nil,
                    Close: false,
                    Uncompressed: false,
                    Trailer: nil,
                    Request: {
                        Method: "GET",
                        URL: {
                            Scheme: "https",
                            Opaque: "",
                            User: nil,
                            Host: "management.azure.com",
                            Path: "/subscriptions/===REDACTED===/resourceGroups/capz-e2e-2tk088-cc/providers/Microsoft.Compute/virtualMachines",
                            RawPath: "",
                            ForceQuery: false,
                            RawQuery: "api-version=2021-11-01",
                            Fragment: "",
                            RawFragment: "",
                        },
                        Proto: "",
                        ProtoMajor: 0,
                        ProtoMinor: 0,
                        Header: {
                            "User-Agent": [
                                "Go/go1.18.6 (amd64-linux) go-autorest/v14.2.1 Azure-SDK-For-Go/v63.4.0 compute/2021-11-01",
                            ],
                            "Authorization": [
                                "Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6IjJaUXBKM1VwYmpBWVhZR2FYRUpsOGxWMFRPSSIsImtpZCI6IjJaUXBKM1VwYmpBWVhZR2FYRUpsOGxWMFRPSSJ9.eyJhdWQiOiJodHRwczovL21hbmFnZW1lbnQuYXp1cmUuY29tLyIsImlzcyI6Imh0dHBzOi8vc3RzLndpbmRvd3MubmV0LzA5N2Y4OWEwLTkyODYtNDNkMi05YTFhLTA4ZjFkNDliMWFmOC8iLCJpYXQiOjE2NjQ4MTc3NjIsIm5iZiI6MTY2NDgxNzc2MiwiZXhwIjoxNjY0ODIxNjYyLCJhaW8iOiJFMlpnWVBnWFdQZzQ1RTFRc1lTL2gzNmR3TU1JQUE9PSIsImFwcGlkIjoi===REDACTED===IiwiYXBwaWRhY3IiOiIxIiwiaWRwIjoiaHR0cHM6Ly9zdHMud2luZG93cy5uZXQvM...

Gomega truncated this representation as it exceeds 'format.MaxLength'.
Consider having the object provide a custom 'GomegaStringer' representation
or adjust the parameters in Gomega's 'format' package.

Learn more here: https://onsi.github.io/gomega/#adjusting-output

    compute.VirtualMachinesClient#List: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceGroupNotFound" Message="Resource group 'capz-e2e-2tk088-cc' could not be found."
occurred
/home/prow/go/src/sigs.k8s.io/cluster-api-provider-azure/test/e2e/azure_vmextensions.go:97}

What did you expect to happen:

Anything else you would like to add:

Environment:

  • cluster-api-provider-azure version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

k8s-ci-robot added the kind/bug label Oct 3, 2022

@mboersma

mboersma commented Oct 3, 2022

/assign @willie-yao

@k8s-ci-robot

@mboersma: GitHub didn't allow me to assign the following users: willie-yao.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @willie-yao

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@willie-yao

/assign

@willie-yao

I'm not exactly familiar with clusterclass, but I saw that there were no AzureMachines or AzureMachinePools in the template spec. If that's the case, I don't think I should be adding the custom vm extension test to the clusterclass test. This is most likely the source of the issue. Pinging @CecileRobertMichon @jackfrancis to make sure this is the case.

@jackfrancis

@willie-yao I would try to reproduce the test locally with and without the additional VM extensions foo.

AZURE_LOCATION=eastus GINKGO_FOCUS="Creating clusters using clusterclass" SKIP_CLEANUP=true ./scripts/ci-e2e.sh

Are we really using a "naked" Cluster (no actual machines) to test ClusterClass?

@willie-yao

I'm trying the tests now. I don't see any Machines or AzureMachine being defined in the topology spec which the clusterclass test uses.

@willie-yao

@jackfrancis It passes when the test is removed, and fails when it isn't. I think we should remove it for now since I don't think the test actually tests anything for the clusterclass.

@CecileRobertMichon

> Are we really using a "naked" Cluster (no actual machines) to test ClusterClass?

No, worker machines are defined here: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/templates/test/ci/cluster-template-prow-topology.yaml#L44

@jackfrancis

sgtm

@jackfrancis

@willie-yao let's remove it for now and file an issue that we need to backfill test coverage of vm extensions for clusterclass E2E scenarios

@CecileRobertMichon

The error message implies we're not even finding the resource group:

Status=404 Code="ResourceGroupNotFound" Message="Resource group 'capz-e2e-2tk088-cc' could not be found."

Is it possible clusterclass resource group name is different from other tests and that's why we can't list machines?
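
For anyone triaging a similar failure, here is a minimal sketch of how the error code could be pulled out of the JSON body shown in the log, using only the Go standard library. The `armError` type and the `isResourceGroupNotFound` helper are hypothetical names made up for this illustration; they are not part of CAPZ or the Azure SDK, and the envelope shape is assumed from the single response body in the log above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// armError mirrors the error envelope seen in the 404 response body in the log.
// Hypothetical type for illustration only.
type armError struct {
	Error struct {
		Code    string `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// isResourceGroupNotFound reports whether body is an error envelope whose
// code is "ResourceGroupNotFound". Hypothetical helper, not a CAPZ function.
func isResourceGroupNotFound(body []byte) (bool, error) {
	var e armError
	if err := json.Unmarshal(body, &e); err != nil {
		return false, fmt.Errorf("decoding error body: %w", err)
	}
	return e.Error.Code == "ResourceGroupNotFound", nil
}

func main() {
	// Body copied from the failing response in the log.
	body := []byte(`{"error":{"code":"ResourceGroupNotFound","message":"Resource group 'capz-e2e-2tk088-cc' could not be found."}}`)

	notFound, err := isResourceGroupNotFound(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(notFound)
}
```

Checking the code rather than only the HTTP status helps separate "the resource group name is wrong" from other 404s, such as a VM that does not exist inside a valid resource group.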

@willie-yao

Hmm that's interesting, I actually missed that part of the error message and I'll look into it further.

@willie-yao

@CecileRobertMichon Looks like the actual resource group name has another set of generated characters after 'capz-e2e-2tk088-cc', which is different from the other tests. In the other tests the resource group name is the same as the cluster name, but for clusterclass it looks like it's different.

@willie-yao

Is there a reason why the cluster name is different from the resource group name for the clusterclass? I can't find a way to get the resource group name from the E2E test files, so maybe we should be keeping the naming consistent.

@willie-yao

I updated #2698 to get the resource group name correctly, instead of using the clusterName as the resource group name. This should solve the problem.
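
Roughly, the idea can be sketched as below. This is only an illustration of the approach, not the code in #2698: `azureCluster` is a hypothetical, trimmed-down stand-in for the real AzureCluster object, `resourceGroupFor` is a made-up helper, and the `-abc12` suffix is an invented example value.

```go
package main

import (
	"errors"
	"fmt"
)

// azureCluster is a hypothetical stand-in holding only the two values that
// matter here. It is NOT the real CAPZ type.
type azureCluster struct {
	ClusterName   string // name of the Cluster object used by the test
	ResourceGroup string // resource group recorded for the cluster's Azure resources
}

// resourceGroupFor returns the recorded resource group instead of assuming it
// matches the cluster name. Hypothetical helper for illustration.
func resourceGroupFor(c azureCluster) (string, error) {
	if c.ResourceGroup == "" {
		return "", errors.New("no resource group recorded for cluster " + c.ClusterName)
	}
	return c.ResourceGroup, nil
}

func main() {
	// Assumed values: with clusterclass the two names differ by a generated suffix.
	c := azureCluster{
		ClusterName:   "capz-e2e-2tk088-cc",
		ResourceGroup: "capz-e2e-2tk088-cc-abc12", // suffix invented for this example
	}

	rg, err := resourceGroupFor(c)
	if err != nil {
		panic(err)
	}
	fmt.Println("listing VMs in resource group:", rg)
}
```

Returning an error when nothing is recorded, rather than silently falling back to the cluster name, is a design choice in this sketch so that a mismatch fails with a clear message instead of a 404 from Azure.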

@mboersma

mboersma commented Oct 5, 2022

The clusterclass test spec in this job passed (unrelated tests failed, unfortunately). Nice work @willie-yao!
