Should build the CNM for both linux and Windows if image is not present #3270

Closed
jsturtevant opened this issue Mar 10, 2023 · 13 comments · Fixed by #3284

@jsturtevant
Contributor

Which jobs are flaky:
We found that our Windows jobs weren't building the windows image for CCM: kubernetes/kubernetes#116474

Which tests are flaky:

Testgrid link:

Reason for failure (if possible):
We were missing the ENV variable and have added it as a temporary fix, but we should be building the image for all architectures if it is missing.

If we don't, we might create a tag for Linux only and then later publish a multi-arch image under the same tag.

Remove this check:

https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/0f497f833a710ad2ad720124f2dd3003da9326e8/scripts/ci-build-azure-ccm.sh#L64C3-L68
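
A minimal sketch of the proposed behavior, assuming the script already knows whether the image tag exists in the registry. The helper name `os_targets_to_build` and its argument are hypothetical and are not taken from `ci-build-azure-ccm.sh`. The point is only that the decision depends on whether the tag is present, not on a Windows-specific environment variable:

```shell
# Hypothetical helper; not part of ci-build-azure-ccm.sh.
# $1: "true" if the image tag already exists in the registry, "false" otherwise.
# How existence is checked is deliberately left out of this sketch.
os_targets_to_build() {
  if [ "$1" = "true" ]; then
    # Tag already published: reuse it and build nothing.
    echo ""
  else
    # Tag missing: build every OS so the tag is never published as Linux-only.
    echo "linux windows"
  fi
}

# Example: the tag is missing, so both OS images would be built.
for os in $(os_targets_to_build "false"); do
  echo "would build cloud-node-manager image for ${os}"
done
```

With this shape, a job that finds the tag missing always produces both images, so a later job reusing the tag cannot pick up a Linux-only image.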

Anything else we need to know:

  • links to go.k8s.io/triage appreciated
  • links to specific failures in spyglass appreciated

/kind flake

[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]

@k8s-ci-robot k8s-ci-robot added the kind/flake Categorizes issue or PR as related to a flaky test. label Mar 10, 2023
@CecileRobertMichon
Contributor

/assign @jackfrancis

@CecileRobertMichon
Contributor

@lzhecheng FYI this would essentially revert #2171

Since the jobs reuse images built and pushed by other jobs, we are concerned about potential race conditions when one job builds Linux only and a future job then retags the image to be multi-arch

@CecileRobertMichon
Contributor

Once we do this we can also revert kubernetes/test-infra#28991

@lzhecheng
Contributor

@CecileRobertMichon hello, first, I think it is CNM (cloud-node-manager), right? CCM runs on control plane nodes, and those should be Linux.
For cloud-provider-azure, all presubmit jobs are Linux-only, so building Windows takes longer and uses unnecessary resources.
What about different tags for the Linux-only job and the Linux+Windows job?
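
The separate-tags idea could look roughly like the sketch below. The function name `tag_for` and the suffixes `-linux` and `-linux-windows` are hypothetical, chosen only for illustration; no such naming scheme appears in this thread or in the scripts:

```shell
# Hypothetical helper: derive a distinct image tag from the set of OSes built,
# so a Linux-only image and a Linux+Windows image never share a tag.
# $1: base tag (for example, a commit SHA)
# $2: space-separated OS list
tag_for() {
  case "$2" in
    "linux")         echo "$1-linux" ;;
    "linux windows") echo "$1-linux-windows" ;;
    *)               echo "unsupported OS list: $2" >&2; return 1 ;;
  esac
}

tag_for "abc123" "linux"
tag_for "abc123" "linux windows"
```

The trade-off is that Linux-only jobs stay fast, while images can no longer be shared between the two kinds of job.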

@jsturtevant
Contributor Author

yes, CNM.

Do you use the script in this repo for the pre-submits on cloud-provider-azure?

@jsturtevant jsturtevant changed the title Should build the CMM for both linux and Windows if image is not present Should build the CNM for both linux and Windows if image is not present Mar 13, 2023
@jackfrancis
Contributor

> all presubmit jobs are linux only

@lzhecheng that surprises me, how do we validate functional changes against Windows scenarios?

@lzhecheng
Contributor

> yes, CNM.
>
> Do you use the script in this repo for the pre-submits on cloud-provider-azure?

Yes, all cloud-provider-azure jobs are using ci-entrypoint.sh

@lzhecheng
Contributor

> > all presubmit jobs are linux only
>
> @lzhecheng that surprises me, how do we validate functional changes against Windows scenarios?

We use postsubmit, daily jobs to check Windows, like this one:
https://testgrid.k8s.io/provider-azure-cloud-provider-azure#cloud-provider-azure-ccm-windows-capz

@jackfrancis
Contributor

@lzhecheng thanks for all that information

Is this an example of a presubmit job?

The above successful job run took 2 hours 47 mins.

Here's a sample postsubmit job w/ Windows:

That job took 2 hours 46 mins.

Is it possible that adding Windows builds will not actually add a noticeable time cost to the average test run?

@lzhecheng
Contributor

> @lzhecheng thanks for all that information
>
> Is this an example of a presubmit job?
>
> The above successful job run took 2 hours 47 mins.
>
> Here's a sample postsubmit job w/ Windows:
>
> That job took 2 hours 46 mins.
>
> Is it possible that adding Windows builds will not actually add a noticeable time cost to the average test run?

Oh, I didn't expect that. Building the Windows image locally used to take quite some time.
Thank you! I agree the Linux and Windows images can be built together.

@jackfrancis
Contributor

@lzhecheng Thanks for confirming, will do!

@jsturtevant
Contributor Author

@lzhecheng out of curiosity, why are the Windows jobs not included in pre-submit? Is it build times? Flakiness? Something else?

@lzhecheng
Contributor

> @lzhecheng out of curiosity, why are the Windows jobs not included in pre-submit? Is it build times? Flakiness? Something else?

When these Windows jobs were first deployed, they were indeed flaky. In addition, I think that in most cases a PR won't break only the Windows scenario; the breakage will also be detected in a Linux job.
