fix tests in sig-node related to CRIv1 update #28071

Merged

akhilerm
Member

Fixes tests under

  • sig-node-containerd
  • sig-node-release-blocking
  • sig-node-presubmits

The following tests are fixed:

  • ci-kubernetes-node-kubelet-containerd-eviction
  • ci-kubernetes-node-kubelet-containerd-resource-managers
  • ci-kubernetes-node-kubelet-serial-containerd
  • pull-kubernetes-node-kubelet-serial-containerd
  • pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • pull-kubernetes-node-kubelet-serial-pod-disruption-conditions

Updated cos images to cos-stable
Updated ubuntu to ubuntu-2204-lts image_family

This update will fix the following jobs:
- ci-kubernetes-node-kubelet-containerd-eviction
- ci-kubernetes-node-kubelet-serial-containerd
- pull-kubernetes-node-kubelet-serial-containerd
- pull-kubernetes-node-kubelet-serial-containerd-kubetest2
- pull-kubernetes-node-kubelet-serial-pod-disruption-conditions

Signed-off-by: Akhil Mohan <[email protected]>
@k8s-ci-robot added the cncf-cla: yes label Nov 21, 2022
@k8s-ci-robot
Contributor

Hi @akhilerm. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-ok-to-test, area/config, area/jobs, sig/node, and sig/testing labels Nov 21, 2022
@k8s-ci-robot added the size/S label Nov 21, 2022
machine: n1-standard-2 # These tests need a lot of memory
metadata: "user-data</workspace/test-infra/jobs/e2e_node/containerd/init.yaml,cni-template</workspace/test-infra/jobs/e2e_node/containerd/cni.template,containerd-config</workspace/test-infra/jobs/e2e_node/containerd/config.toml"
cos-stable2:
- image_family: cos-93-lts # deprecated after October 2023 (https://cloud.google.com/container-optimized-os/docs/release-notes)
+ image_family: cos-stable
Member Author

@SergeyKanzhelev Why were we maintaining two separate cos versions for the tests in image-config-serial?

@akhilerm
Member Author

/cc @SergeyKanzhelev @bobbypage @adisky

Created this as a separate PR from #27943 so as not to complicate the review of that one.

@adisky
Contributor

adisky commented Nov 21, 2022

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Nov 21, 2022
- image_family: pipeline-1-24
- project: ubuntu-os-gke-cloud
+ image_family: ubuntu-2204-lts
+ project: ubuntu-os-cloud
machine: n1-standard-2 # These tests need a lot of memory
metadata: "user-data</workspace/test-infra/jobs/e2e_node/containerd/init.yaml,cni-template</workspace/test-infra/jobs/e2e_node/containerd/cni.template,containerd-config</workspace/test-infra/jobs/e2e_node/containerd/config.toml"
Member

@bobbypage bobbypage Nov 27, 2022

These need to point to a containerd config file that sets CONTAINERD_SYSTEMD_CGROUP, so that the systemd cgroup driver is configured on the containerd side as well. I don't see that being the case currently?

Member

@bobbypage bobbypage Nov 27, 2022

It looks like these jobs just use this helper script (https://github.com/kubernetes/test-infra/blob/master/jobs/e2e_node/containerd/init.yaml) which installs this containerd config file - https://github.com/kubernetes/test-infra/blob/master/jobs/e2e_node/containerd/config.toml

Do we maybe need to make a copy of that config.toml for the systemd cgroup driver and point the test at that config file?

The other option is to configure this test to also use containerd from main, like some of the other jobs, so we can use the CONTAINERD_SYSTEMD_CGROUP setting.

Member Author

The other option is to configure this test to also use containerd from main, like some of the other jobs, so we can use the CONTAINERD_SYSTEMD_CGROUP setting.

What was the reason for these tests using a separate config.toml file? If we change to how it's done in other tests, as you suggested, is there anything else that needs to be taken care of?

Member Author

I have created a new config-systemd.toml that will be used instead of config.toml, so that containerd uses systemd as the cgroup driver.
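
For readers unfamiliar with the setting under discussion: in a version 2 containerd config, the CRI plugin selects the systemd cgroup driver through the `SystemdCgroup` option of the runc runtime. The sketch below renders that fragment from Python. It is an illustration only, not the actual contents of the config-systemd.toml added in this PR, and `render_runc_cgroup_options` is a hypothetical helper name.

```python
def render_runc_cgroup_options(systemd_cgroup: bool) -> str:
    """Render the runc runtime tables for a version 2 containerd config.

    Assumption: the CRI plugin is registered as "io.containerd.grpc.v1.cri",
    matching the plugin name used elsewhere in this thread.
    """
    value = "true" if systemd_cgroup else "false"
    lines = [
        '[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]',
        '  runtime_type = "io.containerd.runc.v2"',
        '[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]',
        f"  SystemdCgroup = {value}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the fragment that enables the systemd cgroup driver.
    print(render_runc_cgroup_options(systemd_cgroup=True), end="")
```

The kubelet's cgroup driver has to match the runtime's, which is why the review asks for this to be set "on the containerd side as well".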

@k8s-ci-robot added the size/M label and removed the size/S label Nov 29, 2022
@akhilerm akhilerm force-pushed the fix-sig-node-containerd-tests branch from 540ed38 to 7ae16b9 Compare November 30, 2022 12:27
use the config file with systemd enabled in tests which use cgroupv2

Signed-off-by: Akhil Mohan <[email protected]>
@akhilerm akhilerm force-pushed the fix-sig-node-containerd-tests branch from 7ae16b9 to ff69de5 Compare November 30, 2022 12:32
@akhilerm
Member Author

@bobbypage I have made the changes. Can you take a look?

Comment on lines +35 to +39
# Enable registry.k8s.io as the primary mirror for k8s.gcr.io
# See: https://github.com/kubernetes/k8s.io/issues/3411
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
endpoint = ["https://registry.k8s.io", "https://k8s.gcr.io",]
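
The mirror entry above lists endpoints in priority order. As a simplified model of that ordering, the sketch below assumes the configured endpoints are tried in sequence and the registry's own host is used as a last resort if it is not already listed. `endpoints_for` and `MIRRORS` are hypothetical names, and this is not containerd's implementation.

```python
from typing import Dict, List


def endpoints_for(registry_host: str, mirrors: Dict[str, List[str]]) -> List[str]:
    """Return the endpoints to try, in order, for images hosted at registry_host."""
    default = f"https://{registry_host}"
    ordered = list(mirrors.get(registry_host, []))
    # Fall back to the registry itself when no mirror entry already names it.
    if default not in ordered:
        ordered.append(default)
    return ordered


# Mirrors table equivalent to the TOML entry under review.
MIRRORS = {"k8s.gcr.io": ["https://registry.k8s.io", "https://k8s.gcr.io"]}

if __name__ == "__main__":
    print(endpoints_for("k8s.gcr.io", MIRRORS))
```

Under this model, an image referenced as k8s.gcr.io/... is first requested from registry.k8s.io and only then from k8s.gcr.io.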

Member

Now that registry.k8s.io is GA (and has been introduced in the master branch of k/k), there is no specific value in doing this.
We should remove this as a follow-up.

Member Author

@akhilerm akhilerm Nov 30, 2022

Will create a separate cleanup PR for that, so that the original config.toml can also be fixed.

Member Author

@ameukam I found this comment in the issue tracking the migration. Should we wait a while longer before switching?

@@ -3,14 +3,14 @@
# `gcloud compute --project <to-project> images create <image-name> --source-disk=<image-name>`
images:
ubuntu:
- image_family: pipeline-1-24
- project: ubuntu-os-gke-cloud
+ image_family: ubuntu-2204-lts
Member

Out of curiosity, why stop using the GKE images?

Member Author

@ameukam So that it picks up the latest public Ubuntu 22.04 image, which has cgroup v2 enabled. Ref

Member

Ok. I had a wrong assumption about pipeline-1-24. I thought that since this family is Ubuntu-based:

$ gcloud compute images describe-from-family pipeline-1-24 --project ubuntu-os-gke-cloud --format='value(name,selfLink)'
ubuntu-gke-2204-1-24-v20221128	https://www.googleapis.com/compute/v1/projects/ubuntu-os-gke-cloud/global/images/ubuntu-gke-2204-1-24-v20221128

the node(s) would have cgroup v2 enabled.
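
A common way to check which cgroup version a node is actually running is to look for the `cgroup.controllers` file at the root of the cgroup mount, which exists only on the unified (v2) hierarchy. The sketch below assumes the conventional mount point /sys/fs/cgroup; `is_cgroup_v2` is a hypothetical helper and is not part of the test-infra jobs.

```python
from pathlib import Path


def is_cgroup_v2(root: Path = Path("/sys/fs/cgroup")) -> bool:
    """Return True if root looks like a cgroup v2 (unified) hierarchy.

    On cgroup v2 the mount root contains a cgroup.controllers file; on
    cgroup v1 the root holds per-controller directories instead.
    """
    return (root / "cgroup.controllers").is_file()


if __name__ == "__main__":
    # On a non-Linux host or a cgroup v1 node this reports "not cgroup v2".
    print("cgroup v2" if is_cgroup_v2() else "not cgroup v2")
```

Running a check like this on a node booted from each image family would show directly whether the image has cgroup v2 enabled, rather than inferring it from the base distribution.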

@akhilerm
Member Author

akhilerm commented Dec 1, 2022

/cc @bobbypage

@bobbypage
Member

bobbypage commented Dec 6, 2022

One small change needed, other than that LGTM, thank you for updating!

Signed-off-by: Akhil Mohan <[email protected]>
@bobbypage
Member

/lgtm

@k8s-ci-robot added the lgtm label Dec 6, 2022
@bobbypage
Member

/assign @dims

@dims
Member

dims commented Dec 6, 2022

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akhilerm, dims

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Dec 6, 2022
@k8s-ci-robot k8s-ci-robot merged commit 4c443dd into kubernetes:master Dec 6, 2022
@k8s-ci-robot
Contributor

@akhilerm: Updated the job-config configmap in namespace default at cluster test-infra-trusted using the following files:

  • key containerd.yaml using file config/jobs/kubernetes/sig-node/containerd.yaml
  • key node-kubelet.yaml using file config/jobs/kubernetes/sig-node/node-kubelet.yaml
  • key sig-node-presubmit.yaml using file config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml

@akhilerm akhilerm deleted the fix-sig-node-containerd-tests branch December 6, 2022 03:26
@liggitt
Member

liggitt commented Dec 6, 2022

Labels
approved, area/config, area/jobs, cncf-cla: yes, lgtm, ok-to-test, sig/node, sig/testing, size/M
7 participants