[CI] Use Kubernetes GC to clean kubevirt VMs (packet-* jobs) #11530

Open
wants to merge 7 commits into
base: master
from


VannTen
Contributor

@VannTen VannTen commented Sep 13, 2024

What type of PR is this?
/kind feature

What this PR does / why we need it:
We regularly see CI flakes where a job fails to delete its k8s namespace in the CI cluster.
It's not much, but it's a small hiccup in the PR process that I'd like to eliminate.

I'm not sure what the exact cause is; probably a race between jobs, in the window between fetching the list of namespaces and deleting them.
Regardless, a simpler way to delete the VMs is to make them dependents (in the Kubernetes sense) of the job pod. This way, once the job pod is deleted, Kubernetes garbage collection in the CI cluster takes care of removing the associated VMs.

Special notes for your reviewer:
Companion PR on the CI infra: kubespray/kspray-infra#1 (private repo, maintainers have access)

Does this PR introduce a user-facing change?:

NONE

/label tide/merge-method-merge

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 13, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VannTen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 13, 2024
@VannTen
Contributor Author

VannTen commented Sep 13, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Sep 13, 2024
@VannTen
Contributor Author

VannTen commented Sep 13, 2024

/cc @ant31

@VannTen
Contributor Author

VannTen commented Sep 13, 2024

/retest
(now that I fixed the gitlab-runner config)

@VannTen
Contributor Author

VannTen commented Sep 13, 2024 via email

@VannTen VannTen force-pushed the ci/cleanup_with_k8s_gc branch 2 times, most recently from fd43216 to d70f8e2 Compare September 20, 2024 13:45
@VannTen
Contributor Author

VannTen commented Sep 20, 2024

/label ci-full

(To test it works correctly for everything)

@k8s-ci-robot
Contributor

@VannTen: The label(s) /label ci-full cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label ci-full

(To test it works correctly for everything)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@VannTen
Contributor Author

VannTen commented Sep 20, 2024 via email

@ant31
Contributor

ant31 commented Sep 23, 2024

I think the PR to add them via /label is still in review

For now you can add them manually

@VannTen
Contributor Author

VannTen commented Sep 23, 2024

I'll do that once the initial set of tests passes, then 👍

This leverages the Kubernetes GC to delete kubevirt VMs by using
ownerReferences, with the CI pod running the playbook as the owner.
Concretely, the control plane in our CI cluster will
delete the kubevirt VMs associated with a particular CI job as soon as
that job's pod is deleted, which usually happens when the job terminates
(barring errors, which will be addressed in the cluster directly).
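
As an illustration of the ownerReferences mechanism described above, here is a minimal Python sketch that attaches an owner reference to a VirtualMachine manifest. It is not the PR's actual implementation: the helper name `with_owner_pod`, the VM name, and the pod name/UID values are hypothetical. How the job obtains its own pod name and UID is not stated in this conversation, and the sketch simply takes them as inputs.

```python
import copy


def with_owner_pod(manifest, pod_name, pod_uid):
    """Return a copy of `manifest` that names the given pod as its owner.

    Hypothetical helper for illustration only. The four fields set here
    (apiVersion, kind, name, uid) are the ones Kubernetes requires in an
    ownerReferences entry.
    """
    owned = copy.deepcopy(manifest)
    owned.setdefault("metadata", {})["ownerReferences"] = [
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "name": pod_name,
            "uid": pod_uid,
        }
    ]
    return owned


# Hypothetical manifest and pod identity, for illustration only.
vm = {
    "apiVersion": "kubevirt.io/v1",
    "kind": "VirtualMachine",
    "metadata": {"name": "instance-1"},
}
owned_vm = with_owner_pod(vm, "ci-job-pod", "hypothetical-pod-uid")
```

Note that a namespaced dependent must live in the same namespace as its owner, so with this approach the VMs would need to be created in the namespace of the job pod.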

Upgrade to kubevirt.io/v1 for the VirtualMachine manifests, since the
alpha version is deprecated.
Kubevirt VM deletion will be handled by the Kubernetes GC (see previous
commit), so remove all the code handling that.
Also, use the Ready condition of VirtualMachine instead of
custom checks.
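
For reference, checking a Ready condition amounts to scanning `status.conditions` for an entry with `type: Ready` and `status: "True"`. The Python sketch below shows that check on an object already fetched as a dict. The function name `is_ready` is hypothetical, and this is not necessarily how the playbook in this PR performs the check.

```python
def is_ready(obj):
    """Return True if `obj` carries a Ready condition whose status is "True".

    Kubernetes-style conditions store their status as the strings
    "True", "False" or "Unknown", not as booleans.
    """
    conditions = obj.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


# Hypothetical object, for illustration only.
example_vm = {
    "kind": "VirtualMachine",
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}
ready = is_ready(example_vm)
```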
@VannTen VannTen force-pushed the ci/cleanup_with_k8s_gc branch 2 times, most recently from d1ca52f to 1b5fa4b Compare October 4, 2024 09:00
@VannTen VannTen force-pushed the ci/cleanup_with_k8s_gc branch 2 times, most recently from b1bdc8f to 077dcad Compare October 7, 2024 09:48
Not constraining the inventory to .ini allows us to use a dynamic
inventory, which is needed to simplify the inventory of kubevirt jobs.

Also reduces the scope of the ANSIBLE_INVENTORY variable.
This allows a single source of truth for the virtual machines in a
kubevirt ci-run.
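
To make the dynamic inventory idea concrete, here is a minimal Python sketch that produces the JSON structure Ansible expects from an inventory script (groups with `hosts`, plus `_meta.hostvars`). It is only an assumption about what such an inventory could look like; the PR may well use an inventory plugin instead. The function name, the host names, the IP addresses, and the choice of the `kube_node` group are all illustrative.

```python
import json


def build_inventory(hosts):
    """Build an Ansible script-style inventory from a name -> IP mapping.

    Hypothetical helper: every host is placed in a single `kube_node`
    group, and its IP is exposed through the `ansible_host` variable.
    """
    return {
        "kube_node": {"hosts": sorted(hosts)},
        "_meta": {
            "hostvars": {
                name: {"ansible_host": ip} for name, ip in hosts.items()
            }
        },
    }


# Hypothetical VMs and addresses, for illustration only.
inventory = build_inventory(
    {"instance-2": "10.0.0.12", "instance-1": "10.0.0.11"}
)
print(json.dumps(inventory, indent=2))
```

Generating the inventory from the list of running VMs, rather than writing a static file, is what gives the single source of truth mentioned above.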

`etcd_member_name` should be correctly handled in kubespray-defaults for
testing the recover cases.
VMIs (VirtualMachineInstance objects) in Kubevirt are the abstraction below VirtualMachine.

- We don't really need the extra abstraction of VirtualMachine objects
- Fix the provisioning playbook not waiting correctly on the VMs until
  they have an IP address to use for the dynamic inventory
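
The "wait until the VM has an IP address" condition can be sketched as a check on the VMI status. The Python snippet below assumes, to the best of my understanding of the KubeVirt API, that addresses are reported under `status.interfaces[].ipAddress`; the function name `vmi_ip` and the sample object are hypothetical, and the actual playbook may express this wait differently.

```python
def vmi_ip(vmi):
    """Return the first IP address reported in the VMI status, or None.

    Assumes addresses appear under status.interfaces[].ipAddress.
    A None result means the VMI is not yet usable in the inventory.
    """
    for iface in vmi.get("status", {}).get("interfaces", []):
        ip = iface.get("ipAddress")
        if ip:
            return ip
    return None


# Hypothetical VMI objects, for illustration only.
booting = {"kind": "VirtualMachineInstance", "status": {"interfaces": [{}]}}
running = {
    "kind": "VirtualMachineInstance",
    "status": {"interfaces": [{"ipAddress": "10.0.0.11"}]},
}
pending_ip = vmi_ip(booting)
assigned_ip = vmi_ip(running)
```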
@VannTen
Contributor Author

VannTen commented Oct 7, 2024

I still need to figure out how to avoid breaking upgrade testing (since the inventory file is no longer there), but otherwise this should be ready soonish :)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 8, 2024
@k8s-ci-robot
Contributor

PR needs rebase.


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges.

3 participants