
Libvirt e2e CI for pull requests #1318

Merged

Conversation

wainersm
Member

@wainersm wainersm commented Aug 9, 2023

This is my second attempt at getting the libvirt e2e tests running for pull requests.

From the user's point of view, the workflow is executed by adding the test_e2e_libvirt label. In theory, if we add support for other providers it would just be a matter of adding a new label. However, that may not work well, as the common jobs (see below) will be triggered every time a test_e2e_* label is added to the PR. I prefer to defer that discussion until (and if) we add other e2e jobs.
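As a minimal sketch of that trigger (the event type, label name, and `if:` expression mirror this PR; the job layout is illustrative, not the final workflow):

```yaml
on:
  pull_request:
    types: [labeled]

jobs:
  libvirt-e2e:
    # run only when the 'test_e2e_libvirt' label is present on the PR
    if: contains(github.event.pull_request.labels.*.name, 'test_e2e_libvirt')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
```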

Still regarding how it is triggered, an alternative would be to use comments on the PR (e.g. /test_e2e_libvirt), but then we hit a bunch of limitations in GitHub Actions' design. To mention the most basic: on a comment, the workflow runs in the context of the main branch, meaning it doesn't get a reference to the pull request number (or other information like the source tree/branch). That's why I decided to go with the label approach.

As in the previous attempt, the workflow is split into two sets of jobs: a) common/shareable among cloud e2e, b) cloud-specific e2e.

The common jobs:

  • job1: podvm builder job
  • job2: image builder job

Job1 does not actually build the podvm, as that is a time- and resource-consuming task. Instead, it uses ./podvm/hack/download-image.sh to extract the qcow2 file from the already built quay.io/confidential-containers/podvm-* images. The qcow2 files are attached to the workflow as artifacts so that downstream jobs (i.e. the jobs that effectively run the e2e tests) can simply download and use the disk. Note: I am not sure about this approach of archiving the disk files as they are huge... we may hit storage issues.
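A rough sketch of job1 (the download-image.sh path and the quay.io repository come from the description above; the full image names, the script's arguments, and the artifact layout are assumptions):

```yaml
  podvm:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        os: [ubuntu, centos]
    steps:
      - uses: actions/checkout@v3
      - name: Extract the qcow2 from the pre-built podvm image
        # the exact download-image.sh arguments and the full image name are
        # illustrative; the script pulls the container image and extracts the qcow2
        run: |
          ./podvm/hack/download-image.sh \
            quay.io/confidential-containers/podvm-generic-${{ matrix.os }}-amd64:latest .
      - name: Attach the disk to the workflow so downstream jobs can download it
        uses: actions/upload-artifact@v3
        with:
          name: podvm-${{ matrix.os }}-amd64-qcow2
          path: ./*.qcow2
```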

Job2 builds and pushes the cloud-api-adaptor image to ghcr.io/confidential-containers/cloud-api-adaptor. The kustomization files (install dir) are changed to use that new image on installation. So, just as in job1, the entire install directory is attached to the workflow so that downstream jobs can simply use it (here the files are a couple of kB, so we won't have storage problems).
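A rough sketch of job2, assuming the `make image` entry point and the DEV_TAGS knob described in the commit messages further down (the tag scheme, artifact name, and registry handling are assumptions):

```yaml
  image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build and push cloud-api-adaptor
        # DEV_TAGS comes from the hack/build.sh change in this PR; tagging with the
        # PR number keeps the temporary ghcr.io images easy to clean up later.
        # Pushing requires a token with ghcr.io write access (see the discussion below).
        run: |
          make image DEV_TAGS="dev-${{ github.sha }},pr-${{ github.event.pull_request.number }}"
      - name: Archive the install directory for the downstream e2e jobs
        uses: actions/upload-artifact@v3
        with:
          name: install_directory
          path: install/
```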

One more comment on job2: as it creates temporary images on ghcr.io, I implemented a script that cleans up unused images. I will watch how this PR evolves before publishing its code.

Fixes #599

@wainersm wainersm added the 'test_e2e_libvirt' (Run Libvirt e2e tests) label on Aug 9, 2023
@wainersm
Member Author

wainersm commented Aug 9, 2023

Notice that I ran the e2e libvirt workflow by labeling this PR with test_e2e_libvirt.

The e2e tests / image / build and push (dev, linux/amd64) and e2e tests / image / build and push (release, linux/amd64,linux/s390x,linux/ppc64le) jobs failed to push the built CAA image, the reason being that the pull_request event doesn't give the workflow a GITHUB_TOKEN with admin privileges (for security reasons). To make it work, we will need to run the entire workflow as pull_request_target, or the publishing job as workflow_run (because those two events are granted a read/write token for ghcr.io).
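For reference, a hedged sketch of option a) (the job and step names are placeholders; only the event name, the permissions idea, and the ghcr.io login via GITHUB_TOKEN are taken from the discussion above):

```yaml
# Option a): run the whole workflow on pull_request_target so the default
# GITHUB_TOKEN is read/write even for PRs coming from forks.
on:
  pull_request_target:
    types: [labeled]

permissions:
  contents: read
  packages: write   # lets the job push to ghcr.io with the default token

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - name: Login to ghcr.io with the workflow token
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
# Option b) would instead move only the publishing job into a separate
# workflow triggered by workflow_run, which also gets a read/write token.
```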

Regardless, I would like to receive feedback on this version first. :)

@wainersm
Member Author

Updated with a fix to the 'Install tests dependencies' step in the libvirt job. I added a commit disabling the image build for the sake of testing the libvirt job; I will remove that commit before the merge.

@wainersm
Member Author

Updated the libvirt job to install a given Go version and the build-essential package.

@wainersm
Member Author

Updated the PR with:

  • now pinning the version of k8s installed (v1.26.7)
  • downloading the correct version of Ubuntu (20.04 instead of 22.04)

@wainersm
Member Author

Blocked by #1332

@wainersm
Member Author

Fix #1332 is merged, rebased and giving it a try again.

@stevenhorsman
Member

stevenhorsman commented Aug 22, 2023

Fix #1332 is merged, rebased and giving it a try again.

FYI the CAA image build failed for this: https://github.com/confidential-containers/cloud-api-adaptor/actions/runs/5939073597/job/16104756661 so it probably won't work yet. Re-queued it to see if that helps.

@wainersm wainersm removed the 'hold' and 'test_e2e_libvirt' (Run Libvirt e2e tests) labels on Aug 22, 2023
@wainersm wainersm force-pushed the workflows_for_libvirt_ci branch from 4d70490 to 4de5345 on August 24, 2023 14:09
@wainersm wainersm removed the 'test_e2e_libvirt' (Run Libvirt e2e tests) label on Aug 24, 2023
@wainersm wainersm force-pushed the workflows_for_libvirt_ci branch from 4de5345 to 56dce17 on August 24, 2023 17:02
@wainersm
Member Author

@bpradipt @stevenhorsman I finally made this work; however, one code path was not tested: the job that builds the CAA image (although I tested it on a fork), because it has to run on pull_request_target (meaning the workflow executed must be the one in the main branch). I had to disable two tests that are failing only on CI, but I will investigate the problem once this is merged.

I'd appreciate it if you could do a "loose" review, as I would like to have it merged asap in order to 1) check whether it really works and then send fixes; 2) start testing this project's PRs against the libvirt provider.

@stevenhorsman
Member

Do we have any issues like the Azure tests where we need to have an approval before the libvirt tests run to avoid exposing secrets/Fabiano using the nodes for Crypto mining?

@wainersm
Member Author

Do we have any issues like the Azure tests where we need to have an approval before the libvirt tests run to avoid exposing secrets/Fabiano using the nodes for Crypto mining?

Good question!

The workflow needs a privileged token to publish the CAA images on ghcr.io. As the workflow runs on pull_request_target, it is not possible for a pull request to run a malicious workflow; that's the first fence of protection. The second is the label: someone needs to label the PR 'test_e2e_libvirt', otherwise it won't run. Only committers should be allowed to put labels on PRs, so this is basically the approval mechanism implemented. Am I wrong, or can anyone label a PR?

Unlike some workflows in Kata, here we are not explicitly instantiating any resources on Azure, so we don't even need secrets configured on the project. The runners are auto-magically deployed by garm and handed over as self-hosted runners. This is the third security fence, a.k.a. the anti-Fabiano crypto-scam fence ;)

tree install
git diff
export TEST_PROVISION="yes"
export TEST_TEARDOWN="no"
Member


I might come to it in a later commit, but is Teardown set to "no" so we can get debug info, given that the test runner node should be cleaned up anyway at the end of the workflow?

Member Author


Exactly Steve!

Member

@stevenhorsman stevenhorsman left a comment


I've given it a non-extensive review and it looks good. I wonder if we can start to re-use pieces of this in the other (e.g. Azure, but maybe internal IBM) tests too, but that's something for when the code is more tested. Thanks for the great work Wainer

@wainersm
Member Author

I've given it a non-extensive review and it looks good. I wonder if we can start to re-use pieces of this in the other (e.g. Azure, but maybe internal IBM) tests too, but that's something for when the code is more tested. Thanks for the great work Wainer

Thanks for the review sir!

My aim was indeed to create a reusable workflow; for example, the podvm job (which should be improved) and the CAA build job are meant to be generic. But I didn't have time to look in depth at how the CI for the other providers is implemented today. I definitely want more shareable code among the providers' CI.

runs-on: ubuntu-latest
if: ${{ contains(github.event.pull_request.labels.*.name, 'test_e2e_libvirt') }}
steps:
- run: true
Collaborator


it's a dummy, but it should be `- run: "true"`, otherwise it's interpreted as a boolean
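i.e. with quotes the value stays a string that the shell runs as a command, instead of being parsed as the YAML boolean `true`:

```yaml
    steps:
      - run: "true"   # a no-op shell command, kept as a string by the quotes
```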

Member Author


hey @mkulke, first of all, thanks for the review. For this comment in particular, yeah, I hadn't tested that code path...good catch. Fixed.


- name: Update kustomization configuration
run: |
providers=(${{ env.PROVIDERS }})
Collaborator


Could providers be a matrix?

Member Author


Yes, they could, but I thought it would be overkill. My idea is to have a single install directory shared by all downstream jobs (i.e. provider-specific jobs). If I convert it to a GHA matrix it will create N jobs (N VMs instantiated) to produce the same artifact, which seems like a waste of resources.
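For example, something along these lines keeps it a single job (the overlay path and the use of `kustomize edit set image` are assumptions about how the install dir gets rewritten; the `providers=(${{ env.PROVIDERS }})` idiom is from the snippet above):

```yaml
      - name: Update kustomization configuration
        run: |
          providers=(${{ env.PROVIDERS }})
          for provider in "${providers[@]}"; do
            # hypothetical overlay layout: install/overlays/<provider>
            pushd "install/overlays/${provider}"
            kustomize edit set image \
              "cloud-api-adaptor=ghcr.io/confidential-containers/cloud-api-adaptor:dev-${GITHUB_SHA}"
            popd
          done
      - name: Archive the single install directory shared by all downstream jobs
        uses: actions/upload-artifact@v3
        with:
          name: install_directory
          path: install/
```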

Created a callable workflow from image.yaml to allow us to reuse the
code in other workflows while tweaking parameters. For example, in an e2e
testing workflow on a pull request we don't want to overwrite `latest` on
quay.io, so we may instead push a temporary image to ghcr.io.

A caller workflow will be able to override:
 * the registry. Defaults to 'quay.io/confidential-containers'
 * the dev build arches. Defaults to 'linux/amd64'
 * the release build arches. Defaults to 'linux/amd64,linux/s390x,linux/ppc64le'

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Currently hack/build.sh builds the cloud-api-adaptor image in dev and
release modes. The image is tagged `dev-$sha1` and `latest` in dev mode,
and just `$sha1` for releases. However, we may want to tag the image
differently; for example, in a release context the image should carry the
git tag (e.g. `v0.7.0`). This changes the script to optionally read the
tags to use from the `DEV_TAGS` and `RELEASE_TAGS` environment variables.

For instance, `make image RELEASE_BUILD=true RELEASE_TAGS=v0.7.0` will push a
release image to quay.io/confidential-containers/cloud-api-adaptor:v0.7.0.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Added just a skeleton for the execution of the libvirt e2e tests on pull
requests; it does nothing at this point. The workflow runs when the PR
is labeled 'test_e2e_libvirt'.

Fixes confidential-containers#599
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Added a job to build the podvm for various provider/OS/ARCH combinations;
for now it just downloads the qcow2 files and puts them into an archive
so that downstream jobs can simply fetch the files onto their runners.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Changed the caa_build_and_push.yaml workflow to accept new inputs:

* dev_tags: Comma-separated list of tags for the dev-built image (e.g. latest,ci-dev). By default it uses the values from hack/build.sh
* release_tags: Likewise, but for the release-built image

This will allow us to publish cloud-api-adaptor images with tags different
from the defaults. Examples: a) for the sake of CI we don't need to publish
the release builds with the `latest` tag, and we may also record the pull
request number in the tag; b) a release workflow may want to tag the image
with a git tag.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The added job builds cloud-api-adaptor and pushes it to the ghcr.io registry.
The kustomization files are modified to reference the new images, then the
install directory is archived so that downstream jobs can use those files.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Fully implemented the libvirt e2e workflow to run the tests with the
Ubuntu podvm.

Fixes confidential-containers#599
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Run the libvirt e2e tests for CentOS podvm too.

Fixes confidential-containers#599
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
A self-hosted runner labeled 'az-ubuntu-2204' was added on Azure via
Garm (**). Run the libvirt e2e workflow on that VM, which supports
nested virt.

(**) https://github.com/confidential-containers/infra/tree/main/github/azure-self-hosted-runners
Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
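In the workflow that simply means targeting the garm-provided runner label (sketch; only the label name comes from the commit message above):

```yaml
  libvirt:
    # self-hosted runner deployed by garm on Azure, with nested virtualization enabled
    runs-on: az-ubuntu-2204
```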
The libvirt/kcli_version.sh script will create a k8s cluster at version
1.26.7 unless the CLUSTER_VERSION variable is exported.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The provisioner's Deploy() and Delete() should print the output of
`kubectl apply -k` before the function returns in case of failure.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Print debug information at the end of the Libvirt e2e workflow if the
'run tests' step failed.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The TestLibvirtCreatePodWithConfigMap and
TestLibvirtCreatePeerPodContainerWithExternalIPAccess tests are failing
on the VM runner on CI. Let's disable them temporarily to keep the
CI running while we figure out the problem.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The libvirt e2e CI should run on pull_request_target so the jobs get
access to secrets. For example, the image job will publish to ghcr.io
with admin privileges. An 'authorize' job is added that is triggered
first when the PR is labeled, with the follow-up jobs depending on it.

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
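A minimal sketch of that gating pattern, under the assumption that every provider job simply declares `needs: authorize` (job names are illustrative; the label check matches the snippet reviewed above):

```yaml
jobs:
  authorize:
    runs-on: ubuntu-latest
    # runs only once a committer has applied the label to the PR
    if: contains(github.event.pull_request.labels.*.name, 'test_e2e_libvirt')
    steps:
      - run: "true"

  podvm_builder:
    # nothing downstream runs unless 'authorize' ran (i.e. the PR was labeled)
    needs: authorize
    runs-on: ubuntu-latest
    steps:
      - run: "true"
```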
@wainersm wainersm force-pushed the workflows_for_libvirt_ci branch from 56dce17 to 98918bd on August 28, 2023 19:25
@wainersm
Member Author

Updated to address the comment from @mkulke regarding run: true. Also rebased onto main.

# This is the key used by cloud-api-adaptor to connect to libvirt
- name: Generate the SSH key
run: |
ssh-keygen -f ./id_rsa -N ""
Contributor


@wainersm Shouldn't this be a required step for Install kcli rather than handling ssh-keygen there as well?

Member Author


hi @ldoktor, I'm glad to see you here :)

This key is used by cloud-api-adaptor to connect to libvirt on the host. It could be the same key used by kcli to install the cluster, but I prefer to keep them as two separate keys.

@@ -14,6 +15,7 @@ func TestLibvirtCreateSimplePod(t *testing.T) {
}

func TestLibvirtCreatePodWithConfigMap(t *testing.T) {
t.Skip("Failing on CI")
Contributor


How about using an ENV value to skip it only when running in CI while allowing it otherwise?

Member Author


That's a good idea @ldoktor. Mind if I send it in another PR? I'd like to get this merged asap to start testing the project's code as well as the workflow's :)

@wainersm wainersm merged commit 739dcab into confidential-containers:main Aug 31, 2023
Successfully merging this pull request may close these issues.

Implement the e2e CI for libvirt