Libvirt e2e CI for pull requests #1318
Conversation
Notice that I ran the e2e libvirt workflow by tagging this PR. Regardless, I would like to receive feedback on this version first. :)
Force-pushed from eb9ad29 to 6b39ad3
Force-pushed from 6b39ad3 to a9ea697
Updated with a fix.
Force-pushed from a9ea697 to 7c6fcef
Updated.
Force-pushed from 7c6fcef to e4e7df9
Updated the PR.
Force-pushed from e4e7df9 to 1771246
Force-pushed from 1771246 to 83bcda7
Force-pushed from 83bcda7 to 95abd88
Blocked by #1332
Force-pushed from 5d4d489 to b6945f3
Fix #1332 is merged; rebased and giving it a try again.
FYI the CAA image build failed for this: https://github.com/confidential-containers/cloud-api-adaptor/actions/runs/5939073597/job/16104756661 so it probably won't work yet. Re-queued that to see if that helps.
Force-pushed from 4d70490 to 4de5345
Force-pushed from 4de5345 to 56dce17
@bpradipt @stevenhorsman I finally made this work, however one code path was not tested: the job that builds the CAA image (although I tested it on a fork), because it should run on pull_request_target (meaning the workflow executed should be the one in the main branch). I had to disable two tests that are failing only on CI, but I will investigate the problem once this is merged. I'd appreciate a "loose" review, as I would like to have it merged asap in order to 1) check whether it really works, then send fixes; 2) start testing the PRs on this project against the libvirt provider.
Do we have any issues like the Azure tests, where we need an approval before the libvirt tests run to avoid exposing secrets/Fabiano using the nodes for crypto mining?
Good question! The workflow needs a privileged token to publish the CAA images on ghcr.io. As the workflow runs on pull_request_target, it is not possible for a pull request to run a malicious workflow; that's the first fence of protection. The second is the label: someone needs to label the PR with 'test_e2e_libvirt', otherwise it won't run. Only committers should be allowed to put labels on PRs, so this is basically the approval mechanism implemented. Am I wrong, and anyone can label a PR? Unlike some workflows on kata, here it is not explicitly instantiating any resources on Azure, so we don't even need the secrets configured on the project. The runners are auto-magically deployed by garm and handed over as self-hosted runners. This is the third security fence, a.k.a. the anti-Fabiano's-crypto-scam ;)
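The label gating described here is a small amount of workflow YAML. A minimal sketch of the mechanism (the `pull_request_target` trigger, the 'authorize' job, and the label condition come from this PR; the exact layout of `e2e_on_pull.yaml` may differ):

```yaml
# Sketch: a PR cannot run a modified workflow (pull_request_target executes the
# main-branch copy), and nothing runs until a committer applies the label.
on:
  pull_request_target:
    types: [labeled]

jobs:
  authorize:
    runs-on: ubuntu-latest
    if: ${{ contains(github.event.pull_request.labels.*.name, 'test_e2e_libvirt') }}
    steps:
      - run: "true"
```

Downstream jobs then declare `needs: authorize`, so they inherit the gate.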
```sh
tree install
git diff
export TEST_PROVISION="yes"
export TEST_TEARDOWN="no"
```
I might come to it in a later commit, but is `TEST_TEARDOWN` set to "no" so we can get debug info, given that the test runner node should be cleaned up anyway at the end of the workflow?
Exactly Steve!
I've given it a non-extensive review and it looks good. I wonder if we can start to reuse pieces of this in the other tests (e.g. Azure, but maybe internal IBM) too, but that's something for when the code is more tested. Thanks for the great work, Wainer.
Thanks for the review, sir! My aim was indeed to create a reusable workflow; for example, the podvm (which should be improved) and CAA build jobs are meant to be generic. But I didn't have time to look in depth at how the CI for the other providers is implemented today. I definitely want to have more shareable code among the providers' CIs.
.github/workflows/e2e_on_pull.yaml
```yaml
runs-on: ubuntu-latest
if: ${{ contains(github.event.pull_request.labels.*.name, 'test_e2e_libvirt') }}
steps:
  - run: true
```
it's a dummy, but it should be `- run: "true"`, otherwise it's interpreted as a boolean
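The point here is YAML scalar resolution: a bare `true` resolves to the boolean type, not to the command string the `run` key expects, while quoting forces a string. Side by side:

```yaml
steps:
  - run: true    # bare scalar: resolved as the YAML boolean true, not a command
  - run: "true"  # quoted: the string "true", i.e. the shell's no-op builtin
```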
hey @mkulke, first of all, thanks for the review. For this comment in particular, yeah, I hadn't tested that code path...good catch. Fixed.
```yaml
- name: Update kustomization configuration
  run: |
    providers=(${{ env.PROVIDERS }})
```
Could providers be a matrix?
Yes, they could, but I thought it would be overkill. My idea is to have a single `install` directory shared by all downstream jobs (i.e. provider-specific jobs). If I convert it to a GHA matrix, it will create N jobs (N VMs instantiated) to produce the same artifact, which seems like a waste of resources.
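The single-job pattern being defended here — expanding a space-separated `PROVIDERS` env var into a bash array and looping once, instead of fanning out matrix jobs — can be sketched as follows (the provider list is illustrative):

```sh
#!/bin/bash
set -euo pipefail

# Space-separated list, as it would arrive via `env.PROVIDERS` in the workflow.
PROVIDERS="libvirt aws azure"

# Expand into a bash array and handle each provider in one job, rather than
# creating N matrix jobs that each rebuild the same artifact.
providers=(${PROVIDERS})
for provider in "${providers[@]}"; do
    echo "updating kustomization for ${provider}"
done
```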
Created a callable workflow from image.yaml to allow us to reuse the code in other workflows while tweaking parameters. For example, in an e2e testing workflow on pull request we don't want to overwrite the `latest` tag on quay.io, so we may push a temporary image to ghcr.io. A caller workflow will be able to override:

* the registry. Defaults to 'quay.io/confidential-containers'
* the dev build arches. Defaults to 'linux/amd64'
* the release build arches. Defaults to 'linux/amd64,linux/s390x,linux/ppc64le'

Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Currently hack/build.sh builds the cloud-api-adaptor image in dev and release modes. The image is tagged as `dev-$sha1` and `latest` in dev mode, and just `$sha1` for releases. However, we may want to tag the image differently; for example, in a release context the image should carry the git tag (e.g. `v0.7.0`). This changes the script to optionally read the tags to use from the `DEV_TAGS` and `RELEASE_TAGS` environment variables. For instance, `make image RELEASE_BUILD=true RELEASE_TAGS=v0.7.0` will push a release image to quay.io/confidential-containers/cloud-api-adaptor:v0.7.0. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
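The tag selection described in this commit boils down to an environment-variable override on top of the defaults; a rough sketch (a simplification, not the actual `hack/build.sh` code, and the commit id is illustrative):

```sh
#!/bin/bash
set -euo pipefail

sha1="abc1234"                         # illustrative commit id
RELEASE_BUILD="${RELEASE_BUILD:-false}"

if [ "$RELEASE_BUILD" = "true" ]; then
    # Release mode: default tag is `$sha1`, unless RELEASE_TAGS overrides it.
    tags="${RELEASE_TAGS:-$sha1}"
else
    # Dev mode: default tags are `dev-$sha1` and `latest`, unless DEV_TAGS overrides.
    tags="${DEV_TAGS:-dev-$sha1,latest}"
fi

echo "tags: $tags"
```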
Added just a skeleton for the execution of the libvirt e2e tests on pull request; it will do nothing at this point. The workflow runs when the PR is labeled 'test_e2e_libvirt'. Fixes confidential-containers#599 Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Added a job to build the podvm for various provider/OS/ARCH combinations, which currently just downloads the qcow2 files and puts them into an archive so that downstream jobs can simply fetch the files to their runners. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Changed the caa_build_and_push.yaml workflow to receive new inputs:

* dev_tags: Comma-separated list of tags for the dev-built image (e.g. latest,ci-dev). By default uses the values from hack/build.sh
* release_tags: Likewise, but for the release-built image

This will allow us to publish cloud-api-adaptor images with tags different from the defaults. Examples: a) for the sake of CI we don't need to publish the release builds with the `latest` tag, and we may also record the pull request number in the tag; b) a release workflow may want to tag the image with a git tag. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The added job builds cloud-api-adaptor and pushes it to the ghcr.io registry. The kustomization files are modified to reference the new images, then the install directory is archived so that downstream jobs can use those files. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Fully implemented the libvirt e2e workflow to run the tests with the Ubuntu podvm. Fixes confidential-containers#599 Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Run the libvirt e2e tests for CentOS podvm too. Fixed confidential-containers#599 Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
A self-hosted runner was added on Azure via Garm (**), labeled 'az-ubuntu-2204'. Run the libvirt e2e workflow on that VM, which supports nested virt. (**) https://github.com/confidential-containers/infra/tree/main/github/azure-self-hosted-runners Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The libvirt/kcli_version.sh will create a k8s cluster version 1.26.7 unless the CLUSTER_VERSION variable is exported. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
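The default-with-override behavior described in this commit is the usual shell parameter-expansion pattern (a sketch, not the script's actual code):

```sh
#!/bin/bash
set -euo pipefail

# Use 1.26.7 unless the caller exported CLUSTER_VERSION beforehand.
CLUSTER_VERSION="${CLUSTER_VERSION:-1.26.7}"
echo "creating k8s cluster version ${CLUSTER_VERSION}"
```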
In the provisioner's Deploy() and Delete(), print the output of `kubectl apply -k` before the function returns in case of failure. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Print debug information at the end of the Libvirt e2e workflow if the run tests step failed. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The TestLibvirtCreatePodWithConfigMap and TestLibvirtCreatePeerPodContainerWithExternalIPAccess tests are failing on the VM runner on CI. Let's disable them temporarily to keep the CI running while we figure out the problem. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
The libvirt e2e CI should run on pull_request_target so the jobs get access to secrets; for example, the image job will publish on ghcr.io with admin privileges. An 'authorize' job is added, which is the first triggered when the PR is labeled, with the follow-up jobs depending on it. Signed-off-by: Wainer dos Santos Moschetta <[email protected]>
Force-pushed from 56dce17 to 98918bd
Updated to address the comment of @mkulke regarding `- run: "true"`.
```yaml
# This is the key used by cloud-api-adaptor to connect to libvirt
- name: Generate the SSH key
  run: |
    ssh-keygen -f ./id_rsa -N ""
```
@wainersm Shouldn't this be a required step for `Install kcli`, rather than handling ssh-keygen there as well?
hi @ldoktor , I'm glad to see you here :)
This key is used by cloud-api-adaptor to connect to the libvirt daemon on the host. It could be the same key as used by kcli to install the cluster, but I prefer to keep them as two separate keys.
```diff
@@ -14,6 +15,7 @@ func TestLibvirtCreateSimplePod(t *testing.T) {
 }
 
 func TestLibvirtCreatePodWithConfigMap(t *testing.T) {
+	t.Skip("Failing on CI")
```
How about using an ENV value to skip it only when running in CI while allowing it otherwise?
That's a good idea @ldoktor. Mind if I send it in another PR? I'd like to get this merged asap to start testing the project's code as well as the workflow's :)
This is my second attempt to have the libvirt e2e tests running for pull requests.

From the user's point of view, the workflow is executed by adding the `test_e2e_libvirt` label. In theory, if we add support for other providers, it would be a matter of adding a new label. However, it may not work as-is, because the common jobs (see below) will be triggered every time a `test_e2e_*` label is added to the PR. I prefer to defer that discussion for when (and if) we add other e2e jobs.

Still regarding how it is triggered, an alternative would be to use comments on the PR (e.g. `/test_e2e_libvirt`), but then we hit a bunch of limitations of the GitHub Actions design. To mention the most basic: on commenting, the workflow runs in the context of the `main` branch, meaning that it doesn't get a reference to the pull request number (or other information like the source tree/branch). That's why I decided to try the label approach.

Like in the previous attempt, the workflow is split into two sets of jobs: a) common/shareable among cloud e2e; b) cloud-specific e2e.

The common jobs:

Job 1 does not actually build the podvm, as this is a time- and resource-consuming task. Instead, it uses `./podvm/hack/download-image.sh` to extract the qcow2 file from the already built `quay.io/confidential-containers/podvm-*` images. The qcow2 files are attached to the workflow so that downstream jobs (i.e. the jobs that effectively run the e2e tests) can simply download and use the disk. Note: I am not sure about this approach of archiving the disk files, as they are huge... we may hit storage issues.

Job 2 builds and pushes the cloud-api-adaptor image to `ghcr.io/confidential-containers/cloud-api-adaptor`. The kustomization files (`install` dir) are changed to use that new image on installation. Just like in job 1, the entire `install` directory is attached to the workflow so that downstream jobs can simply use it (here the files are a couple of kB, so we won't have storage problems).

One more comment on job 2: as it creates temporary images on `ghcr.io`, I implemented a script that will clean up unused images. I will watch how this PR evolves before publishing its code.

Fixes #599