Enhance and automate acceptance tests to accelerate releases #564

mumoshu · 2021-05-22T03:05:40Z

I usually run the acceptance test suite documented in https://github.com/actions-runner-controller/actions-runner-controller#contributing, plus random manual test cases I come up with on the spot, whenever I cut a new release.

The process is time-consuming and error-prone.

That frustrates people, who then ask for a new release (like #562), which delays the next release even further because responding to them consumes my time. Even after a new release is finally cut, there are still unrecognized problems like #427, #467, and #468 that confuse users.

Can we do anything better?

My own idea isn't that good: just somehow enhance and automate the acceptance tests. Ideally, they should run on every commit and every pull request. They should also cover more cases, like #560.

Automation isn't that easy.

First, which test environment should we use? Should we use kind for the acceptance test environment? Or could someone contribute infrastructure for running a real K8s cluster for acceptance testing?

Second, how can we enhance the test suite sustainably? What we have today is just a set of shell scripts and template files: https://github.com/actions-runner-controller/actions-runner-controller/tree/master/acceptance. Will it scale as we add more test cases and data?

Third, how can we run GitHub Actions workflows in tests? We have some workflow definitions for testing, but our acceptance test doesn't run them. We need to set up a GitHub repository and organization for testing, deploy runners associated with the repository and the organization, and push a git commit to a repository to trigger a test workflow run. How can we reliably automate that process?

There might be more. If you're willing to help with any of the problems listed in this issue, or have comments, please reach out to us 😄


mumoshu · 2021-05-22T11:16:06Z

Regarding the automation, I'm wondering if something like the below works:

- Have a GitHub organization and a small pool of repositories for testing.
- Have a "registry" of the organization and the pool of repositories in the form of, e.g., a YAML file.
  - Optionally, each repository should have a GitHub webhook configured for testing webhook-based autoscaling.
- Whenever an acceptance test run is triggered, it first takes an exclusive lock on a repository.
- Push an initial commit containing a workflow definition and the "lease" file. The lease file contains the owner and the expiration date of the lease. Any other attempt to take an already locked repository should fail once it reads the lease file.
  - The workflow definition's triggers should exclude pushes to the default branch, to avoid unnecessarily triggering a workflow run when the lease file and the workflow definition themselves are pushed.
- Deploy actions-runner-controller onto a dedicated K8s cluster (it needs to be dedicated because CRDs are cluster-scoped and can't be namespaced).
  - This can be a managed K8s cluster or kind. Beware that it can be difficult to test webhook-based autoscaling with kind, as there isn't a straightforward way to expose a kind cluster to the internet.
- Trigger a workflow run by, e.g., pushing a test commit to a test branch in the target repository.
- At the end of the workflow run, a job somehow sends the test result back to the acceptance test run. For instance, the job could write to a K8s ConfigMap whose sole key is "result" with the value "ok", so that the acceptance test run can poll the ConfigMap until it sees "result: ok".
- Clean up the repository by removing the lease and workflow definition files and deleting the test branch.

Now it's time to build a PoC for this complex beast. Anyone could help me build it?

mumoshu · 2021-05-22T12:43:29Z

> Can be a managed K8s cluster or kind. Beware that it can be difficult to test webhook-based autoscaling with kind, as there isn't a straightforward way to expose a kind cluster to the internet.

I'm trying to figure out whether it's possible to run a kind cluster within a Docker container or a K8s pod.

References:

That might be cheaper in terms of infrastructure cost than maintaining a pool of, e.g., EKS clusters for testing. But my time isn't that cheap, so I might give up depending on how hard it turns out to be 😃

mumoshu · 2021-05-22T12:46:15Z

Shameless plug, but I've added another GitHub Sponsors tier to cover test infrastructure costs: https://github.com/sponsors/mumoshu. I hope some companies can contribute to it.

I'm going to use EKS if I get sponsored and there's no request for a specific IaaS/platform. I might use two or more platforms if I get multiple sponsors for this.

mumoshu · 2021-05-22T13:03:50Z

If we were to run kind in a container or a pod, we might need to use something like Cloudflare Argo Tunnel to expose the webhook server for testing webhook-based autoscaling.

Based on my experience running it on my Ubuntu dev machine, this container image is likely to work as the basis for such a solution: https://github.com/msnelling/docker-cloudflared

Also, kind and cloudflared very likely need to reside within the same container as the GitHub Actions runner, so that kind has access to dockerd and cloudflared has access to the container port that corresponds to the NodePort of the K8s Service for the webhook endpoint.

igorbrigadir · 2021-05-23T12:58:48Z

I'm currently using microk8s to run a cluster, and actions-runner-controller works very well there. Maybe it can be as simple as something like https://github.com/balchua/microk8s-actions? I haven't tried running the actual tests on microk8s yet, but the deployment itself works just fine.

mumoshu · 2021-05-23T23:29:15Z

@igorbrigadir Hey! Thanks. Yeah, you have a few options for running the K8s cluster that hosts the controller, such as kind or microk8s.

However, you still need to somehow expose the webhook-autoscaler's webhook endpoint to the Internet so that GitHub can reach it, and you also need a pool of GitHub repositories for testing, so that you can actually trigger a workflow run as part of the test.

toast-gear · 2021-06-20T20:00:21Z

https://github.com/helm/kind-action is a Helm-maintained action for spinning up a kind cluster. I'm going to have a play with it.

mumoshu added the enhancement (New feature or request) label on May 22, 2021

mumoshu mentioned this issue on May 22, 2021 in New release 0.18.3? #562 (Closed)

mumoshu mentioned this issue on May 30, 2021 in Feat/harden actions runner controller #441 (Closed)
