Enhance and automate acceptance tests to accelerate releases #564

Open
mumoshu opened this issue May 22, 2021 · 7 comments
Labels
enhancement New feature or request

Comments

@mumoshu
Collaborator

mumoshu commented May 22, 2021

When cutting a new release, I usually run the acceptance test suite documented in https://github.com/actions-runner-controller/actions-runner-controller#contributing plus whatever ad-hoc manual test cases I come up with at the time.

The process is time-consuming and error-prone.

That frustrates people, who then ask for a new release (like #562), and responding to them consumes my time and delays the next release even further. Even after a new release is finally cut, there are still undetected problems like #427, #467, and #468 that confuse users.

Can we do anything better?

My own idea is not that good: just somehow enhance and automate the acceptance tests. Ideally, they should run on every commit and every pull request. They should also cover more cases, like #560.

Automation isn't that easy.

First, which test environment should we use? Should we use kind for the acceptance test environment? Or could anyone contribute some infrastructure for running a real K8s cluster for acceptance testing?

Second, how can we enhance the test suite in a sustainable manner? What we have today is just a set of shell scripts and template files https://github.com/actions-runner-controller/actions-runner-controller/tree/master/acceptance. Does it scale when we add more test cases and data?

Third, how can we run GitHub Actions workflows in tests? We have some workflow definitions for testing, but our acceptance test doesn't run them. We would need to set up a GitHub repository and organization for testing, deploy runners associated with the repo and the org, and push a git commit to the repo to trigger a workflow run. How can we reliably automate that process?

There might be more. If you're willing to help with any of the problems listed in this issue, or have comments, please reach out to us 😄

@mumoshu mumoshu added the enhancement New feature or request label May 22, 2021
@mumoshu
Collaborator Author

mumoshu commented May 22, 2021

Regarding the automation, I'm wondering if something like the below works:

  • Have a GitHub organization and a small pool of repositories for testing
  • Have a "registry" of the organization and the pool of repositories in the form of e.g. a YAML file
    • Optionally each repository should have a GitHub webhook configured for testing webhook-based autoscaling.
  • Whenever an acceptance test run is triggered, it first takes an exclusive lock on a repository
  • Push an initial commit containing a workflow definition and the "lease" file. The lease file contains the owner and the expiration date of the lease. Another attempt to take an already locked repository/lease should fail by reading the lease file.
    • The workflow definition's triggers should exclude "pushes to the default branch" to avoid unnecessarily triggering the workflow run on push to the lease file and the workflow definition itself.
  • Deploy actions-runner-controller onto a dedicated K8s cluster (needs to be dedicated as CRDs can't be namespaced)
    • Can be a managed K8s cluster or kind. Beware that it can be difficult to test webhook-based autoscaling with kind, as there isn't a straightforward way to expose a kind cluster to the internet.
  • Trigger a workflow run by e.g. pushing a test commit to a test branch in the target repository.
  • At the end of the workflow run, a job would somehow send the test result to the acceptance test run. For instance, the job could write to a K8s configmap whose sole key is "result" and whose value is "ok", so that the acceptance test run could poll the configmap until it sees result: ok in it.
  • Clean up the repository by removing the lease and the workflow definition files and deleting the test branch.
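
The lease check and the result polling in the list above could be sketched roughly as follows. This is only an illustration under assumptions: the JSON lease format, the field names `owner` and `expires_at`, and the function names are all hypothetical, and `fetch` stands in for whatever actually reads the "result" key from the configmap. Reading the lease file is not atomic by itself; a real implementation would presumably also rely on git rejecting the second of two racing pushes.

```python
import json
import time
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass
class Lease:
    owner: str            # who holds the repository (hypothetical field)
    expires_at: datetime  # when the lease stops being valid (hypothetical field)


def parse_lease(text: str) -> Lease:
    """Parse the lease file content (assumed here to be JSON)."""
    data = json.loads(text)
    return Lease(data["owner"], datetime.fromisoformat(data["expires_at"]))


def try_acquire(existing: Optional[str], owner: str,
                now: datetime, ttl: timedelta) -> Optional[str]:
    """Return new lease file content, or None if someone else holds a live lease."""
    if existing is not None:
        current = parse_lease(existing)
        if current.owner != owner and current.expires_at > now:
            return None  # repository is locked by another test run
    return json.dumps({"owner": owner, "expires_at": (now + ttl).isoformat()})


def wait_for_result(fetch: Callable[[], Optional[str]],
                    timeout_s: float, interval_s: float,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch() until it returns "ok" or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch() == "ok":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    lease = try_acquire(None, "run-1", now, timedelta(minutes=30))
    print("lease file content:", lease)
    # A second run must not get the repository while the lease is live.
    print("second run locked out:",
          try_acquire(lease, "run-2", now, timedelta(minutes=30)) is None)
```

Keeping the expiration inside the lease file means a crashed test run does not lock a repository forever: the next run can take over once `expires_at` has passed.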

Now it's time to build a PoC for this complex beast. Could anyone help me build it?

@mumoshu
Collaborator Author

mumoshu commented May 22, 2021

Can be a managed K8s cluster or kind. Beware that it can be difficult to test webhook-based autoscaling with kind, as there isn't a straightforward way to expose a kind cluster to the internet.

I'm trying to figure out if it's possible to run a kind cluster within a docker container or a K8s pod.

References:

That might be cheaper in terms of infrastructure cost than having a pool of e.g. EKS clusters for testing. But my time isn't that cheap so I might give up depending on how hard it turns out to be 😃

@mumoshu
Collaborator Author

mumoshu commented May 22, 2021

Shameless plug, but I've added another GitHub sponsor tier for compensating test infrastructure: https://github.com/sponsors/mumoshu. I hope some company can contribute to it.

If I get sponsored and there's no request to use a specific IaaS/platform, I'm going to use EKS. I might use two or more if I have multiple sponsors for this.

@mumoshu
Collaborator Author

mumoshu commented May 22, 2021

If we were to run kind in a container or a pod, we might need to use something like Cloudflare Argo Tunnel to expose the webhook server for testing webhook-based autoscaling.

From my experience running it on my Ubuntu dev machine, I can see this container image is likely to work as the basis for your solution https://github.com/msnelling/docker-cloudflared

Also, kind and cloudflared very likely must reside within the same container as the GitHub Actions runner, so that kind has access to dockerd and cloudflared has access to the container port that corresponds to the node port of the K8s service for the webhook endpoint.
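
To make the port wiring concrete, here is a rough sketch that renders a kind cluster config mapping a node port to the same port on the host side, plus the command lines for kind and cloudflared. It is an illustration under assumptions: the port 30080, the cluster name, the config path, and the helper names are made up, and it only builds the config and the commands without running anything.

```python
from typing import List


def kind_config(node_port: int) -> str:
    """Render a kind config exposing node_port of the control-plane node
    on the same port of the container that runs dockerd."""
    return "\n".join([
        "kind: Cluster",
        "apiVersion: kind.x-k8s.io/v1alpha4",
        "nodes:",
        "- role: control-plane",
        "  extraPortMappings:",
        f"  - containerPort: {node_port}",  # NodePort of the webhook service
        f"    hostPort: {node_port}",       # port cloudflared will connect to
        "",
    ])


def kind_command(config_path: str, name: str = "acceptance") -> List[str]:
    """Command line that creates the cluster from the rendered config."""
    return ["kind", "create", "cluster", "--name", name, "--config", config_path]


def cloudflared_command(node_port: int) -> List[str]:
    """Command line that tunnels public traffic to the mapped port."""
    return ["cloudflared", "tunnel", "--url", f"http://localhost:{node_port}"]


if __name__ == "__main__":
    port = 30080  # hypothetical NodePort of the webhook endpoint service
    print(kind_config(port))
    print(" ".join(kind_command("/tmp/kind.yaml")))
    print(" ".join(cloudflared_command(port)))
```

The webhook service would then need to be a NodePort service pinned to the same port, so that GitHub's webhook deliveries reach it through the tunnel.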

@igorbrigadir

igorbrigadir commented May 23, 2021

I'm currently using microk8s to run a cluster and actions-runner-controller works very well there - maybe it can be as simple as something like this https://github.com/balchua/microk8s-actions ? I haven't tried running the actual tests on microk8s yet, but the actual deployment works just fine.

@mumoshu
Collaborator Author

mumoshu commented May 23, 2021

@igorbrigadir Hey! Thanks. Yeah, you might have a few options for running K8s to run the controller, kind or microk8s.

However, you still need to somehow expose the webhook-autoscaler's webhook endpoint to the Internet so that GitHub can reach it, and you also need a pool of GitHub repositories for testing, so that you can actually trigger a workflow run as part of the test.

@toast-gear
Collaborator

https://github.com/helm/kind-action is a Helm-maintained action for spinning up a kind cluster. I'm going to have a play with it.


3 participants