Upgrading from v0.17.0 canary to v0.18 + helm chart scales to infinity and beyond #427
@@ -117,7 +137,7 @@ resource "helm_release" "actions" {
name = local.actions.name
repository = "https://summerwind.github.io/actions-runner-controller"
chart = "actions-runner-controller"
- version = "0.7.0"
+ version = "0.10.4"
namespace = kubernetes_namespace.ci.metadata[0].name
lint = false
reset_values = true
@@ -138,7 +158,7 @@ resource "helm_release" "actions" {
image = {
repository = "harbor.infra.foo.com/cache/summerwind/actions-runner-controller"
- tag = "canary"
+ tag = "v0.18.0"
dindSidecarRepositoryAndTag = "harbor.infra.foo.com/cache/library/docker:dind"
pullPolicy = "Always"
}
@@ -205,7 +225,7 @@ spec:
spec:
nodeSelector:
${var.eks_node_labels.spot.key}: ${var.eks_node_labels.spot.value}
- image: harbor.infra.foo.com/cache/foo/actions-runner:v2.276.1
+ image: harbor.infra.foo.com/cache/foo/actions-runner:v2.277.1
imagePullPolicy: Always
repository: ${local.actions.git_repository}
serviceAccountName: ${local.actions.service_account_name}
@@ -228,7 +248,7 @@ spec:
scaleTargetRef:
name: foo-github-actions-deployment
minReplicas: 4
- maxReplicas: 100
+ maxReplicas: 500
metrics:
- type: TotalNumberOfQueuedAndInProgressWorkflowRuns
repositoryNames:
@@ -239,7 +259,7 @@ spec:
types: ["created"]
status: "queued"
amount: 1
- duration: "2m"
+ duration: "1m"
MANIFEST
EOF
}
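For anyone not driving this through Terraform, the same version pinning can be expressed directly with Helm; a minimal sketch, assuming the summerwind chart repo and illustrative release/namespace names:

```bash
# Pin both the chart version and the controller image tag
# (release name and namespace here are illustrative, adjust to your setup).
helm repo add actions-runner-controller https://summerwind.github.io/actions-runner-controller
helm repo update
helm upgrade --install actions-runner-controller \
  actions-runner-controller/actions-runner-controller \
  --namespace actions-runner-system --create-namespace \
  --version 0.10.4 \
  --set image.tag=v0.18.0
```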
Haven't got a solution, but I'm running 0.10.4 and 0.18.0 together fine; they are considered compatible, but there is probably a bug in the app code.
The old canary I had deployed was 0.17 with the GitHub webhook server and health check route. Is it possible for me to somehow get that Docker image? Right now my CI is kinda broken.
I don't think I quite get what you mean? The containers get published to Docker Hub, so you should be able to just reference the old tag and have it pulled down: https://hub.docker.com/r/summerwind/actions-runner-controller/tags That, or pull a copy locally and push it up to your private registry, if that is where your controller images come from?
Finally deleted the runners CRD with the command below.
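For reference, deleting the ARC CRDs typically looks roughly like this sketch (CRD names match the `kubectl get crds` output later in the thread; the finalizer patch is only needed if a CRD hangs in Terminating):

```bash
kubectl delete crd runners.actions.summerwind.dev \
  runnerdeployments.actions.summerwind.dev \
  runnerreplicasets.actions.summerwind.dev \
  horizontalrunnerautoscalers.actions.summerwind.dev

# If a CRD is stuck in Terminating, clearing its finalizers forces removal.
# Use with care: orphaned CRs and GitHub-side runners are left behind.
kubectl patch crd runners.actions.summerwind.dev \
  --type=merge -p '{"metadata":{"finalizers":[]}}'
```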
@callum-tait-pbx so I deleted everything, the helm chart and the CRDs. When I install the helm chart again, and run
@Puneeth-n Probably HRA status and other custom resources like RunnerDeployment and RunnerReplicaSet? Could you share a dump of all those custom resources except Runners? I think the important point here is that deleting CRDs doesn't automatically delete CRs (especially when you forced the deletion by manually removing the CRD finalizer, as I remember). // FWIW, the cause of the infinite scaling does seem to be a mismatch between the CRD definition and the controller code. I'm afraid there isn't any ultimate way to prevent that.
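A dump like the one requested can be collected with something along these lines (a sketch; the short resource names assume the `actions.summerwind.dev` CRDs are the only ones on the cluster with those names):

```bash
kubectl get horizontalrunnerautoscalers,runnerdeployments,runnerreplicasets \
  --all-namespaces -o yaml > arc-custom-resources.yaml
```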
@mumoshu I deleted the helm chart and the corresponding runner autoscaler and runner deployment. I just have the CRDs now. Is there a way to reset and delete everything? PFB some remaining info:

➜ infra git:(feature/upgrade-harbor-gha) ✗ k get crds
NAME CREATED AT
alertmanagers.monitoring.coreos.com 2020-10-25T00:44:17Z
certificates.cert-manager.io 2021-01-26T10:16:32Z
challenges.acme.cert-manager.io 2021-01-26T10:16:35Z
clusterissuers.cert-manager.io 2021-01-26T10:16:38Z
eniconfigs.crd.k8s.amazonaws.com 2020-10-22T09:50:24Z
horizontalrunnerautoscalers.actions.summerwind.dev 2021-03-31T19:54:20Z
issuers.cert-manager.io 2021-01-26T10:16:43Z
orders.acme.cert-manager.io 2021-01-26T10:16:45Z
podmonitors.monitoring.coreos.com 2020-10-25T00:44:17Z
probes.monitoring.coreos.com 2020-10-25T00:44:18Z
prometheuses.monitoring.coreos.com 2020-10-25T00:44:18Z
prometheusrules.monitoring.coreos.com 2020-10-25T00:44:19Z
runnerdeployments.actions.summerwind.dev 2021-03-31T19:54:20Z
runnerreplicasets.actions.summerwind.dev 2021-03-31T19:54:21Z
runners.actions.summerwind.dev 2021-03-31T19:54:21Z
securitygrouppolicies.vpcresources.k8s.aws 2020-10-22T09:50:27Z
servicemonitors.monitoring.coreos.com 2020-10-25T00:44:19Z
targetgroupbindings.elbv2.k8s.aws 2020-10-27T11:49:53Z
thanosrulers.monitoring.coreos.com 2020-10-25T00:44:20Z
@mumoshu |
@Puneeth-n Hey! It seems like you already "isolated" the runners, so there's no way actions-runner-controller could take back control of them. Probably the only way forward would be to remove the runners.
@Puneeth-n Which number? |
@mumoshu after running
@Puneeth-n That seems to be because actions-runner-controller isn't deployed. Runners have a finalizer added by actions-runner-controller, so to delete runners normally you need actions-runner-controller running. I think you could try either; if you used the former method, also note that you perhaps need to remove any remaining runners on the GitHub side with the GitHub API or Web UI.
Deleting runners can get a bit weird; you'll probably need to remove the finalizer from each stuck Runner resource before the API server will actually delete it.

Roughly this, clearing the finalizers list in the runner's metadata:

"finalizers": [
],
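Scripted, that finalizer removal looks roughly like the sketch below (an assumed way to apply it, not the exact commands from the comment above; only do this when the controller is gone for good, since the runners will no longer be deregistered for you):

```bash
# Clear the finalizer on every stuck Runner so the API server can delete it.
kubectl get runners.actions.summerwind.dev -o name | while read -r r; do
  kubectl patch "$r" --type=merge -p '{"metadata":{"finalizers":null}}'
done

kubectl delete runners.actions.summerwind.dev --all
```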
So I uninstalled the helm chart, deleted the CRDs, and did a fresh install of ONLY the chart 0.10.4 with tag v0.18.1. I did not apply any deployments or a horizontal runner autoscaler. Now I am deleting all the runners and it seems to work; the count of runners is decreasing. I ran a script in a loop to fetch all runners from GitHub and delete them, so I cleaned up on the GitHub side. I had some 200 offline runners.
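The GitHub-side cleanup loop can be sketched with the REST runners API; OWNER, REPO, and GITHUB_TOKEN below are placeholders, and this version only removes runners reported as offline (re-run it, or add pagination, if there are more than 100):

```bash
#!/usr/bin/env bash
# Delete all offline self-hosted runners registered to a repository.
OWNER=my-org   # placeholder
REPO=my-repo   # placeholder
API="https://api.github.com/repos/$OWNER/$REPO/actions/runners"

curl -s -H "Authorization: token $GITHUB_TOKEN" "$API?per_page=100" \
  | jq -r '.runners[] | select(.status == "offline") | .id' \
  | while read -r id; do
      echo "deleting runner $id"
      curl -s -X DELETE -H "Authorization: token $GITHUB_TOKEN" "$API/$id"
    done
```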
Excuse my lack of deep understanding of Kubernetes :)
Wanted to freeze the controller version as I had it pointing to canary #367. Ended up with this mess :/
In general, don't run the canary image unless you know why you are running it (and you'll know why if so). It's an especially unstable development image, as any push to master that isn't for the runners will trigger a build and a publish to Docker Hub. Version pinning is your friend, my friend! :D
@callum-tait-pbx that was my intent behind this: #427 (comment). But I ended up with this mess.
@mumoshu @callum-tait-pbx so now, once all the runners are deleted, when I deploy
Looks like it, based on the output I can see in the issue and your comments. Your controller logs should look happy at this point if everything is working as expected; it's worth taking a peek at them before deploying your runner setup to confirm.
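Peeking at the controller logs can be done with something like this (a sketch; the namespace, deployment, and container names assume the chart defaults, so adjust them to wherever the chart was installed):

```bash
kubectl logs -n actions-runner-system deploy/actions-runner-controller -c manager --tail=100
```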
@callum-tait-pbx yep. Controller looks good, webhook server looks good, runners came up and registered in GitHub. Awesome! :) Thanks guys!
@Puneeth-n could you close the issue if it's all working please 🙏, ty