Autoscaling recommendations with GitLab #439

zifeo · 2024-11-20T21:12:05Z

Following the advice in the README, a newly created node can experience some delay before images can be pulled through the proxy. While this works for classical deployments thanks to retries, it seems to cause issues when GitLab CI is managing the job. Is there any other recommendation for such a setup?

WARNING: Event retrieved from the cluster: 0/7 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 1 node(s) had untolerated taint {nvidia.com/gpu: present}, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/7 nodes are available: 1 No preemption victims found for incoming pod, 6 Preemption is not helpful for scheduling.
WARNING: Event retrieved from the cluster: Failed to pull image "localhost:7439/registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.3.1": failed to pull and unpack image "localhost:7439/registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.3.1": failed to resolve reference "localhost:7439/registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.3.1": failed to do request: Head "http://localhost:7439/v2/registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper/manifests/x86_64-v17.3.1": dial tcp [::1]:7439: connect: connection refused
WARNING: Event retrieved from the cluster: Error: ErrImagePull
WARNING: Event retrieved from the cluster: Error: ImagePullBackOff
WARNING: Failed to pull image "localhost:7439/registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.3.1" with policy "": image pull failed: Back-off pulling image "localhost:7439/registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.3.1"
ERROR: Job failed: prepare environment: waiting for pod running: pulling image "localhost:7439/registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.3.1": image pull failed: Back-off pulling image "localhost:7439/registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.3.1". Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information


paullaffitte · 2024-11-22T15:37:22Z

I'm not sure I understand. What is GitLab CI doing here?

zifeo · 2024-11-22T16:08:54Z

@paullaffitte GitLab Runner launches CI jobs on demand on Kubernetes. When there are too many jobs and Kubernetes decides to scale up the node count, there is a race condition between the new job and the proxy becoming available on the new node. This usually works with classical workloads because of the automatic retry. However, for a job managed by GitLab Runner, the failure is not retried on the init container. Have you faced a similar situation, and what else can be tried?
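To make the race concrete, here is a minimal sketch of the kind of retry that classical workloads get for free: poll the proxy port until it accepts connections, backing off between attempts. The port 7439 comes from the log above. The function name `wait_for_proxy` and all of its parameters are hypothetical, and where such a check could run (for example in a node startup step) is an assumption, not something GitLab Runner or the proxy is known to provide.

```python
import socket
import time


def wait_for_proxy(host="localhost", port=7439, timeout=60.0,
                   initial_delay=0.5, max_delay=8.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            # Connection refused or timed out: the proxy is not up yet.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            # Exponential backoff, capped by max_delay and the time left.
            time.sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay)
```

Calling `wait_for_proxy(timeout=120)` blocks until the port answers or two minutes pass. A TCP connect only shows that something is listening, so it may succeed slightly before the proxy can actually serve manifests.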

paullaffitte · 2024-11-25T10:15:29Z

Did you try setting a pull policy? https://docs.gitlab.com/runner/executors/kubernetes/#set-a-pull-policy

zifeo changed the title ~~Autoscaling recommendation with GitLab~~ Autoscaling recommendations with GitLab Nov 20, 2024
