
Unable to connect ARC runner with SLES 15 image #3181

Closed
4 tasks done
greg-teradata opened this issue Feb 29, 2024 · 2 comments · Fixed by #3182
Labels
bug (Something isn't working), docker (Pull requests that update Docker code), Runner Bug (Bug fix scope to the runner)

@greg-teradata

### Checks

### Controller Version

0.7.0

### Deployment Method

Helm

### Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

### To Reproduce

We're using a custom SLES15 SP4 image as an ARC runner. We've pulled the default base image from SUSE, and installed some tools that we need. Here's a snippet of our Dockerfile which shows how the runner is being installed on the image:

```dockerfile
RUN useradd -d /home/runner --uid 1001 runner \
    && groupadd docker --gid 123 \
    && usermod -aG docker runner \
    && echo "runner    ALL=NOPASSWD: ALL" > /etc/sudoers

ARG RUNNER_CONTAINER_HOOKS_VERSION="0.5.0"
ENV DEBIAN_FRONTEND=noninteractive
ENV RUNNER_MANUALLY_TRAP_SIG=1
ENV ACTIONS_RUNNER_PRINT_LOG_TO_STDOUT=1
ARG RUNNER_VERSION="2.312.0"
ARG RUNNER_ARCH="x64"

RUN curl -f -L -o runner.tar.gz https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-${RUNNER_ARCH}-${RUNNER_VERSION}.tar.gz \
    && tar xzf ./runner.tar.gz \
    && rm runner.tar.gz
RUN curl -f -L -o runner-container-hooks.zip https://github.com/actions/runner-container-hooks/releases/download/v${RUNNER_CONTAINER_HOOKS_VERSION}/actions-runner-hooks-k8s-${RUNNER_CONTAINER_HOOKS_VERSION}.zip \
    && unzip ./runner-container-hooks.zip -d ./k8s \
    && rm runner-container-hooks.zip
```

Here's our helm overrides file:

```yaml
## template is the PodSpec for each runner Pod
## For reference: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodSpec

maxRunners: 8
minRunners: 1

template:
   spec:
     initContainers:
     - name: init-dind-externals
       image: ghcr.io/actions/actions-runner:latest
       command: ["cp", "-r", "-v", "/home/runner/externals/.", "/home/runner/tmpDir/"]
       volumeMounts:
         - name: dind-externals
           mountPath: /home/runner/tmpDir
     containers:
     - name: runner
       image: <internal docker registry>/sles-gha:31
       command: ["/home/runner/run.sh"]
       env:
         - name: DOCKER_HOST
           value: unix:///run/docker/docker.sock
       volumeMounts:
         - name: work
           mountPath: /home/runner/_work
         - name: dind-sock
           mountPath: /run/docker
           readOnly: true
     - name: dind
       image: docker:dind
       args:
         - dockerd
         - --host=unix:///run/docker/docker.sock
         - --group=$(DOCKER_GROUP_GID)
       env:
         - name: DOCKER_GROUP_GID
           value: "123"
       securityContext:
         privileged: true
       volumeMounts:
         - name: work
           mountPath: /home/runner/_work
         - name: dind-sock
           mountPath: /run/docker
         - name: dind-externals
           mountPath: /home/runner/externals
         - name: daemon-json
           mountPath: /etc/docker/daemon.json
           subPath: daemon.json
           readOnly: true
     volumes:
     - name: work
       emptyDir: {}
     - name: dind-sock
       emptyDir: {}
     - name: dind-externals
       emptyDir: {}
     - name: daemon-json
       configMap:
        name: docker-daemon-config
     imagePullSecrets:
```

Here's how it is deployed:

```bash
#!/bin/bash

INSTALLATION_NAME="arc-sles-test"
NAMESPACE="arc-runners-2"
GITHUB_CONFIG_URL="https://github.com/Teradata-PE-Stage"
helm install -f arc-values-sles-test.yaml "${INSTALLATION_NAME}" \
    --namespace "${NAMESPACE}" \
    --create-namespace \
    --set githubConfigUrl="${GITHUB_CONFIG_URL}" \
    --set githubConfigSecret="controller-manager" \
    oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
    --version 0.7.0
```


### Describe the bug

The runner rarely connects to GitHub, and it quickly starts consuming a lot of the cluster's resources. It appears to get stuck in a loop of respawning itself indefinitely. Here is the output of `kubectl top`:

```text
NAME                               CPU(cores)   MEMORY(bytes)
arc-sles-test-5drbg-runner-n8ddn   7776m        39070Mi
```

The issue is that the runner startup script, `run.sh`, calls `wait -f`, and the Bash shipped with SLES does not support the `-f` flag (`wait` is a Bash builtin, and its `-f` option was added in Bash 5.1). This causes the script to fail and be respawned.
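One way a startup script could avoid this failure is to pass `-f` to `wait` only when the running Bash supports it. The sketch below gates on `BASH_VERSINFO`, since `wait -f` first appeared in Bash 5.1. The helper names `bash_has_wait_f` and `wait_for_pid` are hypothetical, and this is not taken from the change in #3182.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not the actual run.sh fix: wait for a background
# process, passing -f to the `wait` builtin only on Bash >= 5.1, the first
# release whose `wait` accepts -f.

bash_has_wait_f() {
  # BASH_VERSINFO[0] is the major version, BASH_VERSINFO[1] the minor one.
  (( BASH_VERSINFO[0] > 5 || (BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 1) ))
}

wait_for_pid() {
  local pid=$1
  # The function returns whatever `wait` returns, i.e. the exit status
  # of the waited-for process.
  if bash_has_wait_f; then
    wait -f "$pid"    # Bash 5.1+
  else
    wait "$pid"       # older Bash would reject -f with "invalid option"
  fi
}
```

For example, `sleep 10 & wait_for_pid "$!"` blocks until the background `sleep` exits and returns its exit status, on either side of the 5.1 boundary.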

### Describe the expected behavior

The documentation states that SLES 12+ is supported, so runners built on SLES 15 SP4 should connect consistently.

Also, the script should not respawn indefinitely and consume all of the cluster's resources.

### Additional Context

Removing the `-f` flag from the runner script during our Docker build seems to work around the issue. So far this has been successful, but I don't know whether there are circumstances in which this could cause problems.

In the output of the runner log, you can see:

```text
/home/runner/run.sh: line 41: wait: -f: invalid option
wait: usage: wait [-n] [id ...]
```

Also note that I've used the latest version of the controller and the same issue exists, since this is a bug in the runner script specifically.
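The build-time workaround of removing `-f` from the runner script could be scripted roughly as below. This is a sketch that assumes GNU `sed` (for `sed -i` without a suffix argument) and that the call in `run.sh` has the literal form `wait -f <pid>`; `patch_wait_f` is a hypothetical helper name. It returns an error when no `wait -f` is found, so a future runner release that changes the script does not pass the build silently.

```bash
#!/usr/bin/env bash
# Hypothetical build-time workaround: rewrite `wait -f <pid>` to `wait <pid>`
# in the runner's run.sh so it runs under Bash older than 5.1.
# Assumes GNU sed (in-place editing via `sed -i` with no suffix).

patch_wait_f() {
  # Default path matches the `command` in the Helm values above.
  local script=${1:-/home/runner/run.sh}

  # Refuse to report success when there is nothing to patch, e.g. if a
  # newer runner release no longer uses `wait -f`.
  if ! grep -q 'wait -f ' "$script"; then
    echo "patch_wait_f: no 'wait -f ' found in $script" >&2
    return 1
  fi

  sed -i 's/wait -f /wait /g' "$script"
}
```

It could be invoked from a `RUN` step after the runner tarball is extracted. As a design note: per the Bash reference manual, `wait -f` only behaves differently from plain `wait` when job control is enabled, which is normally off in non-interactive scripts. That may explain why dropping the flag has been harmless so far, although this has not been checked against how `run.sh` handles signals.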



### Controller Logs

https://gist.github.com/greg-teradata/9a88c8a2b85f78db0408e3eaf6c17354

### Runner Pod Logs

https://gist.github.com/greg-teradata/da756d0576720eeb6a8c8a16ce53166f
greg-teradata added the bug label Feb 29, 2024
Contributor

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

nikola-jokic transferred this issue from actions/actions-runner-controller Mar 1, 2024
nikola-jokic self-assigned this Mar 1, 2024
nikola-jokic added the docker and Runner Bug labels Mar 1, 2024
@nikola-jokic
Contributor

Thank you for submitting this issue! I transferred it to the runner repo and created a PR fixing it ☺️
