Self hosted runner in Linux container exits job prematurely #921
Comments
@kabamawutschnik have you checked You can also manually modify the
@TingluoHuang I had to install rsyslog and start it (with a default conf) to start getting syslog output. After doing so, I ran the workflow again and ran your grep command. It returned no results. I then set oom_score_adj for the GitHub runner service to -999. Rerunning the workflow resulted in the same job cancellation. I forgot to note: the exact same workflow runs to completion on a GitHub-hosted runner.
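For anyone repeating this check, here is a minimal sketch of the kind of log search discussed above. The grep command actually suggested in the thread is not shown, so the function name `find_oom_events` and the patterns are assumptions based on the usual kernel OOM-killer wording, not the maintainer's command.

```shell
#!/bin/sh
# find_oom_events LOGFILE   (hypothetical helper, not part of the runner)
# Prints every line of LOGFILE that looks like a kernel OOM-killer message.
# The patterns follow the usual kernel wording ("invoked oom-killer",
# "Out of memory: Killed process ..."); treat them as a starting point.
find_oom_events() {
    logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    # grep exits with status 1 when nothing matches, which callers can use
    # to tell "no OOM events found" apart from "log unreadable" (status 2).
    grep -iE 'oom-killer|out of memory|killed process' "$logfile"
}
```

Usage would be something like `find_oom_events /var/log/syslog`. Since a container shares the host's kernel, OOM-killer messages may only show up in the host's kernel log, so an empty result inside the container does not by itself rule out an OOM kill.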
@kabamawutschnik do you mind sharing the full runner diag logs for both runner.listener and runner.worker? The runner cancels jobs only when
I believe these are the ones. Worker_20210119-002009-utc.log
@kabamawutschnik something sends a Ctrl-C to the runner while it's running.
@TingluoHuang Strange. The exact same workflow runs to completion on a GitHub-hosted runner. Are GitHub-hosted runners somehow more resilient to kill commands? Searching through setup.sh I found a location where we run
I did some testing. I commented out a bunch of Could this be a result of child process management?
@kabamawutschnik can you try to figure out which process sends the SIGINT?
Attached to process 655 (the worker process) and traced all of its child processes. It looks like the worker sends SIGINT, then SIGTERM. Note: I noticed that our GitHub runner service dies after the job exits. I need to restart it with
2671 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2676, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
.........
2716 +++ exited with 0 +++
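To narrow down which process sends the signals, a trace like the one above can be filtered for the signal-sending system calls. This is only a sketch: the exact strace invocation used in this thread is not shown, so it assumes the trace was captured with something like `strace -f -p 655 -e trace=kill,tgkill -o trace.log`, and `signal_senders` is a hypothetical helper name.

```shell
#!/bin/sh
# signal_senders TRACEFILE   (hypothetical helper)
# Reads strace output written with -f and -o, where every line starts with
# the PID of the calling process, and prints "sender -> target SIGNAL" for
# each kill()/tgkill() call that delivers SIGINT or SIGTERM.
signal_senders() {
    awk '$2 ~ /^(tg)?kill\(/ && /SIG(INT|TERM)/ {
        target = $2                      # e.g. "kill(2506," or "tgkill(655,"
        sub(/^(tg)?kill\(/, "", target)  # strip the syscall name
        sub(/,.*$/, "", target)          # keep only the first argument
        sig = ($0 ~ /SIGINT/) ? "SIGINT" : "SIGTERM"
        print $1 " -> " target " " sig
    }' "$1"
}
```

It only looks at `kill` and `tgkill`; signals delivered through other system calls, or generated by the kernel or a terminal, would not appear in its output.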
Starting the runner as a foreground process with run.sh allowed the job to get past the issue. I'm not sure why the service version dies so easily. Perhaps a setting could be changed in the unit file?
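A minimal sketch of that foreground workaround, written as a helper that a container entrypoint could call. The function name `start_runner_foreground` is hypothetical; the only thing taken from this thread is that the runner directory contains `run.sh`.

```shell
#!/bin/sh
# start_runner_foreground RUNNER_DIR   (hypothetical helper)
# Runs the runner's run.sh in the foreground instead of installing and
# starting it as a service through svc.sh and systemctl.
start_runner_foreground() {
    runner_dir="$1"
    if [ ! -x "$runner_dir/run.sh" ]; then
        echo "run.sh not found or not executable in $runner_dir" >&2
        return 1
    fi
    # The subshell keeps the caller's working directory unchanged; exec
    # replaces the subshell with run.sh so no extra shell process lingers.
    ( cd "$runner_dir" && exec ./run.sh )
}
```

In the entrypoint.sh from this issue, that would presumably mean replacing the `svc.sh install`, `systemctl start` and `tail -f /dev/null` lines with a call such as `start_runner_foreground "/app/${GITHUB_REPOSITORY}"`, so the container lives exactly as long as the runner does.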
What's definitely confusing is that the job ends up marked as cancelled in GitHub Actions, even though runsvc is notified.
It should have been reported as a failure when the job was not canceled from GitHub Actions but just killed/terminated on the server, wouldn't you agree? At least to me, job status is reported from the perspective of GitHub Actions, not from the underlying runners.
Describe the bug
We are trying to test our Node.js application on a self-hosted runner in a Linux container. The job exits prematurely (the GitHub runner cancels it) during a memory-intensive portion. The step in which it consistently exits is the setup bash script that installs Node, the node modules, and MongoDB.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
To complete without exiting early.
Runner Version and Platform
2.275.1 - Ubuntu Container
What's not working?
Processes are killed because the GitHub runner is not able to edit oom_score_adj.
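A quick way to inspect this from inside the container is to read a process's `oom_score_adj` and check whether the current user has write permission on the file. The sketch below is illustrative only: `check_oom_score_adj` is a hypothetical helper, and the optional second argument exists solely so it can be pointed at a fake directory tree.

```shell
#!/bin/sh
# check_oom_score_adj PID [PROC_ROOT]   (hypothetical helper)
# Prints the current oom_score_adj of PID and whether the file is writable
# by the calling user. PROC_ROOT defaults to /proc.
check_oom_score_adj() {
    pid="$1"
    proc_root="${2:-/proc}"
    file="$proc_root/$pid/oom_score_adj"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    current=$(cat "$file")
    # -w only checks file permissions; the kernel can still reject a write,
    # for example one that lowers the value without sufficient privileges.
    if [ -w "$file" ]; then
        writable=yes
    else
        writable=no
    fi
    echo "pid=$pid oom_score_adj=$current writable=$writable"
}
```

For example, `check_oom_score_adj 2506` would report on the process named in the worker log. A `writable=yes` result is not conclusive, so treat it as a first check rather than proof that the runner's write would succeed.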
Job Log Output
Setup Step
Error: The operation was canceled.
[debug]System.OperationCanceledException: The operation was canceled.
Runner and Worker's Diagnostic Logs
Worker Logs
Exiting Exception
Caught cancellation exception from step: System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at GitHub.Runner.Sdk.ProcessInvoker.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
   at GitHub.Runner.Common.ProcessInvokerWrapper.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Channel`1 redirectStandardIn, Boolean inheritConsoleHandler, Boolean keepStandardInOpen, Boolean highPriorityProcess, CancellationToken cancellationToken)
   at GitHub.Runner.Worker.Handlers.DefaultStepHost.ExecuteAsync(String workingDirectory, String fileName, String arguments, IDictionary`2 environment, Boolean requireExitCodeZero, Encoding outputEncoding, Boolean killProcessOnCancel, Boolean inheritConsoleHandler, CancellationToken cancellationToken)
   at GitHub.Runner.Worker.Handlers.ScriptHandler.RunAsync(ActionRunStage stage)
   at GitHub.Runner.Worker.ActionRunner.RunAsync()
   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
There are a bunch of these errors, but this is the last one (during the bash setup.sh script):
Which: 'bash'
[2021-01-19 00:21:54Z INFO ScriptHandler] Location: '/usr/bin/bash'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Starting process:
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] File name: '/usr/bin/bash'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Arguments: '-e /app/project/_work/_temp/4a28742a-a48e-4b22-9af0-403af0e37076.sh'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Working directory: '/app/project/_work/project/project'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Require exit code zero: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Encoding web name: ; code page: ''
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Force kill process on cancellation: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Redirected STDIN: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Persist current code page: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Keep redirected STDIN open: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] High priority process: 'False'
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] Failed to update oom_score_adj for PID: 2506.
[2021-01-19 00:21:54Z INFO ProcessInvokerWrapper] System.UnauthorizedAccessException: Access to the path '/proc/2506/oom_score_adj' is denied.
 ---> System.IO.IOException: Permission denied
   --- End of inner exception stack trace ---
   at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
   at System.IO.FileStream.FlushWriteBuffer()
   at System.IO.FileStream.Dispose(Boolean disposing)
   at System.IO.Stream.Close()
   at System.IO.StreamWriter.CloseStreamFromDispose(Boolean disposing)
   at System.IO.StreamWriter.Dispose(Boolean disposing)
   at System.IO.TextWriter.Dispose()
   at System.IO.File.WriteAllText(String path, String contents)
   at GitHub.Runner.Sdk.ProcessInvoker.WriteProcessOomScoreAdj(Int32 processId, Int32 oomScoreAdj)
Example Container Dockerfile
`FROM ubuntu:latest
ARG DEBIAN_FRONTEND=noninteractive
ENV GITHUB_RUNNER_VERSION="2.275.1"
ENV RUNNER_NAME "runner"
ENV GITHUB_OWNER "owner"
ENV RUNNER_WORKDIR "_work"
ENV GITHUB_REPOSITORY ""
ENV GITHUB_PAT ""
RUN apt-get update -y \
    && apt-get upgrade -y \
    && apt-get install -y \
        curl \
        sudo \
        git \
        jq \
        systemctl \
        tzdata \
        mysql-client \
        python3-pip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && useradd -m github \
    && usermod -aG sudo github \
    && echo "%sudo ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers \
    && touch /etc/sudoers.d/github \
    && echo "github ALL = (ALL) NOPASSWD: ALL" >> /etc/sudoers.d/github
USER github
WORKDIR /app
COPY --chown=github:github entrypoint.sh ./entrypoint.sh
RUN sudo chown -R github /app
RUN sudo chmod u+x ./entrypoint.sh
ENTRYPOINT [ "/app/entrypoint.sh" ]`
Example entrypoint.sh
`#!/bin/sh
mkdir -p /app/${GITHUB_REPOSITORY}
cd /app/${GITHUB_REPOSITORY}
curl -Ls https://github.com/actions/runner/releases/download/v${GITHUB_RUNNER_VERSION}/actions-runner-linux-x64-${GITHUB_RUNNER_VERSION}.tar.gz | tar xz \
  && sudo ./bin/installdependencies.sh
token_url="https://api.github.com/repos/${GITHUB_OWNER}/${GITHUB_REPOSITORY}/actions/runners/registration-token"
registration_url="https://github.com/${GITHUB_OWNER}/${GITHUB_REPOSITORY}"
echo "Requesting token at '${token_url}'"
payload=$(curl -sX POST -H "Authorization: token ${GITHUB_PAT}" ${token_url})
export RUNNER_TOKEN=$(echo "$payload" | jq .token --raw-output)
./config.sh \
  --name "${GITHUB_REPOSITORY}" \
  --token "${RUNNER_TOKEN}" \
  --url "${registration_url}" \
  --work "${RUNNER_WORKDIR}" \
  --labels "${RUNNER_LABELS}" \
  --unattended \
  --replace
echo "adding actions service"
sudo ./svc.sh install
echo "starting all services with actions."
sudo systemctl start 'actions.'
tail -f /dev/null
`
Example workflow.yml
`name: nodejs test
on:
  push:
    branches:
      - dev
jobs:
  Analysis:
    runs-on: self-hosted
`
Notes: