Feature request: provide a signal to other programs when the runner has started a job #699

j3parker · 2020-09-10T14:00:13Z

It would be helpful if the runner could signal that it has picked up a job to other programs.

We intend to use ephemeral runners (#510). We want an AWS Auto Scaling group (ASG) that holds spare/idle capacity. When a job starts, we want to remove that runner from the ASG so that it gets replaced in parallel with the job. The alternative is to scale up only after the job ends (and the machine terminates), but this adds latency. Also, a burst of jobs that uses up all of our capacity would cause excessive queueing, which could get really bad if the jobs are long-running.

(You don't need these details, but:) when we detect that a job has started, we trigger a Lambda function that looks at the caller identity and removes the caller from its ASG. We do it this way so that the runner VM needs only very narrow AWS permissions.
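A minimal sketch of what such a Lambda might do, under stated assumptions. It takes a boto3-style `autoscaling` client (passed in so it can be faked), looks up the caller's group with `describe_auto_scaling_instances`, and calls `detach_instances` with `ShouldDecrementDesiredCapacity=False`, so that the ASG launches a replacement while the job keeps running. The ARN parsing assumes the VM calls with EC2 instance-profile credentials, whose assumed-role session name is the instance ID; how the Lambda obtains that ARN is left open. Both function names are hypothetical.

```python
"""Hypothetical sketch: detach a busy runner's VM from its ASG for replacement."""
from typing import Any, Optional


def instance_id_from_caller_arn(arn: str) -> str:
    """Extract the instance ID from an assumed-role ARN.

    Assumes instance-profile credentials, e.g.
    arn:aws:sts::123456789012:assumed-role/runner-role/i-0abc123,
    where the session name (last path segment) is the instance ID.
    """
    session = arn.rsplit("/", 1)[-1]
    if not session.startswith("i-"):
        raise ValueError(f"not an instance-profile session: {arn}")
    return session


def detach_for_replacement(asg_client: Any, instance_id: str) -> Optional[str]:
    """Detach instance_id from its ASG without lowering desired capacity.

    asg_client is assumed to behave like boto3.client("autoscaling").
    Returns the group name, or None if the instance is no longer in any group
    (e.g. a second signal arrived after it was already detached).
    """
    resp = asg_client.describe_auto_scaling_instances(InstanceIds=[instance_id])
    matches = resp.get("AutoScalingInstances", [])
    if not matches:
        # Already detached (or never attached): nothing to do, so repeats are safe.
        return None
    group = matches[0]["AutoScalingGroupName"]
    asg_client.detach_instances(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group,
        # Keep desired capacity unchanged so the ASG launches a replacement VM.
        ShouldDecrementDesiredCapacity=False,
    )
    return group
```

Returning `None` when the instance is no longer attached is what makes a repeated call harmless, which matters once more than one signal source can trigger the Lambda.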

How we've worked around this: with --once (which we tried using before noticing #510) we were looking at the runner stdout for the "Job started" message. This isn't the classiest thing to do (but it at least gracefully degrades -- if the message were to change then we would just have more build queueing). Any design you come up with would be fine for us.
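A minimal sketch of the stdout approach, assuming the runner is started as a child process whose output is piped through this wrapper. The marker defaults to the "Job started" text mentioned above; the runner's real console wording may differ and is not a stable interface. `watch_for_job_start` is a hypothetical helper.

```python
"""Hypothetical sketch: watch the runner's stdout for a job-start line."""
from typing import Callable, Iterable


def watch_for_job_start(lines: Iterable[str],
                        on_start: Callable[[str], None],
                        marker: str = "Job started") -> bool:
    """Echo every line and call on_start once, for the first line containing marker.

    Returns True if the marker was seen. If the message ever changes, the
    callback simply never fires (the graceful degradation described above).
    """
    seen = False
    for line in lines:
        print(line, end="")  # pass the runner's output through unchanged
        if not seen and marker in line:
            seen = True
            on_start(line)
    return seen
```

Usage might look like `proc = subprocess.Popen(["./run.sh", "--once"], stdout=subprocess.PIPE, text=True)` followed by `watch_for_job_start(proc.stdout, notify_job_started)`, where `notify_job_started` is a hypothetical callback that triggers the ASG removal.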


j3parker · 2020-09-10T14:16:03Z

This would also make scaling-in easier: if our ASG is only holding spare/idle capacity then it is safe to stop those VMs. If our ASG holds both idle and active runners then scaling-in is more complicated (we can't just let the ASG pick a random machine -- but we don't know which ones it should pick).

There is a small race condition here: the runner may have picked up a job before we have detached it from the ASG. This is something we could probably live with easily (we're talking about a delay of under 1 second), and there are ways we could solve that too.

bryanmacfarlane · 2020-09-10T14:25:47Z

In the meantime, one easy way to get a signal: the runner has two processes, a long-running one that listens to the queue and a worker process that it spawns for each job. Ideally you could hook into process start/stop, and at worst poll (ps aux | grep ...).
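A minimal polling sketch of the two-process approach described above. It assumes Linux (it reads `/proc/<pid>/comm` rather than shelling out to `ps`) and assumes the per-job worker appears under the name `Runner.Worker`; verify both on your runner VM. `running_process_names`, `edge`, and `poll_forever` are hypothetical helpers.

```python
"""Hypothetical sketch: detect job start/stop by polling for the worker process.

Assumes Linux (/proc) and that the per-job worker shows up as "Runner.Worker".
"""
import os
import time
from typing import Callable, List, Optional

WORKER_NAME = "Runner.Worker"  # assumed name; comm is cut at 15 chars, this fits


def running_process_names(proc_root: str = "/proc") -> List[str]:
    """Return the short command name (comm) of every process under proc_root."""
    names = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are PIDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.append(f.read().strip())
        except OSError:
            continue  # the process exited between listdir() and open()
    return names


def edge(prev: bool, running: bool) -> Optional[str]:
    """Map two consecutive 'is a worker running?' samples to an event."""
    if running and not prev:
        return "started"
    if prev and not running:
        return "completed"
    return None


def poll_forever(on_event: Callable[[str], None], interval: float = 1.0) -> None:
    """Sample every `interval` seconds and report start/completed transitions."""
    prev = False
    while True:
        running = WORKER_NAME in running_process_names()
        event = edge(prev, running)
        if event is not None:
            on_event(event)
        prev = running
        time.sleep(interval)
```

The polling interval bounds the detection delay, so this widens the race window mentioned above by up to `interval` seconds.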

j3parker · 2020-09-10T15:11:37Z

Ok, thanks! Maybe we'll combine that with our stdout spying for extra assurance (the "remove from ASG" operation is idempotent -- so it'd be safe for us to do both.)
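Because the removal is idempotent, combining signal sources is safe; a small local guard can still avoid sending redundant requests when both fire. A minimal sketch, assuming each source (stdout watcher, process poller) runs in its own thread and calls `notify`; `OnceNotifier` is a hypothetical name.

```python
"""Hypothetical sketch: fire an action once, whichever signal source arrives first."""
import threading
from typing import Callable, Optional


class OnceNotifier:
    """Run `action` at most once across any number of signal sources and threads."""

    def __init__(self, action: Callable[[], None]) -> None:
        self._action = action
        self._lock = threading.Lock()
        self.fired_by: Optional[str] = None  # name of the source that won

    def notify(self, source: str) -> bool:
        """Return True if this call triggered the action, False if it already ran."""
        with self._lock:
            if self.fired_by is not None:
                return False
            self.fired_by = source
        # Run outside the lock; the remote removal is idempotent anyway.
        self._action()
        return True
```

For example, the stdout watcher could call `notifier.notify("stdout")` and the process poller `notifier.notify("process")`; only the first triggers the removal.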

I'm ok with temporary solutions like that, and I'm ok with any permanent solution you would come up with :) Having a documented/supported approach would probably be good because this might be a somewhat common scenario (we're also considering open-sourcing our AWS solution, but don't want to if we're doing weird hacks.)

thboop · 2022-03-14T15:56:42Z

We recently published an ADR for Job Started / Job Completed hooks for self hosted runners, feel free to provide your feedback.

In particular, we would love to hear what else (if anything) you would need to support your use case, and whether the interface makes sense for you.

thboop · 2022-03-30T15:50:39Z

We've shipped a beta of this functionality in 2.289.1. Please try it out and provide any feedback you have on the ADR!
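A minimal sketch for wiring up the hooks on a runner VM, assuming they are configured through the `ACTIONS_RUNNER_HOOK_JOB_STARTED` and `ACTIONS_RUNNER_HOOK_JOB_COMPLETED` variables in the `.env` file in the runner directory, as GitHub's documentation describes for the released feature (the 2.289.1 beta may have differed). The script paths are hypothetical, and `set_env_entry` is a hypothetical helper, not part of the runner.

```python
"""Hypothetical sketch: register job hook scripts in the runner's .env file."""
from pathlib import Path


def set_env_entry(env_file: Path, key: str, value: str) -> None:
    """Add or replace KEY=VALUE in a simple line-based .env file.

    Other lines are kept in order; duplicate definitions of `key` are dropped.
    """
    lines = env_file.read_text().splitlines() if env_file.exists() else []
    entry = f"{key}={value}"
    out, replaced = [], False
    for line in lines:
        if line.split("=", 1)[0] == key:
            if not replaced:
                out.append(entry)  # replace the first definition in place
                replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(entry)
    env_file.write_text("\n".join(out) + "\n")
```

For example, `set_env_entry(Path(".env"), "ACTIONS_RUNNER_HOOK_JOB_STARTED", "/opt/hooks/job_started.sh")` would register a hypothetical script that triggers the ASG removal. Assuming the runner reads `.env` only at startup, restart it afterwards.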

j3parker added the enhancement New feature or request label Sep 10, 2020

j3parker mentioned this issue Sep 28, 2020

Question: is it possible to see the amount of runs in a queue, so I could scale the amount of virtual machines? #723

Closed

j3parker mentioned this issue Mar 19, 2021

Support for autoscaling self-hosted github runners #845

Open

Natanande mentioned this issue Jul 8, 2021

Draining nodes without interrupting busy runners actions/actions-runner-controller#643

Closed

tyrken mentioned this issue Nov 8, 2021

Add RUNNER_HOOK_PREEXECUTE env-var to call a custom script at the very start of a workflow #1469

Closed

tuomoa mentioned this issue Feb 24, 2022

Random Operation Cancelled Runner Decommissions actions/actions-runner-controller#911

Closed

thboop mentioned this issue Mar 14, 2022

[self-hosted] A hook that is fired before job starts #1543

Closed

thboop closed this as completed Mar 30, 2022

lovesegfault mentioned this issue Apr 14, 2022

Allow to update runner pod annotations on job start actions/actions-runner-controller#1335

Closed
