Feature request: provide a signal to other programs when the runner has started a job #699
Comments
This would also make scaling-in easier: if our ASG only holds spare/idle capacity, then it is safe to stop those VMs. If our ASG holds both idle and active runners, then scaling-in is more complicated (we can't just let the ASG pick a random machine -- but we don't know which ones it should pick). There is a small race condition here: the runner may have picked up a job before we have detached it from the ASG... but this is something we could probably live with easily (we're talking like < 1 second delay here). And there are ways we could solve that too.
In the meantime, one easy way to get a signal is that the runner has two processes: one that listens to the queue (long running) and a worker process that is spawned once per job. So you could ideally hook into process start / stop, and at worst poll (ps aux | grep ...)
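The polling fallback described above could be sketched roughly like this. It is only an illustration: the marker string "Runner.Worker" is an assumption about how the per-job process shows up in `ps` output, and the function names are hypothetical.

```python
import subprocess
import time
from typing import Callable

# Assumption: the per-job worker process has this text in its command line.
WORKER_MARKER = "Runner.Worker"


def worker_in_ps_output(ps_output: str, marker: str = WORKER_MARKER) -> bool:
    """Return True if any line of the process listing mentions the marker."""
    return any(marker in line for line in ps_output.splitlines())


def list_processes() -> str:
    """Return the text output of `ps aux`."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return result.stdout


def wait_for_job_start(
    on_start: Callable[[], None],
    get_ps: Callable[[], str] = list_processes,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll the process list until a worker appears, then call on_start once."""
    while not worker_in_ps_output(get_ps()):
        sleep(interval)
    on_start()
```

The polling interval bounds how late the signal can arrive, so a job that starts and finishes between two polls would be missed entirely; hooking process start / stop avoids that.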
Ok, thanks! Maybe we'll combine that with our stdout spying for extra assurance (the "remove from ASG" operation is idempotent -- so it'd be safe for us to do both.) I'm ok with temporary solutions like that, and I'm ok with any permanent solution you would come up with :) Having a documented/supported approach would probably be good because this might be a somewhat common scenario (we're also considering open-sourcing our AWS solution, but don't want to if we're doing weird hacks.)
We recently published an ADR for Job Started / Job Completed hooks for self-hosted runners; feel free to provide your feedback. In particular, we would love to hear what else (if anything) you would need to support your use case, and whether the interface makes sense for you.
We've shipped a beta of this functionality in
It would be helpful if the runner could signal that it has picked up a job to other programs.
We intend to use ephemeral runners (#510). We want to have an auto-scale-group in AWS that holds spare/idle capacity. When a job starts we want to remove the runner from that ASG so that it gets replaced in parallel with the job. The alternative is to scale up only after the job ends (and the machine terminates) but this adds latency, and if we get a burst of jobs (that uses up all of our capacity) we would get excessive queueing (and if the jobs are long-running it could get really bad.)
(You don't care about these details) when we detect a job has started we trigger a Lambda function that looks at the caller identity and removes the caller from its ASG (we do it this way so that the runner VM has very narrow AWS permissions).
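A minimal sketch of the detach step, under stated assumptions: the `autoscaling` argument is expected to behave like a boto3 Auto Scaling client (`boto3.client("autoscaling")`), the instance ID is passed in directly (how the Lambda derives it from the caller identity is not shown here), and `detach_from_asg` is a hypothetical name.

```python
from typing import Any, Optional


def detach_from_asg(autoscaling: Any, instance_id: str) -> Optional[str]:
    """Detach an instance from its ASG so the group launches a replacement.

    Returns the group name, or None if the instance is not in any group
    (which makes repeated calls harmless).
    """
    resp = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
    instances = resp.get("AutoScalingInstances", [])
    if not instances:
        return None  # already detached, nothing to do

    group = instances[0]["AutoScalingGroupName"]
    autoscaling.detach_instances(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group,
        # Keep desired capacity unchanged so the ASG replaces the instance.
        ShouldDecrementDesiredCapacity=False,
    )
    return group
```

Leaving the desired capacity unchanged is what causes the replacement VM to start in parallel with the job rather than after it.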
How we've worked around this: with --once (which we tried using before noticing #510) we were looking at the runner stdout for the "Job started" message. This isn't the classiest thing to do (but it at least gracefully degrades -- if the message were to change then we would just have more build queueing). Any design you come up with would be fine for us.
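For illustration, the stdout workaround could look something like the sketch below. The marker text is taken from the description above and its exact form in the runner output is an assumption; `watch_runner_output` is a hypothetical helper name.

```python
from typing import Callable, Iterable, Iterator

# Assumption: the runner prints a line containing this text when a job starts.
JOB_STARTED_MARKER = "Job started"


def watch_runner_output(
    lines: Iterable[str],
    on_start: Callable[[], None],
    marker: str = JOB_STARTED_MARKER,
) -> Iterator[str]:
    """Pass every line through unchanged; call on_start() on the first match."""
    fired = False
    for line in lines:
        if not fired and marker in line:
            fired = True
            on_start()
        yield line
```

It could be fed from the runner's stdout (for example by iterating over `sys.stdin` when the runner is piped into the script). If the message ever changes, `on_start` is simply never called, which matches the graceful degradation described above.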