Pull-based autoscaling support #1588

axel3rd · 2022-01-04T17:22:10Z

According to GitHub's Recommended autoscaling solutions, this solution doesn't support the Pull-based autoscaling feature.

I'm not sure I fully understand what this feature is, but it would be nice to support it 😁 (as the other major solution does).

(Not much history found from af32d).

npalm · 2022-01-05T12:03:24Z

It seems that pull-based support means configuring a list of repositories that are queried for pending jobs to decide whether to scale. We have chosen essentially to scale based on events. For our own deployment we must handle 1000+ repositories, so continuously querying 1000+ repos would hit rate limits. But we are of course open to exploring how pull-based scaling can be supported if needed.
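A rough back-of-the-envelope check of the rate-limit concern, as a sketch only. It assumes one API request per repository per poll and a limit of 5,000 requests per hour for a single token; both figures are assumptions for illustration, not numbers from this thread, and the function names are hypothetical.

```python
# Rough request budget for polling every repository on a fixed interval.
# Assumptions (not from the thread): one API request per repository per poll,
# and a rate limit of 5,000 requests per hour for a single token.

HOURLY_LIMIT = 5000  # assumed limit, adjust for your own token or app


def requests_per_hour(repo_count: int, poll_interval_seconds: int) -> int:
    """Requests issued per hour when each repo is queried once per poll."""
    polls_per_hour = 3600 // poll_interval_seconds
    return repo_count * polls_per_hour


def min_poll_interval_seconds(repo_count: int, hourly_limit: int) -> float:
    """Shortest poll interval that keeps the polling within the hourly limit."""
    return repo_count * 3600 / hourly_limit


demand = requests_per_hour(1000, 60)
print(demand, demand > HOURLY_LIMIT)                  # 60000 True
print(min_poll_interval_seconds(1000, HOURLY_LIMIT))  # 720.0
```

Under these assumptions, polling 1000 repositories once a minute needs twelve times the hourly budget, and staying within it would mean polling no more often than every 12 minutes.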

toast-gear · 2022-01-05T14:53:48Z

Pull based scaling is where the controller scales runners based on a metric every poll period (sync period). The 2 current metrics are:

the queue depth of all workflow runs against a defined list of repositories (intensive API call metric)
the number of busy runners (light API call metric)
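As a minimal sketch, the decision taken each poll period for the two metrics above could look like the following. The thresholds, the scale factor, and every name here are hypothetical and chosen for illustration; they are not taken from any project's configuration.

```python
import math
from dataclasses import dataclass


@dataclass
class Limits:
    min_runners: int
    max_runners: int


def clamp(value: int, limits: Limits) -> int:
    """Keep the desired runner count within the configured limits."""
    return max(limits.min_runners, min(limits.max_runners, value))


def desired_from_queue_depth(queued_and_in_progress: int, limits: Limits) -> int:
    """Queue-depth metric: one runner per queued or in-progress workflow run."""
    return clamp(queued_and_in_progress, limits)


def desired_from_busy_runners(current: int, busy: int, limits: Limits,
                              scale_up_at: float = 0.75,
                              scale_down_at: float = 0.25,
                              factor: float = 2.0) -> int:
    """Busy-runner metric: grow when most runners are busy, shrink when few are."""
    if current == 0:
        # No runners yet, so there is no busy fraction; fall back to the minimum.
        return clamp(0, limits)
    busy_fraction = busy / current
    if busy_fraction >= scale_up_at:
        return clamp(math.ceil(current * factor), limits)
    if busy_fraction <= scale_down_at:
        return clamp(math.floor(current / factor), limits)
    return clamp(current, limits)


limits = Limits(min_runners=1, max_runners=10)
print(desired_from_queue_depth(25, limits))      # 10 (capped at max_runners)
print(desired_from_busy_runners(4, 3, limits))   # 8  (75% busy, scale up)
print(desired_from_busy_runners(4, 1, limits))   # 2  (25% busy, scale down)
```

The queue-depth input has to be gathered across every listed repository, which is why it is the API-intensive metric; the busy-runner input only needs the state of the runners themselves.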

It's useful for people that:

can't or don't want to use webhooks
have slow / low scaling requirements / want something simple
want a high degree of control over runner allocation server-side and are happy with overhead of managing named repositories server-side
run a GHES environment where you can control your rate limit budget, including disabling rate limiting entirely

npalm · 2022-01-05T19:59:54Z

@toast-gear Thanks for the clarification! Last week I worked on a PR to add a so-called "simple pool" that checks at a fixed interval whether the wanted number of idle runners is available and, if not, scales up by the number required. See #1577.

npalm · 2022-01-06T18:53:19Z

In PR #1577 we add a simple form of pull-based scaling. This change lets you define a pool based on a cron expression and a desired pool size. The cron expression triggers a lambda, which updates the pool to the required size. Configuration is provided as a list, so multiple combinations of cron expression and pool size can be defined, for example to support a different pool size on weekdays and weekends.
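To illustrate the idea, here is a small sketch of such a list and of the top-up calculation. The field names, the sizes, and the use of standard five-field cron syntax are assumptions made for this example; the actual configuration format is defined in #1577.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolEntry:
    schedule: str  # cron expression that triggers the check (treated as opaque here)
    size: int      # desired number of idle runners in the pool


# Hypothetical configuration: a larger pool on weekdays, a smaller one on weekends.
POOL_CONFIG = [
    PoolEntry(schedule="0 8 * * 1-5", size=10),  # Monday to Friday at 08:00
    PoolEntry(schedule="0 8 * * 0,6", size=2),   # Saturday and Sunday at 08:00
]


def runners_to_create(entry: PoolEntry, idle_runners: int) -> int:
    """Number of runners to start so the pool reaches its desired size."""
    return max(0, entry.size - idle_runners)


print(runners_to_create(POOL_CONFIG[0], idle_runners=3))  # 7
print(runners_to_create(POOL_CONFIG[1], idle_runners=5))  # 0
```

This sketch only tops the pool up when it is short; what happens to surplus runners is outside its scope.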

toast-gear · 2022-01-06T19:10:31Z

@npalm Sounds very interesting and similar-ish to some of the stuff in actions-runner-controller (ARC)! You can set up a schedule to override the min and / or max replica count on a per runner set basis. It's helpful if you just want a basic scale-up to a set amount during core business hours and a scale-down to a set amount outside of them. With ARC being k8s based it's also helpful for cost optimisation, as you may want to scale your runner node group/s down to 0 outside of core business hours to save £££.

The pull based scaling stuff is more centred around continuously scaling up and down, driven by some environmental metric/s and reassessed every poll period.

axel3rd · 2022-01-06T19:14:07Z

> In PR #1577 we add a simple way for pull based scaling

So can we consider the Pull-based autoscaling feature supported (or soon to be)?
If so, I will change PR github/docs#13742 to "yes (org-level runners)" and hold it until #1577 is merged.

toast-gear · 2022-01-06T19:17:05Z

As a comparison to ARC, it sounds more like schedules, but it's sort of close enough, right? It's just an informal term really, so @npalm's work sounds close enough that it could be considered ticked off feature-wise once released? Up to @npalm really though.

npalm · 2022-01-06T20:29:03Z

> As a comparison to ARC, it sounds more like schedules, but it's sort of close enough, right? It's just an informal term really, so @npalm's work sounds close enough that it could be considered ticked off feature-wise once released? Up to @npalm really though.

@toast-gear you are right. The trigger is scheduled. The lambda checks the number of active runners before scaling. Do you have any other suggestions that fit our approach?

axel3rd · 2022-01-06T21:32:20Z

(PR github/docs#13742 updated; it will be taken out of draft when #1577 is merged.)

toast-gear · 2022-01-07T11:50:04Z

I guess knowing a bit of the history would help.

Originally actions-runner-controller only had a single scaling metric, the TotalNumberOfQueuedAndInProgressWorkflowRuns pull based metric. After that the PercentageRunnersBusy pull based metric was added, and then the webhook server was added, introducing support for webhooks. As a result we needed a way of differentiating between the 2 scaling options as they were fundamentally different: the former is built around a poll period, the latter around an event. "Pull based scaling" was chosen for the former as the scaling is driven by environmental details discovered by the controller, and "webhook based scaling" for the latter as the scaling is driven by an event provided by GitHub.

I'd say the key detail that makes both pull based and webhook based scaling what they are is that the scaling responds to some environmental input: it scales up and down (within the limits of the config) as it is informed by the environment on each poll or event, e.g. queue depth, how busy runners are, or an event. For me, scheduled scaling is a different feature as it isn't really responding to an environmental metric; it has an arbitrary runner count as defined by the schedule and will keep the count at that level regardless of the environment. In the case of ARC, you can even combine scheduled scaling with pull or webhook driven scaling, so it really is its own feature, in the ARC project at least.

So if we want to stay true to the pull based scaling term (which is a fairly informal term, so it's not the end of the world) then the docs probably need a new column, Scheduled scaling, and with #1577 merged philips-labs/terraform-aws-github-runner can be said to support this feature. That said, I haven't spent the time to go into detail on your new feature, so if you feel it fulfills the concept of pull based scaling well enough (either by the characteristics I've suggested or just in your own way) then feel free to make that call.
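One way to picture how scheduled scaling can be combined with metric-driven scaling: the schedule only moves the min / max bounds, and the metric-driven count is then clamped to whatever bounds apply at that moment. The sketch below is a simplified model with hypothetical names and a plain daily time window; it is not ARC's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import List, Optional


@dataclass
class Bounds:
    min_runners: int
    max_runners: int


@dataclass
class ScheduledOverride:
    start: time  # daily window start (inclusive)
    end: time    # daily window end (exclusive)
    min_runners: Optional[int] = None  # None means "keep the base value"
    max_runners: Optional[int] = None

    def active(self, now: datetime) -> bool:
        return self.start <= now.time() < self.end


def effective_bounds(base: Bounds, overrides: List[ScheduledOverride],
                     now: datetime) -> Bounds:
    """Apply the first active override on top of the base bounds."""
    for override in overrides:
        if override.active(now):
            return Bounds(
                base.min_runners if override.min_runners is None else override.min_runners,
                base.max_runners if override.max_runners is None else override.max_runners,
            )
    return base


def runner_count(metric_desired: int, base: Bounds,
                 overrides: List[ScheduledOverride], now: datetime) -> int:
    """Clamp the metric-driven count to the bounds in force right now."""
    bounds = effective_bounds(base, overrides, now)
    return max(bounds.min_runners, min(bounds.max_runners, metric_desired))


base = Bounds(min_runners=0, max_runners=20)
business_hours = [ScheduledOverride(start=time(8, 0), end=time(18, 0), min_runners=5)]

print(runner_count(2, base, business_hours, datetime(2022, 1, 6, 10, 0)))   # 5
print(runner_count(2, base, business_hours, datetime(2022, 1, 6, 22, 0)))   # 2
print(runner_count(50, base, business_hours, datetime(2022, 1, 6, 10, 0)))  # 20
```

In this model the schedule never looks at the environment, while the metric never looks at the clock, which is the sense in which the two are separate features that can still be stacked.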

axel3rd · 2022-01-07T13:59:09Z

❤️ pernickety discussions 😁

So perhaps updating the feature name would be more accurate. Proposal:

| Features | actions-runner-controller | terraform-aws-github-runner |
| --- | --- | --- |
| How runners can be scaled | Webhook events, Scheduled, Pull-based | Webhook events, Scheduled (org-level runners only) |

axel3rd · 2022-01-12T15:28:15Z

Can be closed with #1577.
GitHub docs follow-up in github/docs#13742.

axel3rd mentioned this issue Jan 5, 2022

Self-hosted runners Pull-based autoscaling precision github/docs#13741

Closed

1 task

axel3rd mentioned this issue Jan 6, 2022

Fix #13741 : Self hosted Pull-based scale update github/docs#13742

Merged

5 tasks

axel3rd closed this as completed Jan 12, 2022
