Skip to content

Commit

Permalink
feat: Replace run instance API by create fleet API (#1556)
Browse files Browse the repository at this point in the history
* Replace creating instance by the fleet api

* Enable create fleet API

* Enable create fleet API

* refactor

* refactor

* fix tests

* cleanup

* refactor

* refactor, prepare for creating muliple runners

* revert and typo

* revert and typo

* Adjust for ARM

* refactor

* Update README.md

Co-authored-by: Scott Guymer <[email protected]>

* Update modules/runners/scale-up.tf

Co-authored-by: Scott Guymer <[email protected]>

* Update modules/runners/variables.tf

Co-authored-by: Scott Guymer <[email protected]>

* Update modules/runners/variables.tf

Co-authored-by: Scott Guymer <[email protected]>

* Update variables.tf

Co-authored-by: Scott Guymer <[email protected]>

* Update README.md

Co-authored-by: Scott Guymer <[email protected]>

* review

* typo

Co-authored-by: Scott Guymer <[email protected]>
  • Loading branch information
npalm and ScottGuymer authored Jan 5, 2022
1 parent 7907984 commit 27e974d
Show file tree
Hide file tree
Showing 31 changed files with 1,005 additions and 628 deletions.
21 changes: 9 additions & 12 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,7 @@ To be able to support a number of use-cases the module has quite a lot configura
- Linux vs Windows. you can configure the os types linux and win. Linux will be used by default.
- Re-use vs Ephemeral. By default runners are re-used for till detected idle, once idle they will be removed from the pool. To improve security we are introducing ephemeral runners. Those runners are only used for one job. Ephemeral runners are only working in combination with the workflow job event. We also suggest to use a pre-build AMI to improve the start time of jobs.
- GitHub cloud vs GitHub enterprise server (GHES). The runner support GitHub cloud as well GitHub enterprise service. For GHES we rely on our community to test and support. We have no possibility to test ourselves on GHES.
- Spot vs on-demand. The runners using either the EC2 spot or on-demand life cycle. Runners will be created via the AWS [CreateFleet API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_CreateFleet.html). The module (scale up lambda) will request an instance via the create fleet API in one of the subnets and matching one of the specified instance types.


#### ARM64 support via Graviton/Graviton2 instance-types
Expand Down Expand Up @@ -325,15 +326,7 @@ The following sub modules are optional and are provided as example or utility:

### ARM64 configuration for submodules

When not using the top-level module and specifying an `a1`, `t4g` or `*6g*` (6th-gen Graviton2) `instance_type`, the `runner-binaries-syncer` and `runners` submodules need to be configured appropriately for pulling the ARM64 GitHub action runner binary and leveraging the arm64 AMI for the runners.

When configuring `runner-binaries-syncer`

- _runner_architecture_ - set to `arm64`, defaults to `x64`

When configuring `runners`

- _ami_filter_ - set to `["amzn2-ami-hvm-2*-arm64-gp2"]`, defaults to `["amzn2-ami-hvm-2.*-x86_64-ebs"]`
When using the top-level module configure `runner_architecture = arm64` and ensure the list of `instance_types` matches. When not using the top-level ensure the bot properties are set on the submodules.

## Debugging

Expand Down Expand Up @@ -401,9 +394,12 @@ In case the setup does not work as intended follow the trace of events:
| <a name="input_ghes_url"></a> [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | `string` | `null` | no |
| <a name="input_github_app"></a> [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`). | <pre>object({<br> key_base64 = string<br> id = string<br> webhook_secret = string<br> })</pre> | n/a | yes |
| <a name="input_idle_config"></a> [idle\_config](#input\_idle\_config) | List of time period that can be defined as cron expression to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle. | <pre>list(object({<br> cron = string<br> timeZone = string<br> idleCount = number<br> }))</pre> | `[]` | no |
| <a name="input_instance_allocation_strategy"></a> [instance\_allocation\_strategy](#input\_instance\_allocation\_strategy) | The allocation strategy for spot instances. AWS recommends to use `capacity-optimized` however the AWS default is `lowest-price`. | `string` | `"lowest-price"` | no |
| <a name="input_instance_max_spot_price"></a> [instance\_max\_spot\_price](#input\_instance\_max\_spot\_price) | Max price price for spot intances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | `string` | `null` | no |
| <a name="input_instance_profile_path"></a> [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no |
| <a name="input_instance_type"></a> [instance\_type](#input\_instance\_type) | [DEPRECATED] See instance\_types. | `string` | `"m5.large"` | no |
| <a name="input_instance_types"></a> [instance\_types](#input\_instance\_types) | List of instance types for the action runner. Defaults are based on runner\_os (amzn2 for linux and Windows Server Core for win). | `list(string)` | `null` | no |
| <a name="input_instance_target_capacity_type"></a> [instance\_target\_capacity\_type](#input\_instance\_target\_capacity\_type) | Default lifecycle used for runner instances, can be either `spot` or `on-demand`. | `string` | `"spot"` | no |
| <a name="input_instance_type"></a> [instance\_type](#input\_instance\_type) | [DEPRECATED] See instance\_types. | `string` | `null` | no |
| <a name="input_instance_types"></a> [instance\_types](#input\_instance\_types) | List of instance types for the action runner. Defaults are based on runner\_os (amzn2 for linux and Windows Server Core for win). | `list(string)` | <pre>[<br> "m5.large",<br> "c5.large"<br>]</pre> | no |
| <a name="input_job_queue_retention_in_seconds"></a> [job\_queue\_retention\_in\_seconds](#input\_job\_queue\_retention\_in\_seconds) | The number of seconds the job is held in the queue before it is purged | `number` | `86400` | no |
| <a name="input_key_name"></a> [key\_name](#input\_key\_name) | Key pair name | `string` | `null` | no |
| <a name="input_kms_key_arn"></a> [kms\_key\_arn](#input\_kms\_key\_arn) | Optional CMK Key ARN to be used for Parameter Store. This key must be in the current account. | `string` | `null` | no |
Expand All @@ -414,14 +410,15 @@ In case the setup does not work as intended follow the trace of events:
| <a name="input_log_level"></a> [log\_level](#input\_log\_level) | Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'. | `string` | `"info"` | no |
| <a name="input_log_type"></a> [log\_type](#input\_log\_type) | Logging format for lambda logging. Valid values are 'json', 'pretty', 'hidden'. | `string` | `"pretty"` | no |
| <a name="input_logging_retention_in_days"></a> [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no |
| <a name="input_market_options"></a> [market\_options](#input\_market\_options) | Market options for the action runner instances. Setting the value to `null` let the scaler create on-demand instances instead of spot instances. | `string` | `"spot"` | no |
| <a name="input_market_options"></a> [market\_options](#input\_market\_options) | DEPCRECATED: Replaced by `instance_target_capacity_type`. | `string` | `null` | no |
| <a name="input_minimum_running_time_in_minutes"></a> [minimum\_running\_time\_in\_minutes](#input\_minimum\_running\_time\_in\_minutes) | The time an ec2 action runner should be running at minimum before terminated if not busy. | `number` | `null` | no |
| <a name="input_redrive_build_queue"></a> [redrive\_build\_queue](#input\_redrive\_build\_queue) | Set options to attach (optional) a dead letter queue to the build queue, the queue between the webhook and the scale up lambda. You have the following options. 1. Disable by setting, `enalbed' to false. 2. Enable by setting `enabled` to `true`, `maxReceiveCount` to a number of max retries.` | <pre>object({<br> enabled = bool<br> maxReceiveCount = number<br> })</pre> | <pre>{<br> "enabled": false,<br> "maxReceiveCount": null<br>}</pre> | no |
| <a name="input_repository_white_list"></a> [repository\_white\_list](#input\_repository\_white\_list) | List of repositories allowed to use the github app | `list(string)` | `[]` | no |
| <a name="input_role_path"></a> [role\_path](#input\_role\_path) | The path that will be added to role path for created roles, if not set the environment name will be used. | `string` | `null` | no |
| <a name="input_role_permissions_boundary"></a> [role\_permissions\_boundary](#input\_role\_permissions\_boundary) | Permissions boundary that will be added to the created roles. | `string` | `null` | no |
| <a name="input_runner_additional_security_group_ids"></a> [runner\_additional\_security\_group\_ids](#input\_runner\_additional\_security\_group\_ids) | (optional) List of additional security groups IDs to apply to the runner | `list(string)` | `[]` | no |
| <a name="input_runner_allow_prerelease_binaries"></a> [runner\_allow\_prerelease\_binaries](#input\_runner\_allow\_prerelease\_binaries) | Allow the runners to update to prerelease binaries. | `bool` | `false` | no |
| <a name="input_runner_architecture"></a> [runner\_architecture](#input\_runner\_architecture) | The platform architecture of the runner instance\_type. | `string` | `"x64"` | no |
| <a name="input_runner_as_root"></a> [runner\_as\_root](#input\_runner\_as\_root) | Run the action runner under the root user. Variable `runner_run_as` will be ingored. | `bool` | `false` | no |
| <a name="input_runner_binaries_s3_sse_configuration"></a> [runner\_binaries\_s3\_sse\_configuration](#input\_runner\_binaries\_s3\_sse\_configuration) | Map containing server-side encryption configuration for runner-binaries S3 bucket. | `any` | `{}` | no |
| <a name="input_runner_binaries_syncer_lambda_timeout"></a> [runner\_binaries\_syncer\_lambda\_timeout](#input\_runner\_binaries\_syncer\_lambda\_timeout) | Time out of the binaries sync lambda in seconds. | `number` | `300` | no |
Expand Down
4 changes: 2 additions & 2 deletions examples/ephemeral/main.tf
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
locals {
environment = "ephemeraal"
environment = "ephemeral"
aws_region = "eu-west-1"
}

Expand Down Expand Up @@ -60,7 +60,7 @@ module "runners" {
# ami_owners = [data.aws_caller_identity.current.account_id]

# Enable logging
# log_level = "debug"
# log_level = "debug"

# Setup a dead letter queue, by default scale up lambda will kepp retrying to process event in case of scaling error.
# redrive_policy_build_queue = {
Expand Down
16 changes: 8 additions & 8 deletions main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,6 @@ locals {
})

s3_action_runner_url = "s3://${module.runner_binaries.bucket.id}/${module.runner_binaries.runner_distribution_object_key}"
runner_architecture = substr(var.instance_type, 0, 2) == "a1" || substr(var.instance_type, 0, 3) == "t4g" || substr(var.instance_type, 1, 2) == "6g" ? "arm64" : "x64"
github_app_parameters = {
id = module.ssm.parameters.github_app_id
key_base64 = module.ssm.parameters.github_app_key_base64
Expand Down Expand Up @@ -91,13 +90,14 @@ module "runners" {
s3_bucket_runner_binaries = module.runner_binaries.bucket
s3_location_runner_binaries = local.s3_action_runner_url

runner_os = var.runner_os
instance_type = var.instance_type
instance_types = var.instance_types
market_options = var.market_options
block_device_mappings = var.block_device_mappings
runner_os = var.runner_os
instance_types = var.instance_types
instance_target_capacity_type = var.instance_target_capacity_type
instance_allocation_strategy = var.instance_allocation_strategy
instance_max_spot_price = var.instance_max_spot_price
block_device_mappings = var.block_device_mappings

runner_architecture = local.runner_architecture
runner_architecture = var.runner_architecture
ami_filter = var.ami_filter
ami_owners = var.ami_owners

Expand Down Expand Up @@ -169,7 +169,7 @@ module "runner_binaries" {
distribution_bucket_name = "${var.environment}-dist-${random_string.random.result}"

runner_os = var.runner_os
runner_architecture = local.runner_architecture
runner_architecture = var.runner_architecture
runner_allow_prerelease_binaries = var.runner_allow_prerelease_binaries

lambda_s3_bucket = var.lambda_s3_bucket
Expand Down
2 changes: 1 addition & 1 deletion modules/runner-binaries-syncer/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,7 @@ No modules.
| <a name="input_role_path"></a> [role\_path](#input\_role\_path) | The path that will be added to the role, if not set the environment name will be used. | `string` | `null` | no |
| <a name="input_role_permissions_boundary"></a> [role\_permissions\_boundary](#input\_role\_permissions\_boundary) | Permissions boundary that will be added to the created role for the lambda. | `string` | `null` | no |
| <a name="input_runner_allow_prerelease_binaries"></a> [runner\_allow\_prerelease\_binaries](#input\_runner\_allow\_prerelease\_binaries) | Allow the runners to update to prerelease binaries. | `bool` | `false` | no |
| <a name="input_runner_architecture"></a> [runner\_architecture](#input\_runner\_architecture) | The platform architecture for the runner instance (x64, arm64), defaults to 'x64' | `string` | `"x64"` | no |
| <a name="input_runner_architecture"></a> [runner\_architecture](#input\_runner\_architecture) | The platform architecture of the runner instance\_type. | `string` | `"x64"` | no |
| <a name="input_runner_os"></a> [runner\_os](#input\_runner\_os) | The operating system for the runner instance (linux, win), defaults to 'linux' | `string` | `"linux"` | no |
| <a name="input_server_side_encryption_configuration"></a> [server\_side\_encryption\_configuration](#input\_server\_side\_encryption\_configuration) | Map containing server-side encryption configuration. | `any` | `{}` | no |
| <a name="input_syncer_lambda_s3_key"></a> [syncer\_lambda\_s3\_key](#input\_syncer\_lambda\_s3\_key) | S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas. | `any` | `null` | no |
Expand Down
9 changes: 8 additions & 1 deletion modules/runner-binaries-syncer/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,16 @@ variable "runner_os" {
}

variable "runner_architecture" {
description = "The platform architecture for the runner instance (x64, arm64), defaults to 'x64'"
description = "The platform architecture of the runner instance_type."
type = string
default = "x64"
validation {
condition = anytrue([
var.runner_architecture == "x64",
var.runner_architecture == "arm64",
])
error_message = "`runner_architecture` value not valid, valid values are: `x64` and `arm64`."
}
}

variable "logging_retention_in_days" {
Expand Down
5 changes: 4 additions & 1 deletion modules/runners/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,10 @@ No modules.
| <a name="input_ghes_url"></a> [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. DO NOT SET IF USING PUBLIC GITHUB | `string` | `null` | no |
| <a name="input_github_app_parameters"></a> [github\_app\_parameters](#input\_github\_app\_parameters) | Parameter Store for GitHub App Parameters. | <pre>object({<br> key_base64 = map(string)<br> id = map(string)<br> })</pre> | n/a | yes |
| <a name="input_idle_config"></a> [idle\_config](#input\_idle\_config) | List of time period that can be defined as cron expression to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle. | <pre>list(object({<br> cron = string<br> timeZone = string<br> idleCount = number<br> }))</pre> | `[]` | no |
| <a name="input_instance_allocation_strategy"></a> [instance\_allocation\_strategy](#input\_instance\_allocation\_strategy) | The allocation strategy for spot instances. AWS recommends to use `capacity-optimized` however the AWS default is `lowest-price`. | `string` | `"lowest-price"` | no |
| <a name="input_instance_max_spot_price"></a> [instance\_max\_spot\_price](#input\_instance\_max\_spot\_price) | Max price price for spot intances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | `string` | `null` | no |
| <a name="input_instance_profile_path"></a> [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no |
| <a name="input_instance_target_capacity_type"></a> [instance\_target\_capacity\_type](#input\_instance\_target\_capacity\_type) | Default lifecyle used runner instances, can be either `spot` or `on-demand`. | `string` | `"spot"` | no |
| <a name="input_instance_type"></a> [instance\_type](#input\_instance\_type) | [DEPRECATED] See instance\_types. | `string` | `"m5.large"` | no |
| <a name="input_instance_types"></a> [instance\_types](#input\_instance\_types) | List of instance types for the action runner. Defaults are based on runner\_os (amzn2 for linux and Windows Server Core for win). | `list(string)` | `null` | no |
| <a name="input_key_name"></a> [key\_name](#input\_key\_name) | Key pair name | `string` | `null` | no |
Expand All @@ -142,7 +145,7 @@ No modules.
| <a name="input_log_level"></a> [log\_level](#input\_log\_level) | Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'. | `string` | `"info"` | no |
| <a name="input_log_type"></a> [log\_type](#input\_log\_type) | Logging format for lambda logging. Valid values are 'json', 'pretty', 'hidden'. | `string` | `"pretty"` | no |
| <a name="input_logging_retention_in_days"></a> [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no |
| <a name="input_market_options"></a> [market\_options](#input\_market\_options) | Market options for the action runner instances. | `string` | `"spot"` | no |
| <a name="input_market_options"></a> [market\_options](#input\_market\_options) | DEPCRECATED: Replaced by `instance_target_capacity_type`. | `string` | `null` | no |
| <a name="input_metadata_options"></a> [metadata\_options](#input\_metadata\_options) | Metadata options for the ec2 runner instances. | `map(any)` | <pre>{<br> "http_endpoint": "enabled",<br> "http_put_response_hop_limit": 1,<br> "http_tokens": "optional"<br>}</pre> | no |
| <a name="input_minimum_running_time_in_minutes"></a> [minimum\_running\_time\_in\_minutes](#input\_minimum\_running\_time\_in\_minutes) | The time an ec2 action runner should be running at minimum before terminated if non busy. If not set the default is calculated based on the OS. | `number` | `null` | no |
| <a name="input_overrides"></a> [overrides](#input\_overrides) | This map provides the possibility to override some defaults. The following attributes are supported: `name_sg` overrides the `Name` tag for all security groups created by this module. `name_runner_agent_instance` overrides the `Name` tag for the ec2 instance defined in the auto launch configuration. `name_docker_machine_runners` overrides the `Name` tag spot instances created by the runner agent. | `map(string)` | <pre>{<br> "name_runner": "",<br> "name_sg": ""<br>}</pre> | no |
Expand Down
14 changes: 7 additions & 7 deletions modules/runners/lambdas/runners/jest.config.js
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,13 @@ module.exports = {
preset: 'ts-jest',
testEnvironment: 'node',
collectCoverage: true,
collectCoverageFrom: ['src/**/*.{ts,js,jsx}','!src/**/*local*.ts'],
collectCoverageFrom: ['src/**/*.{ts,js,jsx}', '!src/**/*local*.ts', 'src/**/*.d.ts'],
coverageThreshold: {
global: {
branches: 80,
functions: 80,
lines: 80,
statements: 80
}
}
branches: 92,
functions: 92,
lines: 92,
statements: 92,
},
},
};
Loading

0 comments on commit 27e974d

Please sign in to comment.