Add aws-node-termination-handler bundle (#966)
Co-authored-by: paurosello <[email protected]>
AndiDog and paurosello authored Dec 12, 2024
1 parent 53cb71b commit b907a77
Showing 14 changed files with 209 additions and 0 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- Values: Add `global.providerSpecific.controlPlaneAmi` & `global.providerSpecific.nodePoolAmi`.
- Add aws-node-termination-handler bundle
- Make ASG lifecycle hook heartbeat timeout configurable

### Fixed

- Fix aws-nth-bundle to use the MC's kubeconfig context when the workload cluster is in a different organization namespace.

Workload clusters outside the MC's `org-giantswarm` namespace failed to deploy the bundle because `HelmRelease` does not allow specifying the MC's kubeconfig secret namespace. The bundle was therefore switched to an `App`.

## [1.3.4] - 2024-10-15

10 changes: 10 additions & 0 deletions helm/cluster-aws/README.md
@@ -55,6 +55,13 @@ Configuration of apps that are part of the cluster.
| `global.apps.awsEbsCsiDriverServiceMonitors.extraConfigs[*].name` | **Name** - Name of the config map or secret. The object must exist in the same namespace as the cluster App.|**Type:** `string`<br/>|
| `global.apps.awsEbsCsiDriverServiceMonitors.extraConfigs[*].priority` | **Priority**|**Type:** `integer`<br/>**Default:** `25`|
| `global.apps.awsEbsCsiDriverServiceMonitors.values` | **Config map** - Helm Values to be passed to the app as user config.|**Type:** `object`<br/>|
| `global.apps.awsNodeTerminationHandler` | **App resource** - Configuration of a default app that is part of the cluster and is deployed as an App resource.|**Type:** `object`<br/>|
| `global.apps.awsNodeTerminationHandler.extraConfigs` | **Extra config maps or secrets** - Extra config maps or secrets that will be used to customize the app. The desired values must be under configmap or secret key 'values'. The values are merged in the order given, with later values overwriting earlier ones, and then inline values overwriting those. Resources must be in the same namespace as the cluster.|**Type:** `array`<br/>|
| `global.apps.awsNodeTerminationHandler.extraConfigs[*]` | **Config map or secret**|**Type:** `object`<br/>|
| `global.apps.awsNodeTerminationHandler.extraConfigs[*].kind` | **Kind** - Specifies whether the resource is a config map or a secret.|**Type:** `string`<br/>|
| `global.apps.awsNodeTerminationHandler.extraConfigs[*].name` | **Name** - Name of the config map or secret. The object must exist in the same namespace as the cluster App.|**Type:** `string`<br/>|
| `global.apps.awsNodeTerminationHandler.extraConfigs[*].priority` | **Priority**|**Type:** `integer`<br/>**Default:** `25`|
| `global.apps.awsNodeTerminationHandler.values` | **Config map** - Helm Values to be passed to the app as user config.|**Type:** `object`<br/>|
| `global.apps.awsPodIdentityWebhook` | **App resource** - Configuration of a default app that is part of the cluster and is deployed as an App resource.|**Type:** `object`<br/>|
| `global.apps.awsPodIdentityWebhook.extraConfigs` | **Extra config maps or secrets** - Extra config maps or secrets that will be used to customize the app. The desired values must be under configmap or secret key 'values'. The values are merged in the order given, with later values overwriting earlier ones, and then inline values overwriting those. Resources must be in the same namespace as the cluster.|**Type:** `array`<br/>|
| `global.apps.awsPodIdentityWebhook.extraConfigs[*]` | **Config map or secret**|**Type:** `object`<br/>|
@@ -356,6 +363,7 @@ For Giant Swarm internal use only, not stable, or not supported by UIs.

| **Property** | **Description** | **More Details** |
| :----------- | :-------------- | :--------------- |
| `internal.awsPartition` | **AWS Partition** - Only used when rendering the chart template locally, you shouldn't use this value.|**Type:** `string`<br/>|
| `internal.hashSalt` | **Hash salt** - If specified, this token is used as a salt to the hash suffix of some resource names. Can be used to force-recreate some resources.|**Type:** `string`<br/>|

### Kubectl image
@@ -390,6 +398,8 @@ Node pools of the cluster. If not specified, this defaults to the value of `clus
| `global.nodePools.PATTERN.additionalSecurityGroups[*].id` | **Id of the security group** - ID of the security group that will be added to the machine pool nodes. The security group must exist.|**Type:** `string`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9][-a-z0-9]{3,18}[a-z0-9]$`<br/>|
| `global.nodePools.PATTERN.availabilityZones` | **Availability zones**|**Type:** `array`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9][-a-z0-9]{3,18}[a-z0-9]$`<br/>|
| `global.nodePools.PATTERN.availabilityZones[*]` | **Availability zone**|**Type:** `string`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9][-a-z0-9]{3,18}[a-z0-9]$`<br/>|
| `global.nodePools.PATTERN.awsNodeTerminationHandler` | **aws-node-termination-handler related settings** - Configuration for the ASG lifecycle hook used by aws-node-termination-handler|**Type:** `object`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9][-a-z0-9]{3,18}[a-z0-9]$`<br/>|
| `global.nodePools.PATTERN.awsNodeTerminationHandler.heartbeatTimeoutSeconds` | **Heartbeat timeout for ASG lifecycle hook**|**Type:** `number`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9][-a-z0-9]{3,18}[a-z0-9]$`<br/>**Default:** `1800`|
| `global.nodePools.PATTERN.customNodeLabels` | **Custom node labels**|**Type:** `array`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9][-a-z0-9]{3,18}[a-z0-9]$`<br/>|
| `global.nodePools.PATTERN.customNodeLabels[*]` | **Label**|**Type:** `string`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9][-a-z0-9]{3,18}[a-z0-9]$`<br/>|
| `global.nodePools.PATTERN.customNodeTaints` | **Custom node taints**|**Type:** `array`<br/>**Key pattern:**<br/>`PATTERN`=`^[a-z0-9][-a-z0-9]{3,18}[a-z0-9]$`<br/>|
3 changes: 3 additions & 0 deletions helm/cluster-aws/ci/ci-values.yaml
@@ -26,6 +26,9 @@ global:
password: abcdef
- endpoint: quay.io

internal:
  awsPartition: "aws"

cluster:
  internal:
    ephemeralConfiguration:
File renamed without changes.
@@ -0,0 +1,24 @@
global:
  release:
    version: v27.0.0-alpha.1
  metadata:
    name: test-wc-minimal
    organization: test
    servicePriority: lowest
  connectivity:
    baseDomain: example.com
  nodePools:
    pool0:
      maxSize: 2
      minSize: 2
      awsNodeTerminationHandler:
        heartbeatTimeoutSeconds: 60
  providerSpecific:
    region: "eu-west-1"
  managementCluster: test

cluster:
  internal:
    ephemeralConfiguration:
      offlineTesting:
        renderWithoutReleaseResource: true
3 changes: 3 additions & 0 deletions helm/cluster-aws/ci/test-local-registry-cache-values.yaml
@@ -29,6 +29,9 @@ global:
- docker.io
port: 32767

internal:
  awsPartition: "aws"

cluster:
  internal:
    ephemeralConfiguration:
@@ -31,6 +31,9 @@ global:
username: example
password: password

internal:
  awsPartition: "aws"

cluster:
  internal:
    ephemeralConfiguration:
25 changes: 25 additions & 0 deletions helm/cluster-aws/templates/_awspartition.tpl
@@ -0,0 +1,25 @@
{{- /*
Extracts the AWS partition from an ARN string.
Example usage: {{ include "extractAWSPartition" "arn:aws:iam::1234567890:role/example-role" }}

Input: An ARN string
Output: The AWS partition (e.g., "aws", "aws-cn")
*/ -}}
{{- define "extractAWSPartition" -}}
{{- $parts := (split ":" .) -}}
{{- if ge (len $parts) 5 -}}{{- $parts._1 -}}{{- end -}}
{{- end -}}

{{- define "aws-partition" -}}
{{- $roleName := .Values.global.providerSpecific.awsClusterRoleIdentityName -}}
{{- $partition := .Values.internal.awsPartition -}}
{{- $role := (lookup "infrastructure.cluster.x-k8s.io/v1beta2" "AWSClusterRoleIdentity" "" $roleName) -}}
{{- if $role -}}
{{- $partition = (include "extractAWSPartition" $role.spec.roleARN) -}}
{{- end -}}
{{- if eq $partition "" -}}
{{- fail "failed to extract AWS Partition from AWSClusterRoleIdentity" -}}
{{- else -}}
{{- $partition -}}
{{- end -}}
{{- end -}}
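The partition helpers above can be mirrored in plain Go to make their fallback order easy to test. This is a minimal sketch, not code from the chart: `extractAWSPartition` and `resolvePartition` are hypothetical Go counterparts of the two templates, and an empty `roleARN` stands for the case where `lookup` finds no `AWSClusterRoleIdentity` (Helm's `lookup` returns an empty map during `helm template` or a dry run, which is when `internal.awsPartition` is used). Sprig's `split` returns a map keyed `_0`, `_1`, ..., which is why the template reads `$parts._1`; Go's `strings.Split` returns a slice, so the same field is `parts[1]`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// extractAWSPartition is a hypothetical Go counterpart of the
// "extractAWSPartition" template: it returns the second colon-separated
// field of an ARN such as "arn:aws-cn:iam::123456789012:role/example",
// or "" when the string has fewer than five fields.
func extractAWSPartition(arn string) string {
	parts := strings.Split(arn, ":")
	if len(parts) >= 5 {
		return parts[1]
	}
	return ""
}

// resolvePartition mirrors the "aws-partition" template. roleARN == ""
// models "no AWSClusterRoleIdentity found", in which case the locally
// configured fallback (internal.awsPartition) is used. An empty result
// is an error, like the template's `fail`.
func resolvePartition(roleARN, fallback string) (string, error) {
	partition := fallback
	if roleARN != "" {
		partition = extractAWSPartition(roleARN)
	}
	if partition == "" {
		return "", errors.New("failed to extract AWS Partition from AWSClusterRoleIdentity")
	}
	return partition, nil
}

func main() {
	// Role found in the cluster: its ARN decides the partition.
	p, err := resolvePartition("arn:aws-cn:iam::123456789012:role/example-role", "aws")
	fmt.Println(p, err) // aws-cn <nil>

	// Offline rendering: lookup finds nothing, internal.awsPartition is used.
	p, err = resolvePartition("", "aws")
	fmt.Println(p, err) // aws <nil>

	// Neither source available: rendering fails.
	_, err = resolvePartition("", "")
	fmt.Println(err) // failed to extract AWS Partition from AWSClusterRoleIdentity
}
```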
15 changes: 15 additions & 0 deletions helm/cluster-aws/templates/_machine_pools.tpl
@@ -87,6 +87,21 @@ spec:
    minHealthyPercentage: {{ $value.minHealthyPercentage | default 90 }}
  ignition:
    version: "3.4"
  lifecycleHooks:
    - defaultResult: CONTINUE

      {{/*
      The default heartbeat timeout (1800s) must be high enough to cover node draining on its own,
      because aws-node-termination-handler (NTH) doesn't send heartbeats
      (https://github.com/aws/aws-node-termination-handler/issues/493). It must also be low enough
      that instances can still terminate within a reasonable time if the controller is down.
      */}}
      heartbeatTimeout: "{{ ($value.awsNodeTerminationHandler).heartbeatTimeoutSeconds | default 1800 }}s"
      lifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
      name: aws-node-termination-handler
      notificationTargetARN: arn:{{ include "aws-partition" $ }}:sqs:{{ include "aws-region" $ }}:{{ include "aws-account-id" $ }}:{{ include "resource.default.name" $ }}-nth
      roleARN: arn:{{ include "aws-partition" $ }}:iam::{{ include "aws-account-id" $ }}:role/{{ include "resource.default.name" $ }}-nth-notification
---
{{ end }}
{{- end -}}
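For a quick check of what the new lifecycle hook renders to, the template's string formatting can be reproduced in Go. This is a sketch under stated assumptions: `nthLifecycleHook` is a hypothetical function, the partition, region, account ID and cluster name are placeholders for the `aws-partition`, `aws-region`, `aws-account-id` and `resource.default.name` helpers, and a heartbeat of `0` stands for an unset `awsNodeTerminationHandler.heartbeatTimeoutSeconds` (Sprig's `default` treats both missing values and `0` as empty).

```go
package main

import "fmt"

// lifecycleHook holds the fields rendered into AWSMachinePool.spec.lifecycleHooks.
type lifecycleHook struct {
	Name                  string
	LifecycleTransition   string
	DefaultResult         string
	HeartbeatTimeout      string
	NotificationTargetARN string
	RoleARN               string
}

// nthLifecycleHook is a hypothetical helper that reproduces the template's
// formatting. A heartbeat of 0 means "unset" and falls back to 1800 seconds,
// as `default 1800` does in the template.
func nthLifecycleHook(partition, region, accountID, clusterName string, heartbeatSeconds int) lifecycleHook {
	if heartbeatSeconds == 0 {
		heartbeatSeconds = 1800
	}
	queueARN := fmt.Sprintf("arn:%s:sqs:%s:%s:%s-nth", partition, region, accountID, clusterName)
	roleARN := fmt.Sprintf("arn:%s:iam::%s:role/%s-nth-notification", partition, accountID, clusterName)
	return lifecycleHook{
		Name:                  "aws-node-termination-handler",
		LifecycleTransition:   "autoscaling:EC2_INSTANCE_TERMINATING",
		DefaultResult:         "CONTINUE",
		HeartbeatTimeout:      fmt.Sprintf("%ds", heartbeatSeconds),
		NotificationTargetARN: queueARN,
		RoleARN:               roleARN,
	}
}

func main() {
	// Placeholder account ID; cluster name, region and timeout follow the CI values file above.
	hook := nthLifecycleHook("aws", "eu-west-1", "123456789012", "test-wc-minimal", 60)
	fmt.Println(hook.HeartbeatTimeout)      // 60s
	fmt.Println(hook.NotificationTargetARN) // arn:aws:sqs:eu-west-1:123456789012:test-wc-minimal-nth
	fmt.Println(hook.RoleARN)               // arn:aws:iam::123456789012:role/test-wc-minimal-nth-notification
}
```

Because `DefaultResult` is `CONTINUE`, an instance whose hook times out is still terminated; the timeout only bounds how long AWS waits for the handler.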
92 changes: 92 additions & 0 deletions helm/cluster-aws/templates/aws-nth-app.yaml
@@ -0,0 +1,92 @@
{{/* Default Helm values for the app */}}
{{/* See schema for the appropriate app version here https://github.com/giantswarm/aws-nth-bundle/blob/main/helm/aws-nth-bundle/values.schema.json */}}
{{- define "defaultAwsNodeTerminationHandlerHelmValues" }}
awsNodeTerminationHandler:
  values:
    image:
      registry: {{ include "awsContainerImageRegistry" $ }}

    # Allow running on control plane nodes. On deletion, CAPI deletes the worker nodes first,
    # and aws-node-termination-handler (if it is still running and its App has not been
    # deleted yet) should still complete the EC2 lifecycle hooks of the last workers.
    # Otherwise those hooks are never completed and AWS waits for the heartbeat timeout
    # before it can terminate the instances (see
    # `AWSMachinePool.spec.lifecycleHooks["aws-node-termination-handler"].heartbeatTimeout`).
    # The affinity below prefers worker nodes, and the toleration allows pods to move to
    # control plane nodes. This requires queue processing mode, i.e. running as a
    # `Deployment`, not a `DaemonSet`.
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
                - key: node-role.kubernetes.io/control-plane
                  operator: DoesNotExist
            weight: 10
    tolerations:
      - effect: NoSchedule
        operator: Exists
        key: node-role.kubernetes.io/control-plane

clusterID: {{ include "resource.default.name" $ }}
{{- if (.Values.global.connectivity.proxy).enabled }}
proxy:
  noProxy: "{{ include "cluster.connectivity.proxy.noProxy" (dict "global" $.Values.global "providerIntegration" $.Values.cluster.providerIntegration) }}"
  http: {{ .Values.global.connectivity.proxy.httpProxy | quote }}
  https: {{ .Values.global.connectivity.proxy.httpsProxy | quote }}
{{- end }}
global:
  image:
    registry: {{ include "awsContainerImageRegistry" $ }}
  podSecurityStandards:
    enforced: {{ .Values.global.podSecurityStandards.enforced }}
{{- end }}
---
apiVersion: v1
data:
  {{- $awsNodeTerminationHandlerHelmValues := (include "defaultAwsNodeTerminationHandlerHelmValues" .) | fromYaml -}}
  {{- $customAwsNodeTerminationHandlerHelmValues := $.Values.global.apps.awsNodeTerminationHandler.values -}}
  {{- if $customAwsNodeTerminationHandlerHelmValues }}
  {{- $awsNodeTerminationHandlerHelmValues = merge (deepCopy $customAwsNodeTerminationHandlerHelmValues) $awsNodeTerminationHandlerHelmValues -}}
  {{- end }}
  values: | {{- $awsNodeTerminationHandlerHelmValues | toYaml | nindent 4 }}
kind: ConfigMap
metadata:
  labels:
    app-operator.giantswarm.io/version: 0.0.0
    {{- include "labels.common" $ | nindent 4 }}
  name: {{ printf "%s-aws-nth-bundle-user-values" (include "resource.default.name" $) | quote }}
  namespace: {{ $.Release.Namespace | quote }}
---
apiVersion: application.giantswarm.io/v1alpha1
kind: App
metadata:
  labels:
    app-operator.giantswarm.io/version: 0.0.0
    {{- include "labels.common" $ | nindent 4 }}
  name: {{ printf "%s-aws-nth-bundle" (include "resource.default.name" $) | quote }}
  namespace: {{ $.Release.Namespace | quote }}
spec:
  catalog: {{ include "cluster.app.catalog" $ | quote }}
  install:
    timeout: "10m"
  upgrade:
    timeout: "10m"
  kubeConfig:
    inCluster: true # in management cluster context
  name: aws-nth-bundle
  namespace: {{ $.Release.Namespace | quote }}
  {{- $_ := set $ "appName" "aws-nth-bundle" }}
  {{- $appVersion := include "cluster.app.version" $ }}
  version: {{ $appVersion }}
  extraConfigs:
    # Default user values from the ConfigMap defined above
    - kind: configMap
      name: {{ printf "%s-aws-nth-bundle-user-values" (include "resource.default.name" $) | quote }}
      namespace: {{ $.Release.Namespace | quote }}
    {{- if .Values.global.apps.awsNodeTerminationHandler.extraConfigs }}
    {{- range .Values.global.apps.awsNodeTerminationHandler.extraConfigs }}
    - kind: {{ .kind }}
      name: {{ .name }}
      namespace: {{ .namespace | default $.Release.Namespace }}
      priority: {{ .priority }}
    {{- end }}
    {{- end }}
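In the ConfigMap above, `merge (deepCopy $custom) $defaults` lets keys from `global.apps.awsNodeTerminationHandler.values` win over the built-in defaults: Sprig's `merge` gives precedence to its first argument, fills in the rest from the second, recurses into nested maps, and modifies the first argument in place, which is why the user values are deep-copied first. The Go sketch below reproduces only that precedence on plain maps; `mergeDefaults` is a hypothetical helper, not Sprig's implementation, and it ignores Sprig's special handling of empty values.

```go
package main

import "fmt"

// mergeDefaults fills keys from defaults into dst only where dst does not
// already set them, recursing when both sides hold nested maps. Values already
// in dst always win. Like Sprig's merge, it modifies dst in place. Nested maps
// taken over from defaults are shared with defaults, not copied.
func mergeDefaults(dst, defaults map[string]any) map[string]any {
	for key, defVal := range defaults {
		cur, ok := dst[key]
		if !ok {
			dst[key] = defVal
			continue
		}
		curMap, curIsMap := cur.(map[string]any)
		defMap, defIsMap := defVal.(map[string]any)
		if curIsMap && defIsMap {
			mergeDefaults(curMap, defMap) // maps are references; curMap is updated in place
		}
	}
	return dst
}

func main() {
	// Hypothetical, simplified defaults shaped like defaultAwsNodeTerminationHandlerHelmValues.
	defaults := map[string]any{
		"clusterID": "test-wc-minimal",
		"awsNodeTerminationHandler": map[string]any{
			"values": map[string]any{
				"image":       map[string]any{"registry": "default-registry.example.com"},
				"tolerations": []any{"node-role.kubernetes.io/control-plane"},
			},
		},
	}
	// Hypothetical user input from global.apps.awsNodeTerminationHandler.values.
	custom := map[string]any{
		"awsNodeTerminationHandler": map[string]any{
			"values": map[string]any{
				"image": map[string]any{"registry": "registry.example.com"},
			},
		},
	}

	merged := mergeDefaults(custom, defaults)
	values := merged["awsNodeTerminationHandler"].(map[string]any)["values"].(map[string]any)
	fmt.Println(merged["clusterID"])                          // test-wc-minimal
	fmt.Println(values["image"].(map[string]any)["registry"]) // registry.example.com
	fmt.Println(values["tolerations"])                        // [node-role.kubernetes.io/control-plane]
}
```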
25 changes: 25 additions & 0 deletions helm/cluster-aws/values.schema.json
@@ -127,6 +127,20 @@
"title": "Availability zone"
}
},
"awsNodeTerminationHandler": {
"type": "object",
"title": "aws-node-termination-handler related settings",
"description": "Configuration for the ASG lifecycle hook used by aws-node-termination-handler",
"properties": {
"heartbeatTimeoutSeconds": {
"type": "number",
"title": "Heartbeat timeout for ASG lifecycle hook",
"default": 1800,
"maximum": 7200,
"minimum": 30
}
}
},
"customNodeLabels": {
"type": "array",
"title": "Custom node labels",
@@ -694,6 +708,12 @@
"title": "AWS EBS CSI driver service monitors",
"description": "Configuration of aws-ebs-csi-driver-servicemonitors. For all available values see https://github.com/giantswarm/aws-ebs-csi-driver-servicemonitors-app."
},
"awsNodeTerminationHandler": {
"$ref": "#/$defs/app",
"type": "object",
"title": "AWS Node Termination Handler",
"description": "Configuration of aws-nth-bundle. For all available values see https://github.com/giantswarm/aws-nth-bundle."
},
"awsPodIdentityWebhook": {
"$ref": "#/$defs/app",
"type": "object",
@@ -1752,6 +1772,11 @@
"title": "Internal",
"description": "For Giant Swarm internal use only, not stable, or not supported by UIs.",
"properties": {
"awsPartition": {
"type": "string",
"title": "AWS Partition",
"description": "Only used when rendering the chart template locally, you shouldn't use this value."
},
"hashSalt": {
"type": "string",
"title": "Hash salt",
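The schema restricts `heartbeatTimeoutSeconds` to between 30 and 7200 seconds (JSON Schema `minimum` and `maximum` are inclusive), which matches the range AWS accepts for a lifecycle hook heartbeat timeout, and documents a default of 1800. Helm validates user values against `values.schema.json` but does not inject schema defaults, which is why the machine pool template repeats `default 1800`. A minimal sketch of the same check in Go; `validateHeartbeat` is a hypothetical helper, not part of Helm, and takes a `float64` because the schema type is `number`.

```go
package main

import "fmt"

const (
	minHeartbeatSeconds     = 30
	maxHeartbeatSeconds     = 7200
	defaultHeartbeatSeconds = 1800
)

// validateHeartbeat returns the default when the value is unset (nil) and
// otherwise enforces the schema's inclusive minimum and maximum.
func validateHeartbeat(v *float64) (float64, error) {
	if v == nil {
		return defaultHeartbeatSeconds, nil
	}
	if *v < minHeartbeatSeconds || *v > maxHeartbeatSeconds {
		return 0, fmt.Errorf("heartbeatTimeoutSeconds %v out of range [%d, %d]",
			*v, minHeartbeatSeconds, maxHeartbeatSeconds)
	}
	return *v, nil
}

// ptr returns a pointer to a copy of f, for building test inputs.
func ptr(f float64) *float64 { return &f }

func main() {
	ci := 60.0 // value used in the CI test values
	for _, v := range []*float64{nil, &ci, ptr(10), ptr(7200), ptr(9000)} {
		got, err := validateHeartbeat(v)
		fmt.Println(got, err)
	}
}
```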
1 change: 1 addition & 0 deletions helm/cluster-aws/values.yaml
@@ -265,6 +265,7 @@ global:
    awsCloudControllerManager: {}
    awsEbsCsiDriver: {}
    awsEbsCsiDriverServiceMonitors: {}
    awsNodeTerminationHandler: {}
    awsPodIdentityWebhook: {}
    capiNodeLabeler: {}
    certExporter: {}
