This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[stable/airflow] version 7.6.0 #23651

Merged 13 commits on Sep 1, 2020
3 changes: 2 additions & 1 deletion stable/airflow/Chart.yaml
@@ -1,7 +1,7 @@
apiVersion: v1
description: Airflow is a platform to programmatically author, schedule and monitor workflows
name: airflow
version: 7.5.0
version: 7.6.0
appVersion: 1.10.10
icon: https://airflow.apache.org/_images/pin_large.png
home: https://airflow.apache.org/
@@ -13,3 +13,4 @@ sources:
keywords:
- workflow
- dag
- airflow
31 changes: 25 additions & 6 deletions stable/airflow/README.md
@@ -39,6 +39,7 @@ kubectl exec \
> NOTE: for chart version numbers, see [Chart.yaml](Chart.yaml) or [helm hub](https://hub.helm.sh/charts/stable/airflow).

For steps you must take when upgrading this chart, please review:
* [v7.5.X → v7.6.0](UPGRADE.md#v75x--v760)
* [v7.4.X → v7.5.0](UPGRADE.md#v74x--v750)
* [v7.3.X → v7.4.0](UPGRADE.md#v73x--v740)
* [v7.2.X → v7.3.0](UPGRADE.md#v72x--v730)
@@ -223,12 +224,18 @@ For a worker pod you can calculate it: `WORKER_CONCURRENCY * 200Mi`, so for `10
Here is the `values.yaml` config for that example:
```yaml
workers:
replicas: 1
# the initial/minimum number of workers
replicas: 2

resources:
requests:
memory: "2Gi"

podDisruptionBudget:
enabled: true
## prevents losing more than 20% of current worker task slots in a voluntary disruption
maxUnavailable: "20%"

autoscaling:
enabled: true
maxReplicas: 16
@@ -243,11 +250,14 @@ workers:
celery:
instances: 10

## wait at most 10min for running tasks to complete
## wait at most 9min for running tasks to complete before SIGTERM
## WARNING:
## - some cluster-autoscalers (e.g. GKE's) will not respect graceful
## termination periods over 10min
gracefullTermination: true
gracefullTerminationPeriod: 600
gracefullTerminationPeriod: 540

## how many seconds (after the 10min) to wait before SIGKILL
## how many seconds (after the 9min) to wait before SIGKILL
terminationPeriod: 60

dags:
@@ -548,6 +558,7 @@ __Airflow Scheduler values:__
| `scheduler.podLabels` | Pod labels for the scheduler Deployment | `{}` |
| `scheduler.annotations` | annotations for the scheduler Deployment | `{}` |
| `scheduler.podAnnotations` | Pod Annotations for the scheduler Deployment | `{}` |
| `scheduler.safeToEvict` | if we should tell the Kubernetes cluster-autoscaler that it's safe to evict these Pods | `true` |
| `scheduler.podDisruptionBudget.*` | configs for the PodDisruptionBudget of the scheduler | `<see values.yaml>` |
| `scheduler.connections` | custom airflow connections for the airflow scheduler | `[]` |
| `scheduler.refreshConnections` | if we remove before adding a connection resulting in a refresh | `true` |
@@ -559,7 +570,7 @@ __Airflow Scheduler values:__
| `scheduler.initialStartupDelay` | the number of seconds to wait (in bash) before starting the scheduler container | `0` |
| `scheduler.extraInitContainers` | extra init containers to run before the scheduler pod | `[]` |

__Airflow WebUI Values:__
__Airflow Webserver Values:__

| Parameter | Description | Default |
| --- | --- | --- |
@@ -572,6 +583,8 @@ __Airflow WebUI Values:__
| `web.podLabels` | Pod labels for the web Deployment | `{}` |
| `web.annotations` | annotations for the web Deployment | `{}` |
| `web.podAnnotations` | Pod annotations for the web Deployment | `{}` |
| `web.safeToEvict` | if we should tell the Kubernetes cluster-autoscaler that it's safe to evict these Pods | `true` |
| `web.podDisruptionBudget.*` | configs for the PodDisruptionBudget of the web Deployment | `<see values.yaml>` |
| `web.service.*` | configs for the Service of the web pods | `<see values.yaml>` |
| `web.baseUrl` | sets `AIRFLOW__WEBSERVER__BASE_URL` | `http://localhost:8080` |
| `web.serializeDAGs` | sets `AIRFLOW__CORE__STORE_SERIALIZED_DAGS` | `false` |
@@ -598,6 +611,8 @@ __Airflow Worker Values:__
| `workers.podLabels` | Pod labels for the worker StatefulSet | `{}` |
| `workers.annotations` | annotations for the worker StatefulSet | `{}` |
| `workers.podAnnotations` | Pod annotations for the worker StatefulSet | `{}` |
| `workers.safeToEvict` | if we should tell the Kubernetes cluster-autoscaler that it's safe to evict these Pods | `true` |
| `workers.podDisruptionBudget.*` | configs for the PodDisruptionBudget of the worker StatefulSet | `<see values.yaml>` |
| `workers.autoscaling.*` | configs for the HorizontalPodAutoscaler of the worker Pods | `<see values.yaml>` |
| `workers.initialStartupDelay` | the number of seconds to wait (in bash) before starting each worker container | `0` |
| `workers.celery.*` | configs for the celery worker Pods | `<see values.yaml>` |
@@ -618,11 +633,14 @@ __Airflow Flower Values:__
| `flower.podLabels` | Pod labels for the flower Deployment | `{}` |
| `flower.annotations` | annotations for the flower Deployment | `{}` |
| `flower.podAnnotations` | Pod annotations for the flower Deployment | `{}` |
| `flower.safeToEvict` | if we should tell the Kubernetes cluster-autoscaler that it's safe to evict these Pods | `true` |
| `flower.podDisruptionBudget.*` | configs for the PodDisruptionBudget of the flower Deployment | `<see values.yaml>` |
| `flower.basicAuthSecret` | the name of a pre-created secret containing the basic authentication value for flower | `""` |
| `flower.basicAuthSecretKey` | the key within `flower.basicAuthSecret` containing the basic authentication string | `""` |
| `flower.urlPrefix` | sets `AIRFLOW__CELERY__FLOWER_URL_PREFIX` | `""` |
| `flower.service.*` | configs for the Service of the flower Pods | `<see values.yaml>` |
| `flower.initialStartupDelay` | the number of seconds to wait (in bash) before starting the flower container | `0` |
| `flower.minReadySeconds` | the number of seconds to wait before declaring a new Pod available | `5` |
| `flower.extraConfigmapMounts` | extra ConfigMaps to mount on the flower Pods | `[]` |

__Airflow Logs Values:__
@@ -672,6 +690,7 @@ __Airflow Database (Internal PostgreSQL) Values:__
| `postgresql.existingSecret` | the name of a pre-created secret containing the postgres password | `""` |
| `postgresql.existingSecretKey` | the key within `postgresql.existingSecret` containing the password string | `postgresql-password` |
| `postgresql.persistence.*` | configs for the PVC of postgresql | `<see values.yaml>` |
| `postgresql.master.*` | configs for the postgres StatefulSet | `<see values.yaml>` |

__Airflow Database (External) Values:__

@@ -718,4 +737,4 @@ __Airflow Prometheus Values:__
| `serviceMonitor.interval` | how often Prometheus scrapes the ServiceMonitor endpoint | `30s` |
| `prometheusRule.enabled` | if the PrometheusRule resources should be deployed | `false` |
| `prometheusRule.additionalLabels` | labels for PrometheusRule, so that Prometheus can select it | `{}` |
| `prometheusRule.groups` | alerting rules for Prometheus | `[]` |
| `prometheusRule.groups` | alerting rules for Prometheus | `[]` |
53 changes: 48 additions & 5 deletions stable/airflow/UPGRADE.md
@@ -1,5 +1,49 @@
# Upgrading Steps

## `v7.5.X` → `v7.6.0`

> __WARNING:__
>
> We now annotate all pods with `cluster-autoscaler.kubernetes.io/safe-to-evict` by default.
>
> If you want to disable this:
> - Set: `flower.safeToEvict`, `scheduler.safeToEvict`, `web.safeToEvict`, `workers.safeToEvict` to `false`
> - Set: `postgresql.master.podAnnotations`, `redis.master.podAnnotations`, `redis.slave.podAnnotations` to `{}`
>
> Note for GKE:
> - GKE's cluster-autoscaler will not honor a graceful termination period of more than 10min;
>   if your jobs need more than this amount of time to finish, please set `workers.safeToEvict` to `false`
>
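
For reference, the opt-out described in the warning could look like this in a custom `values.yaml`. This is a sketch that uses only the keys named in the warning; check them against the chart's own `values.yaml` before relying on it:

```yaml
## opt out of the cluster-autoscaler "safe-to-evict" annotation
flower:
  safeToEvict: false
scheduler:
  safeToEvict: false
web:
  safeToEvict: false
workers:
  safeToEvict: false

## the postgresql and redis sub-chart Pods get the annotation through
## their podAnnotations defaults, so clear those as well
postgresql:
  master:
    podAnnotations: {}
redis:
  master:
    podAnnotations: {}
  slave:
    podAnnotations: {}
```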

__The following IMPROVEMENTS have been made:__
* The chart's template YAML has been refactored (for example, templates are now grouped into subfolders such as `config/` and `flower/`)
* You can now configure `safe-to-evict` annotations (so that Pods with emptyDir Volumes can be evicted by cluster-autoscaler)
* You can now create PodDisruptionBudgets for flower, webserver and worker (the scheduler already supported one)
* The chart now pins the ports Airflow listens on (webserver `8080`, flower `5555`, worker log server `8793`); this will not prevent you from changing Service/Ingress ports
* You can now run multiple instances of flower
* You can now specify minReadySeconds for flower
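
As a worked example of the new worker PodDisruptionBudget: Kubernetes converts a percentage `maxUnavailable` into a whole number of Pods and rounds it up, so the README's `maxUnavailable: "20%"` does not always mean exactly 20%. The replica counts below (2 up to 16) come from the README's autoscaling example and are only illustrative:

```yaml
workers:
  podDisruptionBudget:
    enabled: true
    ## a percentage maxUnavailable is rounded UP to whole Pods:
    ##  - 2 replicas:  20% of 2  = 0.4 -> 1 Pod  (50% of worker task slots)
    ##  - 5 replicas:  20% of 5  = 1.0 -> 1 Pod  (20%)
    ##  - 16 replicas: 20% of 16 = 3.2 -> 4 Pods (25%)
    maxUnavailable: "20%"
```

If you need a fixed bound regardless of the replica count, an integer such as `maxUnavailable: 1` may be a better fit.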

__The following values have CHANGED DEFAULTS:__
* `workers.celery.instances`:
  * Is now `16` by default (letting each worker Pod run up to 16 tasks concurrently)
* `postgresql.master.podAnnotations`:
* Is now `{"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"}`
* `redis.master.podAnnotations`:
* Is now `{"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"}`
* `redis.slave.podAnnotations`:
* Is now `{"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"}`

__The following values have been ADDED:__
* `flower.minReadySeconds`
* `flower.podDisruptionBudget.*`
* `flower.replicas`
* `flower.safeToEvict`
* `scheduler.safeToEvict`
* `web.podDisruptionBudget.*`
* `web.safeToEvict`
* `workers.podDisruptionBudget.*`
* `workers.safeToEvict`
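
A sketch of how the new flower values might be combined to run more than one flower Pod. The numbers are illustrative, not chart defaults (the README lists `5` as the default for `flower.minReadySeconds`):

```yaml
flower:
  enabled: true
  ## more than one flower Pod; the Deployment rolls them with
  ## maxSurge 25% and maxUnavailable 0
  replicas: 2
  ## a new Pod must stay ready for 10s before it counts as available
  minReadySeconds: 10
  ## add the cluster-autoscaler safe-to-evict annotation to flower Pods
  safeToEvict: true
```

Note that, in this version, the flower Service, Ingress and PodDisruptionBudget templates are only rendered when `airflow.executor` is `CeleryExecutor`.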

## `v7.4.X` → `v7.5.0`

__The following IMPROVEMENTS have been made:__
@@ -15,18 +59,17 @@ __The following values have been ADDED:__
__The following IMPROVEMENTS have been made:__

* Reduced how likely it is for a celery worker to receive SIGKILL with graceful termination enabled.
* Celery worker graceful shutdown lifecycle:
New celery worker graceful shutdown lifecycle:
1. prevent worker accepting new tasks
2. wait AT MOST `workers.celery.gracefullTerminationPeriod` for tasks to finish
3. send `SIGTERM` to worker
4. wait AT MOST `workers.terminationPeriod` for kill to finish
5. send `SIGKILL` to worker
* NOTE:
* if you currently use a high value of `workers.terminationPeriod`, consider lowering it to `60` and setting a high value for `workers.celery.gracefullTerminationPeriod`

__The following values have been ADDED:__

* `workers.celery.gracefullTerminationPeriod`
* `workers.celery.gracefullTerminationPeriod`:
* if you currently use a high value of `workers.terminationPeriod`, consider lowering it to `60` and setting a high value for `workers.celery.gracefullTerminationPeriod`
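
The lifecycle above has a fixed worst case. Using the values from this PR's README and GKE examples (`gracefullTerminationPeriod: 540`, `terminationPeriod: 60`), a sketch of the arithmetic, assuming no other delays in the shutdown path:

```yaml
workers:
  ## step 4: wait AT MOST 60s after SIGTERM before SIGKILL
  terminationPeriod: 60
  celery:
    gracefullTermination: true
    ## step 2: wait AT MOST 540s (9min) for running tasks to finish
    gracefullTerminationPeriod: 540
## worst case from step 1 to step 5: 540s + 60s = 600s (10min),
## which stays within the 10min limit noted for GKE's cluster-autoscaler
```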

## `v7.2.X` → `v7.3.0`

@@ -232,4 +275,4 @@ __The following values have been ADDED:__
This version splits the specs for the NodeSelector, Affinity and Toleration features.
Instead of being global, and injected in every component, they are now defined _by component_ to provide more flexibility for your deployments.
As such, the migration steps are really simple: just copy and paste your nodeSelector/affinity/toleration definitions into the four airflow components, which are `worker`, `scheduler`, `flower` and `web`.
The default `values.yaml` file should help you with locating those.
The default `values.yaml` file should help you with locating those.
22 changes: 21 additions & 1 deletion stable/airflow/examples/google-gke/custom-values.yaml
@@ -216,6 +216,21 @@ workers:
##
replicas: 2

## configs for the PodDisruptionBudget of the worker StatefulSet
##
podDisruptionBudget:
## if a PodDisruptionBudget resource is created for the worker StatefulSet
##
enabled: true

## the maximum unavailable pods/percentage for the worker StatefulSet
##
## NOTE:
## - prevents losing more than 20% of current worker task slots in a voluntary
## disruption
##
maxUnavailable: "20%"

## configs for the HorizontalPodAutoscaler of the worker Pods
##
autoscaling:
@@ -245,7 +260,12 @@ workers:

## how many seconds to wait for tasks to finish before SIGTERM of the celery worker
##
gracefullTerminationPeriod: 600
## WARNING:
## - GKE cluster-autoscaler will not respect a graceful termination period over 10min
## NOTE:
## - this gives any running tasks AT MOST 9min to complete
##
gracefullTerminationPeriod: 540

## how many seconds to wait after SIGTERM before SIGKILL of the celery worker
##
@@ -1,7 +1,7 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: "{{ include "airflow.fullname" . }}-env"
name: {{ include "airflow.fullname" . }}-env
labels:
app: {{ include "airflow.labels.app" . }}
chart: {{ include "airflow.labels.chart" . }}
@@ -36,10 +36,14 @@ data:
REDIS_PORT: "{{ .Values.externalRedis.port }}"
REDIS_DBNUM: "{{ .Values.externalRedis.databaseNumber }}"
{{- end }}

{{- if .Values.flower.enabled }}
## Airflow (Flower)
AIRFLOW__CELERY__FLOWER_URL_PREFIX: "{{ .Values.flower.urlPrefix }}"
AIRFLOW__CELERY__FLOWER_PORT: "5555"
{{- end }}
## Airflow (Worker)
AIRFLOW__CELERY__WORKER_CONCURRENCY: "{{ .Values.workers.celery.instances }}"
AIRFLOW__CELERY__WORKER_LOG_SERVER_PORT: "8793"
{{- end }}

{{- if (eq .Values.airflow.executor "KubernetesExecutor") }}
@@ -66,6 +70,7 @@ data:
AIRFLOW__CORE__ENABLE_XCOM_PICKLING: "false" # for forward compatibility with 2.0
AIRFLOW__CORE__EXECUTOR: "{{ .Values.airflow.executor }}"
AIRFLOW__WEBSERVER__BASE_URL: "{{ .Values.web.baseUrl }}"
AIRFLOW__WEBSERVER__WEB_SERVER_PORT: "8080"
{{- if .Values.airflow.fernetKey }}
AIRFLOW__CORE__FERNET_KEY: "{{ .Values.airflow.fernetKey }}"
{{- end }}
@@ -17,12 +17,13 @@ metadata:
{{- toYaml .Values.flower.labels | nindent 4 }}
{{- end }}
spec:
replicas: 1
minReadySeconds: 10
replicas: {{ .Values.flower.replicas }}
minReadySeconds: {{ .Values.flower.minReadySeconds }}
strategy:
# this is safe - multiple flower pods can run concurrently
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxSurge: 25%
maxUnavailable: 0
selector:
matchLabels:
@@ -32,10 +33,13 @@ spec:
template:
metadata:
annotations:
checksum/config-env: {{ include (print $.Template.BasePath "/configmap-env.yaml") . | sha256sum }}
checksum/config-env: {{ include (print $.Template.BasePath "/config/configmap-env.yaml") . | sha256sum }}
{{- if .Values.flower.podAnnotations }}
{{- toYaml .Values.flower.podAnnotations | nindent 8 }}
{{- end }}
{{- if .Values.flower.safeToEvict }}
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
{{- end }}
labels:
app: {{ include "airflow.labels.app" . }}
component: flower
@@ -1,5 +1,5 @@
{{- if and (.Values.flower.enabled) (.Values.ingress.enabled) }}
apiVersion: extensions/v1beta1
{{- if and (.Values.flower.enabled) (eq .Values.airflow.executor "CeleryExecutor") (.Values.ingress.enabled) }}
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: {{ include "airflow.fullname" . }}-flower
24 changes: 24 additions & 0 deletions stable/airflow/templates/flower/flower-pdb.yaml
@@ -0,0 +1,24 @@
{{- if and (.Values.flower.enabled) (eq .Values.airflow.executor "CeleryExecutor") (.Values.flower.podDisruptionBudget.enabled) }}
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: {{ include "airflow.fullname" . }}-flower
labels:
app: {{ include "airflow.labels.app" . }}
component: flower
chart: {{ include "airflow.labels.chart" . }}
release: {{ .Release.Name }}
heritage: {{ .Release.Service }}
spec:
{{- if .Values.flower.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ .Values.flower.podDisruptionBudget.maxUnavailable }}
{{- end }}
{{- if .Values.flower.podDisruptionBudget.minAvailable }}
minAvailable: {{ .Values.flower.podDisruptionBudget.minAvailable }}
{{- end }}
selector:
matchLabels:
app: {{ include "airflow.labels.app" . }}
component: flower
release: {{ .Release.Name }}
{{- end }}
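
Reading the template above: the PodDisruptionBudget is only rendered when flower is enabled, `airflow.executor` is `CeleryExecutor` and `flower.podDisruptionBudget.enabled` is true, and it passes `maxUnavailable` and/or `minAvailable` through only when they are set. Kubernetes accepts only one of those two fields in a single PodDisruptionBudget, and because the template tests them with `if`, an integer value of `0` is treated as unset and omitted. A values sketch that sets exactly one field (the numbers are illustrative, not chart defaults):

```yaml
airflow:
  executor: CeleryExecutor

flower:
  enabled: true
  ## with a single replica, minAvailable: 1 would block every voluntary
  ## eviction of the flower Pod (for example during `kubectl drain`)
  replicas: 2
  podDisruptionBudget:
    enabled: true
    ## set only ONE of minAvailable / maxUnavailable
    minAvailable: 1
```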
@@ -1,4 +1,4 @@
{{- if .Values.flower.enabled }}
{{- if and (.Values.flower.enabled) (eq .Values.airflow.executor "CeleryExecutor") }}
apiVersion: v1
kind: Service
metadata:
@@ -1,5 +1,5 @@
{{- if .Values.rbac.create }}
apiVersion: rbac.authorization.k8s.io/v1beta1
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ include "airflow.fullname" . }}
@@ -1,5 +1,5 @@
{{- if .Values.rbac.create }}
apiVersion: rbac.authorization.k8s.io/v1beta1
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ include "airflow.fullname" . }}
@@ -18,8 +18,7 @@ metadata:
spec:
replicas: 1
strategy:
# Kill the scheduler as soon as possible.
# It will restart quickly with all the workers, minimizing the time they are not synchronized.
# this is safe as long as `maxSurge` is 0
type: RollingUpdate
rollingUpdate:
maxSurge: 0
@@ -32,11 +31,11 @@ spec:
template:
metadata:
annotations:
checksum/config-env: {{ include (print $.Template.BasePath "/configmap-env.yaml") . | sha256sum }}
checksum/config-git-clone: {{ include (print $.Template.BasePath "/configmap-scripts-git.yaml") . | sha256sum }}
checksum/config-scripts: {{ include (print $.Template.BasePath "/configmap-scripts.yaml") . | sha256sum }}
checksum/config-variables-pools: {{ include (print $.Template.BasePath "/configmap-variables-pools.yaml") . | sha256sum }}
checksum/secret-connections: {{ include (print $.Template.BasePath "/secret-connections.yaml") . | sha256sum }}
checksum/config-env: {{ include (print $.Template.BasePath "/config/configmap-env.yaml") . | sha256sum }}
checksum/config-git-clone: {{ include (print $.Template.BasePath "/config/configmap-scripts-git.yaml") . | sha256sum }}
checksum/config-scripts: {{ include (print $.Template.BasePath "/config/configmap-scripts.yaml") . | sha256sum }}
checksum/config-variables-pools: {{ include (print $.Template.BasePath "/config/configmap-variables-pools.yaml") . | sha256sum }}
checksum/secret-connections: {{ include (print $.Template.BasePath "/config/secret-connections.yaml") . | sha256sum }}
{{- if and (.Values.dags.git.url) (.Values.dags.git.ref) }}
checksum/dags-git-ref: {{ .Values.dags.git.ref | sha256sum }}
{{- end }}
@@ -46,6 +45,9 @@ spec:
{{- if .Values.scheduler.podAnnotations }}
{{- toYaml .Values.scheduler.podAnnotations | nindent 8 }}
{{- end }}
{{- if .Values.scheduler.safeToEvict }}
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
{{- end }}
labels:
app: {{ include "airflow.labels.app" . }}
component: scheduler