Spark Integration Configuration (#196)
* wip spark configuration

* fixup depends on init action

* fixup use staging bucket

* fixup! docs, volumes and init action bug

* spark tasks use ccache cluster policy rule

* use 1.10.x operator path

* Attempt to fix the image

* Update terraform/modules/airflow_tenant/modules/airflow_app/main.tf

Co-authored-by: Kamil Breguła <[email protected]>

* Update sfdc-airflow-aas/sfdc_airflow/cluster_policy/rules.py

Co-authored-by: Kamil Breguła <[email protected]>

* improve hadoop config organization on GCS

* set core / yarn configmaps

* escape commas to make helm happy

* improve spark logging, add docs

* revert log4j

* fix env var name

* fix leading newline in hadoop configs

* fix yarn site in configmap

* remove duplicate conf in exported gcs path

* Update subrepos/airflow/chart/templates/workers/worker-deployment.yaml

* Update subrepos/airflow/chart/templates/workers/worker-deployment.yaml

* add back log4j

* working demo

* Add spark provider package

* wip

* fix numbers add dive

* add deploying iac docs

* fix gcs connector verification

* [Discuss] Dynamic hadoop configs (#220)

* improve apply-all behavior

* improve apply-all behavior

Co-authored-by: Kamil Breguła <[email protected]>
Co-authored-by: Kamil Breguła <[email protected]>
3 people authored and potiuk committed Oct 9, 2020
1 parent 6be69a7 commit 5f3a0f6
Showing 6 changed files with 833 additions and 1 deletion.
8 changes: 8 additions & 0 deletions chart/README.md
@@ -136,6 +136,14 @@ The following table lists the configurable parameters of the Airflow chart and
| `allowPodLaunching` | Allow airflow pods to talk to Kubernetes API to launch more pods | `true` |
| `defaultAirflowRepository` | Fallback docker repository to pull airflow image from | `apache/airflow` |
| `defaultAirflowTag` | Fallback docker image tag to deploy | `1.10.10.1-alpha2-python3.6` |
| `hadoop.enabled` | Mount Hadoop configuration files for submitting Hadoop / YARN jobs to a remote cluster | `false` |
| `hadoop.configPath` | Path at which to mount the Hadoop configuration files `core-site.xml` and `yarn-site.xml` | `/etc/hadoop/conf` |
| `hadoop.core` | Contents of `core-site.xml` pointing to the remote Hadoop cluster to interact with | `~` |
| `hadoop.yarn` | Contents of `yarn-site.xml` pointing to the remote Hadoop cluster to interact with | `~` |
| `spark.enabled` | Mount Spark configuration files for submitting Spark jobs to a remote YARN cluster | `false` |
| `spark.configPath` | Path at which to mount the Spark configuration files `spark-env.sh` and `log4j.properties` | `/etc/spark/conf` |
| `spark.homePath` | Path to the directory where Spark binaries are installed in your Airflow image (i.e. `SPARK_HOME`) | `/opt/spark` |
| `spark.sparkEnv` | Contents of `spark-env.sh` used to configure Spark for the remote Hadoop cluster | `~` |
| `images.airflow.repository` | Docker repository to pull image from. Update this to deploy a custom image | `~` |
| `images.airflow.tag` | Docker image tag to pull image from. Update this to deploy a new custom image tag | `~` |
| `images.airflow.pullPolicy` | PullPolicy for airflow image | `IfNotPresent` |
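For illustration, a minimal values override wiring these parameters together might look like the sketch below; the host names and ports are placeholders, not values from this change:

```yaml
# values-spark-override.yaml -- illustrative sketch only; hosts are placeholders.
hadoop:
  enabled: true
  # Minimal core-site.xml pointing at a remote HDFS namenode
  core: |
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>
  # Minimal yarn-site.xml pointing at the remote ResourceManager
  yarn: |
    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>resourcemanager.example.com</value>
      </property>
    </configuration>

spark:
  enabled: true
```

Applied with something like `helm upgrade airflow ./chart -f values-spark-override.yaml`, these contents are rendered into the chart ConfigMap (next file) and mounted into the worker pods.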
11 changes: 11 additions & 0 deletions chart/templates/configmap.yaml
@@ -67,4 +67,15 @@ data:
  krb5.conf: |
    {{ tpl .Values.kerberos.config . | nindent 4 }}
  {{- end }}
  {{- if .Values.hadoop.enabled }}
  core-site.xml: |
    {{- tpl .Values.hadoop.core . | nindent 4 }}
  yarn-site.xml: |
    {{- tpl .Values.hadoop.yarn . | nindent 4 }}
  {{- end }}
  {{- if .Values.spark.enabled }}
  spark-env.sh: |
    {{- tpl .Values.spark.sparkEnv . | nindent 4 }}
  log4j.properties: |
    {{- tpl .Values.spark.log4j . | nindent 4 }}
  {{- end }}
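For orientation, with `spark.enabled: true` and the chart defaults from `values.yaml` (shown further below), the Spark portion of the rendered `data` section would come out roughly as follows; this is a sketch of the template output, not part of the diff:

```yaml
# Sketch of the rendered ConfigMap data, assuming the default spark.sparkEnv
# and the default configPath / homePath values:
data:
  spark-env.sh: |
    export HADOOP_CONF_DIR="/etc/hadoop/conf/"
    export SPARK_HOME="/opt/spark"
```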
44 changes: 43 additions & 1 deletion chart/templates/workers/worker-deployment.yaml
@@ -133,7 +133,17 @@ spec:
              mountPath: {{ .Values.kerberos.ccacheMountPath | quote }}
              readOnly: true
{{- end }}
{{- if .Values.hadoop.enabled }}
            - name: hadoop-config-vol
              mountPath: {{ .Values.hadoop.configPath | quote }}
              readOnly: true
{{- end }}
{{- if .Values.spark.enabled }}
            - name: spark-config-vol
              mountPath: {{ .Values.spark.configPath | quote }}
              readOnly: true
{{- end }}
{{- if .Values.scheduler.airflowLocalSettings }}
            - name: config
              mountPath: {{ template "airflow_local_setting_path" . }}
              subPath: airflow_local_settings.py
@@ -144,6 +154,12 @@
              mountPath: {{ template "airflow_dags_mount_path" . }}
{{- end }}
          env:
            - name: HADOOP_CONF_DIR
              value: {{ .Values.hadoop.configPath }}
            - name: SPARK_CONF_DIR
              value: {{ .Values.spark.configPath }}
            - name: SPARK_HOME
              value: {{ .Values.spark.homePath }}
{{- include "custom_airflow_environment" . | indent 10 }}
{{- include "standard_airflow_environment" . | indent 10 }}
@@ -196,10 +212,36 @@ spec:
              value: {{ .Values.kerberos.configPath | quote }}
            - name: KRB5CCNAME
              value: {{ include "kerberos_ccache_path" . | quote }}
            - name: HADOOP_CONF_DIR
              value: {{ .Values.hadoop.configPath }}
            - name: SPARK_CONF_DIR
              value: {{ .Values.spark.configPath }}
            - name: SPARK_HOME
              value: {{ .Values.spark.homePath }}
{{- include "custom_airflow_environment" . | indent 10 }}
{{- include "standard_airflow_environment" . | indent 10 }}
{{- end }}
      volumes:
{{- if .Values.hadoop.enabled }}
        - name: hadoop-config-vol
          configMap:
            name: {{ template "airflow_config" . }}
            items:
              - key: core-site.xml
                path: core-site.xml
              - key: yarn-site.xml
                path: yarn-site.xml
{{- end }}
{{- if .Values.spark.enabled }}
        - name: spark-config-vol
          configMap:
            name: {{ template "airflow_config" . }}
            items:
              - key: spark-env.sh
                path: spark-env.sh
              - key: log4j.properties
                path: log4j.properties
{{- end }}
        - name: kerberos-keytab
          secret:
            secretName: {{ include "kerberos_keytab_secret" . | quote }}
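Putting the mounts and environment variables together, with both features enabled and the default paths, the relevant fragment of the rendered worker container would look approximately like this (an illustrative rendering, not part of the diff):

```yaml
# Approximate rendered worker container fragment, defaults assumed:
volumeMounts:
  - name: hadoop-config-vol
    mountPath: "/etc/hadoop/conf/"
    readOnly: true
  - name: spark-config-vol
    mountPath: "/etc/spark/conf/"
    readOnly: true
env:
  - name: HADOOP_CONF_DIR
    value: /etc/hadoop/conf/
  - name: SPARK_CONF_DIR
    value: /etc/spark/conf/
  - name: SPARK_HOME
    value: /opt/spark
```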
19 changes: 19 additions & 0 deletions chart/values.yaml
@@ -233,6 +233,25 @@ kerberos:
      admin_server = admin_server.foo.com
    }

# Specify configuration for connecting to a Hadoop cluster
hadoop:
  enabled: false
  configPath: '/etc/hadoop/conf/'
  # Contents of core-site.xml to point at your remote hadoop cluster
  core: ~
  # Contents of yarn-site.xml to point at your remote hadoop cluster
  yarn: ~

# Specify configuration for Spark submit
spark:
  enabled: false
  # SPARK_HOME
  homePath: '/opt/spark'
  configPath: '/etc/spark/conf/'
  # Contents of spark-env.sh
  sparkEnv: |
    export HADOOP_CONF_DIR={{ .Values.hadoop.configPath | quote }}
    export SPARK_HOME={{ .Values.spark.homePath | quote }}

# Airflow Worker Config
workers:
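Note that the ConfigMap template above also renders `spark.log4j`, which this excerpt of `values.yaml` does not define. An override supplying it explicitly might look like the following sketch; the appender settings are ordinary Spark console-logging defaults, assumed here for illustration:

```yaml
spark:
  enabled: true
  # log4j.properties is rendered into the ConfigMap and mounted at spark.configPath
  log4j: |
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```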