Azure Hyperscale: Kubernetes Setup for Dask

deepearth edited this page Aug 7, 2020 · 2 revisions

by Qie Zhang, Microsoft Azure Global (collaboration with the Devito team)

This wiki shows, step by step, how to set up and deploy an Azure Kubernetes cluster for running a seismic imaging job in parallel through Azure Kubernetes Service (AKS), which automates application deployment, scaling, and management. This document is adapted and expanded from the Kubernetes setup notes on the Devito GitHub.

Deploying the Kubernetes cluster yields an IP address for the Dask scheduler. Dask connects to this address and distributes workloads (such as calculating the FWI gradient for each shot) to parallel workers in the Kubernetes cluster. The Kubernetes setup can be used for 2D/3D seismic RTM/FWI jobs; here is a 2D FWI example related to this document.
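
As a rough illustration of the per-shot distribution described above, the sketch below submits one task per shot and sums the results on the caller side. It uses Python's standard-library ThreadPoolExecutor as a stand-in for the Dask client, on the assumption that only submit() and Future.result() are needed. The function names and the toy gradient are hypothetical and are not taken from the Devito tutorial.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_shot_gradient(shot_id):
    """Hypothetical stand-in for a per-shot FWI gradient computation.

    Returns an (objective, gradient) pair; a real job would run a Devito
    forward/adjoint solve for the given shot here.
    """
    objective = float(shot_id) ** 2
    gradient = [float(shot_id), 2.0 * shot_id]
    return objective, gradient


def sum_shot_gradients(executor, shot_ids, gradient_fn=toy_shot_gradient):
    """Submit one task per shot, then sum objectives and gradients."""
    futures = [executor.submit(gradient_fn, shot_id) for shot_id in shot_ids]
    total_objective = 0.0
    total_gradient = None
    for future in futures:
        objective, gradient = future.result()  # blocks until this shot is done
        total_objective += objective
        if total_gradient is None:
            total_gradient = list(gradient)
        else:
            total_gradient = [a + b for a, b in zip(total_gradient, gradient)]
    return total_objective, total_gradient
```

With a Dask client, the same function could in principle be called as sum_shot_gradients(client, shot_ids), since dask.distributed.Client.submit also returns futures that expose result().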

  1. Install the Azure command-line interface (Azure CLI), a set of commands used to create and manage Azure resources. (more info)
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
  2. Sign in with the Azure CLI. If the CLI can open your default browser, it will do so and load an Azure sign-in page. (more info)
az login

After sign-in, you can optionally list all subscriptions and set one of them as the current active subscription. (more info)

az account list --output table
az account set --subscription "YOUR-SUBSCRIPTION-TO-BE-CURRENT"
  3. Create a new resource group. (more info)
az group create --name fwirg --location southcentralus

To confirm the creation, you can list all resource groups.

az group list --output table
  4. Create an Azure Container Registry (ACR). (more info)
az acr create --resource-group fwirg --name fwiacr --sku Basic
  5. You can install Docker as shown below, but we suggest following the official Docker installation instructions instead.
sudo apt-get update
sudo apt-get remove docker docker-engine docker.io
sudo apt install docker.io
  6. Log in to the Azure Container Registry through the Docker CLI. (more info)
sudo az acr login --name fwiacr

After a successful login, you should see the message Login Succeeded.

  7. List the container registries in your resource group and show the results in a table. (more info)
az acr list --resource-group fwirg --query "[].{acrLoginServer:loginServer}" --output table

The output should look like the following. The login server name shown here will be used in step 10.

#AcrLoginServer
#-----------------
#fwiacr.azurecr.io
  8. Download the Devito repo.
git clone https://github.com/devitocodes/devito.git
  9. In the Devito home directory, build the Devito Docker image. Note that this image is CPU-only (GPU capability will be added later). (more info)
sudo docker build -t devito_base . -f docker/Dockerfile
  10. Locally, tag the Devito Docker image. Note that fwiacr.azurecr.io comes from step 7. (more info)
sudo docker tag devito_base fwiacr.azurecr.io/devito_base:v1
  11. Upload the Devito Docker image to the Azure Container Registry. (more info)
sudo docker push fwiacr.azurecr.io/devito_base:v1

List the repositories in the Azure Container Registry to confirm that the push succeeded. (more info)

az acr repository list --name fwiacr --output table

The output should look like below.

#Result
#-----------
#devito_base
  12. Create a new Kubernetes cluster. You can replace the VM size Standard_HB120rs_v2 with one you prefer. Standard_HB120rs_v2 has 480 GB of memory, which makes it a good choice for 3D FWI. The option --node-count 2 means that 2 VMs are used for the cluster. (more info)
az aks create \
     --resource-group fwirg \
     --name fwicluster1 \
     --node-vm-size=Standard_HB120rs_v2 \
     --node-count 2 \
     --generate-ssh-keys \
     --attach-acr fwiacr

List the Kubernetes clusters to confirm that the cluster was created. (more info)

az aks list --output table
  13. Download and install kubectl, the Kubernetes command-line interface (CLI). (more info)
sudo az aks install-cli
  14. Get access credentials for the Kubernetes cluster. (more info)
az aks get-credentials --resource-group fwirg --name fwicluster1
  15. Create the file dask-cluster.yaml. The content of the YAML file is listed at the end of this document.

  16. Apply the Kubernetes configuration. (more info)

kubectl apply -f dask-cluster.yaml

This sets up a Dask cluster on Kubernetes with 1 Dask scheduler, whose port 8786 is open to the world, and 16 workers that connect to this scheduler automatically. Run the command below to confirm that all pods are created. (more info)

kubectl get pods
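
If you prefer to check pod status from a script rather than by eye, the following is a minimal sketch. It assumes kubectl is installed and configured as in the previous steps, and it relies on the JSON form of the same command (kubectl get pods -o json). The helper names are hypothetical. Note that a pod in phase Running is not necessarily ready to serve yet.

```python
import json
import subprocess


def summarize_pods(pods_json):
    """Count pods per phase from `kubectl get pods -o json` output."""
    counts = {}
    for item in json.loads(pods_json).get("items", []):
        phase = item.get("status", {}).get("phase", "Unknown")
        counts[phase] = counts.get(phase, 0) + 1
    return counts


def all_running(pods_json, expected):
    """True when exactly `expected` pods exist and all report phase Running."""
    counts = summarize_pods(pods_json)
    return counts.get("Running", 0) == expected and sum(counts.values()) == expected


def fetch_pods_json():
    """Run kubectl and return its JSON output as text.

    Assumes kubectl is on PATH and credentials were fetched with
    `az aks get-credentials`.
    """
    return subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
```

For the manifest in this document, all_running(fetch_pods_json(), expected=17) would check for 1 scheduler pod plus 16 worker pods, assuming no other pods exist in the default namespace.
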
  17. Find the IP address of the scheduler service. (more info)
kubectl get services

The output looks like the following. Note down the EXTERNAL-IP of the LoadBalancer, which will be used in the Dask configuration in step 19.

NAME            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)          AGE
devito-server   LoadBalancer   10.0.169.103   40.80.209.180   8786:31150/TCP   85m
kubernetes      ClusterIP      10.0.0.1       <none>          443/TCP          101m
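
To avoid copying the address by hand, a small parser can pull the EXTERNAL-IP out of the table above. This is only a sketch: the function name is hypothetical, and it assumes the default table layout shown here, where no column value contains spaces.

```python
import ipaddress


def scheduler_address(services_table, service_name="devito-server", port=8786):
    """Build 'EXTERNAL-IP:port' for a service from `kubectl get services` text."""
    lines = [line for line in services_table.splitlines() if line.strip()]
    if not lines:
        raise LookupError("empty services table")
    header = lines[0].split()
    name_col = header.index("NAME")
    ip_col = header.index("EXTERNAL-IP")
    for line in lines[1:]:
        fields = line.split()
        if fields[name_col] == service_name:
            external_ip = fields[ip_col]
            # Raises ValueError for placeholders such as <none> or <pending>.
            ipaddress.ip_address(external_ip)
            return f"{external_ip}:{port}"
    raise LookupError(f"service {service_name!r} not found")
```

Applied to the sample output above, this yields 40.80.209.180:8786, the address used in step 19. A ValueError usually means the LoadBalancer has not been assigned an external IP yet.
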
  18. Open the Devito Dask tutorial in a Jupyter notebook (you can choose whether or not to use Docker). Note that this Dask tutorial notebook is a 2D FWI example that does not require much memory, so you do not need Standard_HB120rs_v2 (which is a good choice for 3D FWI).

  19. In cell 5 of the notebook, replace the two lines

cluster = LocalCluster(n_workers=nsources, death_timeout=600)
client = Client(cluster)

with

client = Client('40.80.209.180:8786')

where 40.80.209.180 is the EXTERNAL-IP of the scheduler/LoadBalancer we found in step 17.
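
Before running the notebook, it can help to confirm that the scheduler port is reachable from your machine. The sketch below uses only the Python standard library. The function name is hypothetical, and a successful TCP connection only shows that something is listening on the port, not that it is a healthy Dask scheduler.

```python
import socket


def scheduler_reachable(address, timeout=5.0):
    """Return True if a TCP connection to 'host:port' can be opened."""
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, scheduler_reachable('40.80.209.180:8786') should return True once the LoadBalancer is up. If it returns False, check the EXTERNAL-IP again and any firewall rules before creating the Client.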

  20. Run all cells in the notebook. This distributes jobs to the 16 workers on the Kubernetes cluster we created.

  21. When the job is finished, run the following command to delete the Kubernetes cluster. (more info)

az aks delete --resource-group fwirg --name fwicluster1
  22. If you would also like to delete the resource group, run this command. (more info)
az group delete --name fwirg

 

The content of the file dask-cluster.yaml, used in step 15, is listed below.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: devito-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: devito-server
  template:
    metadata:
      labels:
        app: devito-server
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      containers:
      - name: devito-server
        image: fwiacr.azurecr.io/devito_base:v1
        command: ['/venv/bin/dask-scheduler']
        ports:
        - containerPort: 8786
          name: devito-server
---
apiVersion: v1
kind: Service
metadata:
  name: devito-server
spec:
  type: LoadBalancer
  ports:
  - port: 8786
  selector:
    app: devito-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: devito-worker
spec:
  replicas: 16
  selector:
    matchLabels:
      app: devito-worker
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 5 
  template:
    metadata:
      labels:
        app: devito-worker
    spec:
      nodeSelector:
        "kubernetes.io/os": linux
      containers:
      - name: devito-worker
        env:
        - name: PYTHONPATH
          value: /app
        - name: DEVITO_LANGUAGE
          value: "openmp"
        - name: OMP_PROC_BIND
          value: "TRUE"
        image: fwiacr.azurecr.io/devito_base:v1
        command: ['/venv/bin/dask-worker', 'tcp://devito-server:8786']
        ports:
        - containerPort: 80
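
Your registry name and worker count will likely differ from the ones used in this manifest. One way to adapt the file without editing it by hand is a small text substitution, sketched below. The function name is hypothetical, and the approach assumes the manifest text matches the listing above (the image fwiacr.azurecr.io/devito_base:v1 and a single "replicas: 16" line); it is plain string replacement, not a YAML parser.

```python
DEFAULT_IMAGE = "fwiacr.azurecr.io/devito_base:v1"
DEFAULT_WORKERS = 16


def customize_manifest(manifest_text, image=DEFAULT_IMAGE, workers=DEFAULT_WORKERS):
    """Return the manifest with the image and worker replica count replaced."""
    if f"image: {DEFAULT_IMAGE}" not in manifest_text:
        raise ValueError("manifest does not reference the default image")
    marker = f"replicas: {DEFAULT_WORKERS}"
    if manifest_text.count(marker) != 1:
        raise ValueError("expected exactly one worker replicas line")
    # Both the scheduler and the worker containers use the same image.
    text = manifest_text.replace(f"image: {DEFAULT_IMAGE}", f"image: {image}")
    # The scheduler's 'replicas: 1' line is left untouched.
    return text.replace(marker, f"replicas: {int(workers)}")
```

A typical use would be to read dask-cluster.yaml, call customize_manifest with your own login server from step 7 and the desired worker count, write the result to a new file, and pass that file to kubectl apply -f.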