
Setting up a Basic MPI Cluster in Azure

This guide outlines the simplest method for creating an MPI cluster in Azure. The steps provided here serve as a visual demonstration of the process and are not intended for use in a production High-Performance Computing (HPC) environment.

NOTE: The cluster created here runs MPI across the general-purpose network in Azure. Make sure you are using the right subscription when executing Azure CLI commands: az account list shows your subscriptions, and az account set --subscription "Your Azure Subscription" changes the default subscription.
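For example:

az account list -o table
az account set --subscription "Your Azure Subscription"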

For training purposes, this guide uses cheaper virtual machines (Standard D2as v4 with 2 vCPUs and 8 GiB of memory, or Standard DS1 v2 with 1 vCPU). However, for the final Stockfish chess engine cluster, it is recommended to use Standard D8as v4 with 8 vCPUs and 32 GiB of memory, or higher.

The cluster will use standard Azure private IP addressing instead of InfiniBand. All steps are performed using the Azure CLI, and it is assumed that you have already set it up with your Azure account and subscription.

Before you start, ensure you have enough quota on Azure and that the Azure CLI is installed and working in PowerShell (or your preferred terminal). You can run most Azure CLI commands directly from your machine, but keep in mind that some steps (such as working with cloud-init files) are easiest in Azure Cloud Shell. To use custom data, you must Base64-encode the contents before passing the data to the API, unless you are using a CLI tool that does the conversion for you, such as the Azure CLI. The size can't exceed 64 KB.

In the CLI, you can pass your custom data as a file, as the following example shows. The file will be converted to Base64.
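A minimal sketch (the VM name myVM is a placeholder; the other names match the resources created later in this guide):

az vm create --resource-group myResourceGroup --name myVM --image UbuntuLTS --generate-ssh-keys --custom-data cloud-init.txt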

The next step is to create a file named cloud-init.txt in your current shell. The goal is to customize the VM nodes we want to create. Here are the steps:

  • Create a cloud-init configuration file: The cloud-init configuration file is a script that contains the customization that you want to apply to the virtual machine. The script should be written in YAML format.

  • Pass the cloud-init configuration file to the virtual machine: You can pass the cloud-init configuration file to the virtual machine during creation or after creation by using the Azure CLI, Azure portal, or Azure Resource Manager templates.

  • Start the virtual machine: Once the virtual machine has been created, start it, and cloud-init will apply the customizations specified in the configuration file.

For more information on cloud-init and examples of cloud-init configuration files, see the official cloud-init documentation: https://cloud-init.io/

To configure our MPI cluster, create a file named cloud-init.txt in your Azure Cloud Shell with the following contents:

#cloud-config
package_upgrade: true
packages:
  - clustershell
  - openmpi-bin
  - libopenmpi-dev
  - python3-pip
  - python3-mpi4py
  - python3-numpy

This file installs the necessary packages (clustershell, openmpi-bin, and libopenmpi-dev) to run MPI on your cluster. Installing mpi4py and the other Python packages is not mandatory, but it is useful for running some Python test scripts.
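Once a node is up, a quick sanity check of the MPI stack (a sketch; it assumes the cloud-init packages installed successfully, and --oversubscribe allows two ranks on a single-vCPU node):

mpirun --oversubscribe -np 2 python3 -c "from mpi4py import MPI; comm = MPI.COMM_WORLD; print('rank', comm.Get_rank(), 'of', comm.Get_size())"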

Create a Resource Group

Create a resource group with the az group create command. An Azure resource group is a logical container into which Azure resources are deployed and managed. It is a way to group our cluster components and to keep them in the same network segment. Run the following command to create a resource group with your location (here westus):

az group create --name myResourceGroup --location westus

In case you made a mistake you can delete the group:

az group delete --name myResourceGroup
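Either way, you can list your resource groups to verify the result:

az group list -o table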

Create a Proximity Placement Group

A proximity placement group (ppg) is used to keep all VMs within the same low-latency network. Run the following command to create a ppg with your choice of VM size:

az ppg create --name myclusterppg --resource-group myResourceGroup --intent-vm-sizes Standard_DS1_v2          

To delete the proximity placement group:

az ppg delete --name myclusterppg --resource-group myResourceGroup

You can check that everything is OK with a simple az command:

az ppg list -o table
Location    Name          ProximityPlacementGroupType    ResourceGroup
----------  ------------  -----------------------------  ---------------
westus      myclusterppg  Standard                       MYRESOURCEGROUP

Create Compute Nodes

To create a group of four compute nodes in the ppg, run the following command:

az vm create --name mycluster --resource-group myResourceGroup --image UbuntuLTS --ppg myclusterppg --generate-ssh-keys --size Standard_DS1_v2 --accelerated-networking true --custom-data cloud-init.txt --count 4

This will create four VMs named mycluster0, mycluster1, mycluster2, and mycluster3. To check that everything is up and running, list the created resources:

az resource list --resource-group myResourceGroup -o table

The compute nodes that are created have public IP addresses and are located in a shared subnet on the same Virtual Network (VNet), with close physical proximity, due to the proximity placement group (ppg). To view the IP addresses, you can run the following command:

az vm list-ip-addresses --resource-group myResourceGroup -o table

VirtualMachine    PublicIPAddresses    PrivateIPAddresses
----------------  -------------------  --------------------
mycluster0        X.X.X.1               Y.Y.Y.5
mycluster1        X.X.X.56              Y.Y.Y.4 

Here X.X.X.1 and X.X.X.56 are placeholders for the actual public IP addresses of your compute nodes, and Y.Y.Y.5 and Y.Y.Y.4 for their respective private IP addresses.

The az vm create command created a user on each VM with the same name as the local user who ran the command, but this can be overridden with --admin-username yourusername (e.g. mpiuser) on the command line. Additionally, the local SSH public key (~/.ssh/id_rsa.pub) was added to each VM's authorized_keys file. As a result, you should now be able to log into your head node from PowerShell with ssh mpiuser@PublicIPAddress.

To start working with our new cluster, we need to SSH from mycluster0 to mycluster1-3. For security reasons, I do not recommend copying your local machine's private key to the cluster. A better way is to create a key pair on mycluster0 and copy its public key to mycluster1-3.

Create ssh key on mycluster0:

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub

copy the key, and then log in to mycluster1-3:

cd .ssh
nano authorized_keys

and paste the key on a new line, then save the file. You should now be able to connect to all nodes from mycluster0; a first test will use the famous HelloWorld.c (in the MPI-Tests subfolder).
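To confirm passwordless SSH works before moving on, a quick check from mycluster0 (the hostnames resolve through Azure's internal VNet DNS by default):

for host in mycluster1 mycluster2 mycluster3; do
  ssh -o BatchMode=yes "$host" hostname
done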

Hello World Example

Copy to mycluster0 and compile with OpenMPI:

mpicc helloworld.c -o helloworld

The helloworld binary has to be on mycluster1-3 too. We can easily copy it with clustershell:

clush -w mycluster[1-3] -c helloworld

You can now execute:

mpirun --host mycluster0,mycluster1,mycluster2,mycluster3 ./helloworld

Hello world from processor mycluster0, rank 0 out of 4 processors
Hello world from processor mycluster3, rank 3 out of 4 processors
Hello world from processor mycluster1, rank 1 out of 4 processors
Hello world from processor mycluster2, rank 2 out of 4 processors
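Instead of listing hosts on the command line, you can keep them in an Open MPI hostfile (a sketch; slots=1 matches the single-vCPU DS1_v2 nodes):

cat > hosts <<EOF
mycluster0 slots=1
mycluster1 slots=1
mycluster2 slots=1
mycluster3 slots=1
EOF
mpirun --hostfile hosts ./helloworld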

A first test with latency might be helpful.

MPI Latency Test

The OSU MPI latency test (osu_latency) is a benchmarking tool used to measure the latency (or response time) of Message Passing Interface (MPI) communication between two MPI processes.

mpirun -np 2 --host cluster1,cluster2 ./osu-micro-benchmarks-6.1/install/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency

The above example command uses the Open MPI launcher mpirun to start two MPI processes on two separate hosts, cluster1 and cluster2. The benchmark comes from the osu-micro-benchmarks-6.1 package, specifically the osu_latency test in the pt2pt MPI communication benchmark suite. It measures the time it takes for two MPI processes to send short messages to each other and can be used to evaluate the performance of MPI communication in a cluster computing environment.

Installation:

wget https://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-6.1.tar.gz
tar zxf osu-micro-benchmarks-6.1.tar.gz
cd osu-micro-benchmarks-6.1/
./configure --prefix $PWD/install CC=mpicc CXX=mpicxx
make && make install
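The benchmark binary must exist at the same path on every node it runs on. With clustershell this is one command (a sketch; run it from your home directory on mycluster0 so the paths match on all nodes):

clush -w mycluster[1-3] -c osu-micro-benchmarks-6.1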

And then run a first test with:

mpirun -np 2 --host mycluster1,mycluster2 ./osu-micro-benchmarks-6.1/install/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency

To lock down the MPI cluster, we can remove the public IP addresses from all nodes except the head node.

Azure CLI Script to create the cluster

This bash script utilizes the Azure CLI (Command-Line Interface) to create a resource group, a proximity placement group (PPG), and a specified number of virtual machine (VM) nodes.

Breakdown of the script's main functionalities:

  • It defines several variables such as the resource group name, location, proximity placement group name, virtual machine name prefix, image, and size.

  • It uses the read command to prompt the user for the number of nodes to create before executing the rest of the script.

  • It creates a resource group with the az group create command, specifying the name and location of the group.

  • It creates a proximity placement group with the az ppg create command, specifying the name, resource group, and intended VM size.

  • Finally, a for loop creates the requested number of virtual machines with the az vm create command. Each VM is named by appending the node index to the VM name prefix (mycluster0, mycluster1, ...), and the command also specifies the resource group, image, proximity placement group, size, and custom data (cloud-init.txt).

#!/bin/bash
# Azure CLI commands 
# The script will create a resource group, a proximity placement group, 
# and VMs with a specified number of nodes

rg_name="myResourceGroup"
location="westus"
ppg_name="myclusterppg"
vm_name="mycluster"
image="UbuntuLTS"
size="Standard_DS1_v2"

read -p "Enter the number of nodes: " node_count

az group create --name $rg_name --location $location
az ppg create --name $ppg_name --resource-group $rg_name --intent-vm-sizes $size
for i in $(seq 0 $((node_count - 1))); do
  az vm create --name "${vm_name}${i}" --resource-group $rg_name --image $image --ppg $ppg_name --generate-ssh-keys --size $size --accelerated-networking true --custom-data cloud-init.txt
done
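A sketch of running the script (assuming you saved it as create-cluster.sh):

chmod +x create-cluster.sh
./create-cluster.sh    # prompts: Enter the number of nodes: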

Azure CLI Script removing public IP

This script uses the Azure CLI to perform the following operations in a loop over the three compute nodes (the head node mycluster0 keeps its public IP):

  1. Remove a public IP address from a network interface configuration.
  2. Delete a public IP.
#!/bin/bash
# Remove the public IPs from the compute nodes
# (the head node mycluster0 keeps its public IP)

rg_name="myResourceGroup"
vm_name_prefix="mycluster"

for i in {1..3}; do
    # Default names assigned by az vm create: <vmName>VMNic and <vmName>PublicIP
    nic_name="${vm_name_prefix}${i}VMNic"
    public_ip="${vm_name_prefix}${i}PublicIP"
    az network nic ip-config update --resource-group $rg_name \
                                    --name "ipconfig${vm_name_prefix}${i}" \
                                    --nic-name $nic_name \
                                    --remove PublicIpAddress
    az network public-ip delete --resource-group $rg_name \
                                --name $public_ip
done

Explanation:

This shell script updates the IP configuration of three network interfaces (mycluster1VMNic, mycluster2VMNic, mycluster3VMNic) and removes their respective public IP addresses. Then it deletes the three public IPs (mycluster1PublicIP, mycluster2PublicIP, mycluster3PublicIP) associated with these network interfaces, leaving the head node mycluster0 reachable from outside. All these resources belong to a resource group named myResourceGroup.
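After the script completes, you can confirm that only the head node still has a public IP:

az vm list-ip-addresses --resource-group myResourceGroup -o table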