Skip to content

HASTE-project/spark-hdfs-deployment

Repository files navigation

Ansible Playbooks for Automated Deployment of servers for benchmarking study

see: Apache Spark Streaming and HarmonicIO: A Performance and Architecture Comparison [https://arxiv.org/abs/1807.07724]

Tested with Ubuntu LTS 16.04

Ensure that SSH configuration and IP addresses are configured (in ~/.ssh/config and /etc/hosts) first. See 'hostnames.yml' for more details.

For HPC2N use -i hosts_hpc2n For UPPMAX use -i hosts_uppmax

Note that many hosts do not have public IPs, you will need to configure SSH forwarding via one of the servers with a public IP

ansible -i hosts_uppmax all -a "echo hi"
ansible -i hosts_hpc2n all -a "echo hi"

To deploy entire pipeline (dry run):

ansible-playbook -i hosts_hpc2n site.yml --check
ansible-playbook -i hosts_uppmax site.yml --check

To deploy for real:

ansible-playbook -i hosts_hpc2n site.yml
ansible-playbook -i hosts_uppmax site.yml

To restart the Spark master and slaves: (factored out to allow easy restarting for benchmarking tests)

GOTCHA: check the Spark master host name - its hard coded!!!

ansible-playbook -i hosts_ben_uppmax playbooks-util/restart-spark-cluster.yml

Contributors: Ben Blamey

About

Ansible playbooks and roles for the HASTE pipeline

Resources

Stars

Watchers

Forks

Packages

No packages published