GitHub - HASTE-project/spark-hdfs-deployment: Ansible playbooks and roles for the HASTE pipeline

Ansible playbooks for the automated deployment of servers for a benchmarking study.

See: Apache Spark Streaming and HarmonicIO: A Performance and Architecture Comparison [https://arxiv.org/abs/1807.07724]

Tested with Ubuntu 16.04 LTS.

Ensure that SSH configuration and IP addresses are configured (in ~/.ssh/config and /etc/hosts) first. See 'hostnames.yml' for more details.

For HPC2N, use -i hosts_hpc2n. For UPPMAX, use -i hosts_uppmax.

Note that many hosts do not have public IPs, so you will need to configure SSH forwarding via one of the servers with a public IP.
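
The SSH forwarding described here can be set up with a jump-host entry in ~/.ssh/config. The following is a minimal sketch, not the project's actual configuration: the alias `bastion`, the address `bastion.example.org`, the user `ubuntu`, the `spark-*` host pattern, and the file name `ssh_config.example` are all placeholders assumed for illustration. The script writes the fragment to a scratch file so that it can be reviewed before being merged by hand.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write an example fragment to a scratch file (hypothetical name).
# Nothing under ~/.ssh is modified by this script.
cat > ssh_config.example <<'EOF'
# The server that has a public IP (placeholder address and user).
Host bastion
    HostName bastion.example.org
    User ubuntu

# Hosts without public IPs (placeholder pattern): tunnel through the bastion.
Host spark-*
    User ubuntu
    ProxyCommand ssh -W %h:%p bastion
EOF

echo "Wrote ssh_config.example; review it, then merge it into ~/.ssh/config."
```

With OpenSSH 7.3 or newer on the control machine, `ProxyJump bastion` is a shorter alternative to the `ProxyCommand` line.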

To check that Ansible can reach every host:

ansible -i hosts_uppmax all -a "echo hi"
ansible -i hosts_hpc2n all -a "echo hi"

To deploy entire pipeline (dry run):

ansible-playbook -i hosts_hpc2n site.yml --check
ansible-playbook -i hosts_uppmax site.yml --check
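
The two dry-run commands above can be wrapped in a small helper when both sites need checking together. This is a sketch under assumptions: the function name `check_all` is hypothetical, and the `--diff` flag (which asks Ansible to show the changes it would make to files) is an addition that the commands above do not use.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: dry-run site.yml against both inventories in turn.
# The optional first argument replaces the command that is run; pass
# "echo ansible-playbook" to print the commands instead of executing them.
check_all() {
  local runner="${1:-ansible-playbook}"
  local inventory
  for inventory in hosts_hpc2n hosts_uppmax; do
    # $runner is left unquoted on purpose so "echo ansible-playbook" splits into two words.
    $runner -i "$inventory" site.yml --check --diff
  done
}
```

Calling `check_all` with no arguments runs both dry runs and stops at the first failure; `check_all "echo ansible-playbook"` only prints the two command lines.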

To deploy for real:

ansible-playbook -i hosts_hpc2n site.yml
ansible-playbook -i hosts_uppmax site.yml

To restart the Spark master and slaves (factored out into a separate playbook to allow easy restarting during benchmarking tests):

GOTCHA: check the Spark master hostname. It is hard-coded!

ansible-playbook -i hosts_ben_uppmax playbooks-util/restart-spark-cluster.yml
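
One way to locate the hard-coded master hostname before running the restart playbook is to search the utility playbooks for likely references. This is only a sketch: the function name `find_master_refs` is hypothetical, and it assumes the hostname appears either in a `hosts:` line or in a `spark://` master URL, which may not match how the playbook is actually written.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list lines that may contain the hard-coded Spark master name.
# Assumption: the name shows up in a play's "hosts:" target or in a "spark://" URL.
find_master_refs() {
  local dir="${1:-playbooks-util}"
  # -r: search recursively, -n: print line numbers. grep exits non-zero if nothing matches.
  grep -rn -e 'spark://' -e 'hosts:' "$dir"
}
```

Run `find_master_refs` from the repository root and compare the reported names with the inventory file passed via -i.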

Contributors: Ben Blamey

