Skip to content
This repository has been archived by the owner on Aug 13, 2024. It is now read-only.

Commit

Permalink
Added dedicated Vagrant setup (closes #6) /cc @bigsnarfdude
Browse files Browse the repository at this point in the history
  • Loading branch information
alexanderdean committed Apr 29, 2015
1 parent 3a9bd3b commit 1156a3d
Show file tree
Hide file tree
Showing 9 changed files with 115 additions and 11 deletions.
6 changes: 5 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -10,4 +10,8 @@ project/boot/
project/plugins/project/

# Scala-IDE specific
.scala_dependencies
.scala_dependencies

# Vagrant
.vagrant

25 changes: 15 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,21 +4,25 @@

This is a simple word count job written in Scala for the [Spark] [spark] cluster computing platform, with instructions for running on [Amazon Elastic MapReduce] [emr] in non-interactive mode. The code is ported directly from Twitter's [`WordCountJob`] [wordcount] for Scalding.

This was built by the Professional Services team at [Snowplow Analytics] [snowplow], who use Spark on their [Data pipelines and algorithms] [data-pipelines-algos] projects.
This was built by the Data Science team at [Snowplow Analytics] [snowplow], who use Spark on their [Data pipelines and algorithms] [data-pipelines-algos] projects.

_See also:_ [Scalding Example Project] [scalding-example-project] | [Cascalog Example Project] [cascalog-example-project]
_See also:_ [Spark Streaming Example Project] [spark-streaming-example-project] | [Scalding Example Project] [scalding-example-project]

## Building

Assuming you already have [SBT] [sbt] installed:
Assuming git, **[Vagrant] [vagrant-install]** and **[VirtualBox] [virtualbox-install]** installed:

$ git clone git://github.com/snowplow/spark-example-project.git
$ cd spark-example-project
$ sbt assembly
```bash
host> git clone https://github.com/snowplow/spark-example-project
host> cd spark-example-project
host> vagrant up && vagrant ssh
guest> cd /vagrant
guest> sbt assembly
```

The 'fat jar' is now available as:

target/spark-example-project-0.2.0.jar
target/spark-example-project-0.3.0.jar

## Unit testing

Expand Down Expand Up @@ -118,16 +122,17 @@ limitations under the License.
[snowplow]: http://snowplowanalytics.com
[data-pipelines-algos]: http://snowplowanalytics.com/services/pipelines.html

[vagrant-install]: http://docs.vagrantup.com/v2/installation/index.html
[virtualbox-install]: https://www.virtualbox.org/wiki/Downloads

[spark-streaming-example-project]: https://github.com/snowplow/spark-streaming-example-project
[scalding-example-project]: https://github.com/snowplow/scalding-example-project
[cascalog-example-project]: https://github.com/snowplow/cascalog-example-project

[issue-1]: https://github.com/snowplow/spark-example-project/issues/1
[issue-2]: https://github.com/snowplow/spark-example-project/issues/2
[aws-spark-tutorial]: http://aws.amazon.com/articles/4926593393724923
[spark-emr-howto]: https://forums.aws.amazon.com/thread.jspa?messageID=458398

[sbt]: http://www.scala-sbt.org/release/docs/Getting-Started/Setup.html

[emr]: http://aws.amazon.com/elasticmapreduce/
[hello-txt]: https://github.com/snowplow/spark-example-project/raw/master/data/hello.txt
[emr-client]: http://aws.amazon.com/developertools/2264
Expand Down
19 changes: 19 additions & 0 deletions Vagrantfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
Vagrant.configure("2") do |config|

config.vm.box = "ubuntu/trusty64"
config.vm.hostname = "huskimo"
config.ssh.forward_agent = true

config.vm.provider :virtualbox do |vb|
vb.name = Dir.pwd().split("/")[-1] + "-" + Time.now.to_f.to_i.to_s
vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
vb.customize [ "guestproperty", "set", :id, "--timesync-threshold", 10000 ]
# Scala is memory-hungry
vb.memory = 5120
end

config.vm.provision :shell do |sh|
sh.path = "vagrant/up.bash"
end

end
3 changes: 3 additions & 0 deletions vagrant/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
.peru
oss-playbooks
ansible
2 changes: 2 additions & 0 deletions vagrant/ansible.hosts
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
[vagrant]
127.0.0.1:2222
14 changes: 14 additions & 0 deletions vagrant/peru.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
imports:
ansible: ansible
ansible_playbooks: oss-playbooks

curl module ansible:
# Equivalent of git cloning tags/v1.6.6 but much, much faster
url: https://codeload.github.com/ansible/ansible/zip/69d85c22c7475ccf8169b6ec9dee3ee28c92a314
unpack: zip
export: ansible-69d85c22c7475ccf8169b6ec9dee3ee28c92a314

git module ansible_playbooks:
url: https://github.com/snowplow/ansible-playbooks.git
# Comment out to fetch a specific rev instead of master:
# rev: xxx
50 changes: 50 additions & 0 deletions vagrant/up.bash
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
#!/bin/bash
set -e

vagrant_dir=/vagrant/vagrant
bashrc=/home/vagrant/.bashrc

echo "========================================"
echo "INSTALLING PERU AND ANSIBLE DEPENDENCIES"
echo "----------------------------------------"
apt-get update
apt-get install -y language-pack-en git unzip libyaml-dev python3-pip python-yaml python-paramiko python-jinja2

echo "==============="
echo "INSTALLING PERU"
echo "---------------"
sudo pip3 install peru

echo "======================================="
echo "CLONING ANSIBLE AND PLAYBOOKS WITH PERU"
echo "---------------------------------------"
cd ${vagrant_dir} && peru sync -v
echo "... done"

env_setup=${vagrant_dir}/ansible/hacking/env-setup
hosts=${vagrant_dir}/ansible.hosts

echo "==================="
echo "CONFIGURING ANSIBLE"
echo "-------------------"
touch ${bashrc}
echo "source ${env_setup}" >> ${bashrc}
echo "export ANSIBLE_HOSTS=${hosts}" >> ${bashrc}
echo "... done"

echo "=========================================="
echo "RUNNING PLAYBOOKS WITH ANSIBLE*"
echo "* no output while each playbook is running"
echo "------------------------------------------"
while read pb; do
su - -c "source ${env_setup} && ${vagrant_dir}/ansible/bin/ansible-playbook ${vagrant_dir}/${pb} --connection=local --inventory-file=${hosts}" vagrant
done <${vagrant_dir}/up.playbooks

guidance=${vagrant_dir}/up.guidance

if [ -f ${guidance} ]; then
echo "==========="
echo "PLEASE READ"
echo "-----------"
cat $guidance
fi
4 changes: 4 additions & 0 deletions vagrant/up.guidance
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
To get started:
vagrant ssh
cd /vagrant
sbt test
3 changes: 3 additions & 0 deletions vagrant/up.playbooks
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
oss-playbooks/java7.yml
oss-playbooks/scala.yml
oss-playbooks/sbt.yml

0 comments on commit 1156a3d

Please sign in to comment.