# docker-spark-cluster

Build your own Spark cluster setup in Docker: a multi-node Spark installation where each node of the network runs in its own separate Docker container.
The installation takes care of the Hadoop and Spark configuration, providing:

1. a Debian image with Scala and Java (the scalabase image)
2. four fully configured Spark nodes running on Hadoop (the sparkbase image):
   * nodemaster (master node)
   * node2 (slave)
   * node3 (slave)
   * node4 (slave)

## Motivation

You can run Spark in a (boring) standalone setup, or instead create your own network to hold a full cluster setup inside Docker.
I find the latter much more fun:

* you can experiment with a more realistic network setup
* tweak the configuration of individual nodes
* simulate scalability, downtime and rebalancing by adding and removing nodes from the network automagically

There is a Medium article related to this: https://medium.com/@rubenafo/running-a-spark-cluster-setup-in-docker-containers-573c45cceabf

## Installation

1. Clone this repository
2. `cd scalabase`
3. `./build.sh` # builds the base Java + Scala Debian image from openjdk9
4. `cd ../spark`
5. `./build.sh` # builds the sparkbase image
6. `./cluster.sh deploy`
7. The script finishes by displaying the Hadoop and Spark admin URLs (the full sequence is summarized below).
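For reference, here is the same sequence as a single shell session. This is a minimal sketch based on the steps above; the repository directory name is an assumption:

```sh
# Build the base image, the Spark node image, and deploy the cluster.
cd docker-spark-cluster   # repository root; directory name assumed

cd scalabase
./build.sh                # base Java + Scala Debian image (scalabase), built from openjdk9

cd ../spark
./build.sh                # Spark-on-Hadoop node image (sparkbase)

./cluster.sh deploy       # format HDFS and bring up nodemaster, node2, node3 and node4
```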

## Options

cluster.sh stop   # Stop the cluster
cluster.sh start  # Start the cluster
cluster.sh info   # Show handy URLs of the running cluster

# Warning! This will remove everything from HDFS
cluster.sh deploy # Format the cluster and deploy the images again
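Because every node lives in its own container, ordinary Docker commands can be used to experiment with downtime and recovery. The snippet below is a sketch, assuming the containers are named after the nodes listed above (nodemaster, node2, node3, node4):

```sh
docker ps --filter "name=node"   # list the cluster containers (names assumed to contain "node")

docker stop node3                # simulate downtime of one slave (container name assumed)
docker start node3               # bring it back and let the cluster rebalance
```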