Contains the complete Spark 2.1.0 distribution, built against HDFS 2.7, with the default configuration and scripts to run Spark jobs on the cluster.
The oc folder contains a script to run Spark jobs on the cluster (external or minishift).
The script works like this:
bash <job_name> "[spark_parameters] <spark_job> [job_parameters]"
For our tests and benchmarks, we use the following parameters:
bash \
wordcount \ # Job name
"--master spark://spark-master:7077 \ # Spark master url
--class \ # Main class of the job
--driver-memory 512m \ # Driver memory
--executor-memory 512m \ # Executor memory
--packages org.alluxio:alluxio-core-client:1.4.0 \ # Alluxio client library
http://hdfs-httpfs:14000/webhdfs/v1/jobs/spark-wordcount.jar?op=OPEN& \ # Spark job jar
-i alluxio://alluxio-master:19998/data/ \ # Job parameters
-o alluxio://alluxio-master:19998/data/"
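
The `# ...` annotations in the example above are there for the reader. Inside a quoted string they are ordinary text, not shell comments, so they should probably be dropped before running the command. As a minimal sketch, assuming the script only needs the final quoted string, the argument can be built piece by piece so that each part keeps a real comment. The variable name `submit_string` is hypothetical, and the `--class` value is not given above, so it is left out here and has to be added for a real run.

```shell
#!/bin/sh
# Sketch: assemble the quoted argument step by step so every part can carry
# a real shell comment. All values are copied from the example above.
# NOTE: the --class value is not stated in the example, so it is omitted here.
submit_string="--master spark://spark-master:7077"                               # Spark master URL
submit_string="$submit_string --driver-memory 512m"                              # Driver memory
submit_string="$submit_string --executor-memory 512m"                            # Executor memory
submit_string="$submit_string --packages org.alluxio:alluxio-core-client:1.4.0"  # Alluxio client library
submit_string="$submit_string http://hdfs-httpfs:14000/webhdfs/v1/jobs/spark-wordcount.jar?op=OPEN&"  # Spark job jar
submit_string="$submit_string -i alluxio://alluxio-master:19998/data/"           # Job parameters
submit_string="$submit_string -o alluxio://alluxio-master:19998/data/"           # Job parameters

# Print the assembled string for inspection before passing it to the script.
printf '%s\n' "$submit_string"
```

The resulting `"$submit_string"` can then be passed, quoted, as the argument that follows the job name. Keeping it in double quotes matters because the jar URL contains `?` and `&`, which the shell would otherwise interpret.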