A Giter8 template for Scala Spark Projects.
This template bootstraps a new Spark project with everyone's "favourite" word count example (modified to filter out stop words). You can then replace the word count example as desired and customize the Spark components your project needs.
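To show what "word count modified for stop words" means, here is a minimal sketch that uses plain Scala collections instead of Spark, so it runs without a cluster. It is not the template's generated code. The object name `StopWordCountSketch`, the method `withStopWordsFiltered`, the tokenization rule, and the sample stop words are all hypothetical choices made for illustration.

```scala
// Hypothetical sketch, not the template's actual code: a word count that
// drops stop words, written with plain Scala collections (no Spark needed).
object StopWordCountSketch {
  // Tokenize each line on non-word characters, lowercase the tokens,
  // remove empty tokens and stop words, then count each remaining word.
  def withStopWordsFiltered(lines: Seq[String], stopWords: Set[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\W+"))
      .map(_.toLowerCase)
      .filter(word => word.nonEmpty && !stopWords.contains(word))
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val lines = Seq("The quick brown fox", "jumps over the lazy dog", "the fox")
    val counts = withStopWordsFiltered(lines, Set("the", "over"))
    // Print most frequent words first, ties broken alphabetically.
    counts.toSeq
      .sortBy { case (word, count) => (-count, word) }
      .foreach { case (word, count) => println(s"$word\t$count") }
  }
}
```

In a Spark job the same pipeline maps naturally onto RDD operations: `flatMap` to tokenize, `filter` to drop stop words, then `map(word => (word, 1))` followed by `reduceByKey(_ + _)` to count in a distributed way.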
To encourage good software development practice, the generated project starts at 100% code coverage (i.e. one test :p). While we expect coverage to drop as your project grows, we hope you keep testing it with the provided spark-testing-base library or a similar option.
Have g8 installed? You can run it with:
g8 holdenk/sparkProjectTemplate --name=projectname --organization=com.my.org --sparkVersion=2.2.0
Using sbt (0.13.13+), just run:
sbt new holdenk/sparkProjectTemplate.g8
First, go to the project you created:
cd projectname
You can test the example Spark job included in this template locally, directly from sbt:
sbt "run inputFile.txt outputFile.txt"
then choose CountingLocalApp when prompted.
You can also assemble a fat jar (see sbt-assembly for configuration details):
sbt assembly
then submit it as usual to your Spark cluster:
/path/to/spark-home/bin/spark-submit \
--class <package-name>.CountingApp \
--name the_awesome_app \
--master <master url> \
./target/scala-2.11/<jar name> \
<input file> <output file>
Want to build your application using the Spark Job Server? The spark-jobserver.g8 template can help you get started too.
This project is available under your choice of Apache 2 or CC0 1.0. See https://www.apache.org/licenses/LICENSE-2.0 or https://creativecommons.org/publicdomain/zero/1.0/ respectively. This template is distributed without any warranty.