holdenk/sparkProjectTemplate.g8

sparkProjectTemplate

A Giter8 template for Scala Spark Projects.

What this gives you

This template bootstraps a new Spark project with everyone's "favourite" wordcount example (modified to handle stop words). You can then replace the wordcount example as desired and customize the Spark components your project needs.
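For orientation, here is a minimal sketch of what a stop-word-aware wordcount can look like on the RDD API. It is not the code the template generates: the object name `WordCountSketch`, the method `withStopWordsFiltered`, the stop-word list, and the tokenization rule are all assumptions made for illustration. It needs `spark-core` on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical names throughout; the generated project may be organized differently.
object WordCountSketch {
  // Illustrative stop-word list (an assumption, not the template's list).
  val defaultStopWords: Set[String] = Set("a", "an", "the", "and", "of")

  // Split lines into lowercase words, drop empty tokens and stop words,
  // then count the occurrences of each remaining word.
  def withStopWordsFiltered(
      lines: RDD[String],
      stopWords: Set[String] = defaultStopWords): RDD[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(word => word.nonEmpty && !stopWords.contains(word))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // Local master so the sketch runs without a cluster.
    val conf = new SparkConf().setMaster("local[*]").setAppName("word-count-sketch")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("the quick fox", "the lazy dog and the fox"))
      withStopWordsFiltered(lines).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the counting logic in a function that takes an `RDD[String]`, separate from `main`, makes it straightforward to call from a unit test with a small in-memory dataset.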

To encourage good software development practice, the generated project starts at 100% code coverage (i.e. one test :p). It's expected that coverage will decrease as you add code, but we hope you keep testing with the provided spark-testing-base library or a similar option.
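As a rough idea of what a test built on spark-testing-base can look like, the sketch below mixes the library's `SharedSparkContext` trait into a ScalaTest suite, which supplies a local `sc` for the tests. The class name, the test data, and the inlined counting logic are assumptions for illustration, not the test shipped with the template. It also assumes ScalaTest 3.1 or later; on older versions the base class is `org.scalatest.FunSuite`.

```scala
import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical suite name; SharedSparkContext provides `sc`.
class StopWordCountSpec extends AnyFunSuite with SharedSparkContext {

  test("stop words are excluded from the counts") {
    // Illustrative stop words and input (assumptions for this sketch).
    val stopWords = Set("the", "and")
    val lines = sc.parallelize(Seq("the cat and the hat", "the cat"))

    // Same shape as a wordcount job: split, filter, pair, reduce.
    val counts = lines
      .flatMap(_.split(" "))
      .filter(word => !stopWords.contains(word))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

    assert(counts == Map("cat" -> 2, "hat" -> 1))
  }
}
```

A suite like this runs with `sbt test` in the generated project.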

Creating a new project from this template

Have g8 installed? You can run it with:

g8 holdenk/sparkProjectTemplate --name=projectname --organization=com.my.org --sparkVersion=2.2.0

With sbt (0.13.13+), just run:

sbt new holdenk/sparkProjectTemplate.g8

Executing the created project

First go to the project you created:

cd projectname

You can test the example Spark job included in this template locally, directly from sbt:

sbt "run inputFile.txt outputFile.txt"

Then choose CountingLocalApp when prompted.

You can also assemble a fat jar (see sbt-assembly for configuration details):

sbt assembly

Then submit it as usual to your Spark cluster:

/path/to/spark-home/bin/spark-submit \
  --class <package-name>.CountingApp \
  --name the_awesome_app \
  --master <master url> \
  ./target/scala-2.11/<jar name> \
  <input file> <output file>

Related

Want to build your application using the Spark Job Server? The spark-jobserver.g8 template can help you get started too.

License

This project is available under your choice of Apache 2 or CC0 1.0. See https://www.apache.org/licenses/LICENSE-2.0 or https://creativecommons.org/publicdomain/zero/1.0/ respectively. This template is distributed without any warranty.
