This example analyzes crime data. The goal is to understand which districts of a city are more prone to which types of crime.
The example loads the data and manipulates it using Spark. The visualization is done with Vegas.
Vegas aims to be the missing MatPlotLib for the Scala and Spark world. Vegas wraps around Vega-Lite but provides syntax more familiar (and type checked) for use within Scala.
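The load-and-manipulate step described above can be sketched roughly as follows. This is a minimal illustration and not necessarily what the project code does: the object name `CrimeCountsSketch` is hypothetical, and the column names `District` and `Primary Type` are assumptions that must be adjusted to the header of the CSV file you actually use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.desc

// Hypothetical helper object, for illustration only.
object CrimeCountsSketch {
  // Assumed column names; change them to match your CSV header.
  val DistrictCol = "District"
  val CrimeTypeCol = "Primary Type"

  /** Reads a CSV file (or directory of CSV files) that has a header row. */
  def loadCrimes(spark: SparkSession, inputPath: String): DataFrame =
    spark.read.option("header", "true").csv(inputPath)

  /** Counts records per (district, crime type) pair, most frequent first. */
  def countByDistrictAndType(crimes: DataFrame): DataFrame =
    crimes
      .na.drop(Seq(DistrictCol, CrimeTypeCol)) // skip rows missing either key
      .groupBy(DistrictCol, CrimeTypeCol)
      .count()
      .orderBy(desc("count"))
}
```

Sorting by the `count` column puts the most frequent district and crime type combinations at the top, which is the question the example sets out to answer.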
The outputs of this example are:
- a CSV file
- an HTML visualization file
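As a rough sketch of how these two outputs could be produced, the code below writes an aggregated DataFrame as CSV and renders a Vegas bar chart to an HTML page. Everything named here is an assumption for illustration: the object `OutputSketch`, the output sub-paths, and the expected column order (district, crime type, count). The `html.pageHTML()` call follows the rendering section of the Vegas README; verify it against the Vegas version you build with.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

import org.apache.spark.sql.DataFrame
import vegas._

// Hypothetical helper object, for illustration only.
object OutputSketch {
  /** Writes the counts as CSV. Spark creates a directory of part files, not a single file. */
  def writeCsv(counts: DataFrame, outputDir: String): Unit =
    counts.coalesce(1)
      .write.mode("overwrite")
      .option("header", "true")
      .csv(Paths.get(outputDir, "csv").toString)

  /** Renders a bar chart of the counts to a standalone HTML page. */
  def writeHtml(counts: DataFrame, outputDir: String): Unit = {
    // Assumed column order: district (String), crime type (String), count (Long).
    // The aggregated result is small, so collecting it to the driver is acceptable here.
    val rows: Seq[Map[String, Any]] = counts.collect().toSeq.map { r =>
      Map[String, Any]("district" -> r.getString(0), "crime" -> r.getString(1), "count" -> r.getLong(2))
    }

    val plot = Vegas("Crimes by district")
      .withData(rows)
      .encodeX("district", Nom)
      .encodeY("count", Quant)
      .encodeColor(field = "crime", dataType = Nominal)
      .mark(Bar)

    val htmlPath = Paths.get(outputDir, "crimes_by_district.html")
    Files.createDirectories(htmlPath.getParent)
    Files.write(htmlPath, plot.html.pageHTML().getBytes(StandardCharsets.UTF_8))
  }
}
```

`coalesce(1)` keeps the CSV output in one part file, which is convenient for small aggregates but should be dropped for large results.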
Use the following to build and run the code:
- JDK 1.8
- Scala 2.11.8
- sbt 0.13.8
Run the following command:
./scripts/run_container.bash
Download this CSV file.
Place the uber JAR and the CSV file(s) in a location that is accessible to the Spark job, such as shared storage (S3, HDFS, NFS) or a local path on each server in the cluster.
Review and set the script parameters, then run the following command:
./scripts/run.bash
If you are running it from an IDE in local mode, add the following JVM parameter:
-Dspark.master=local[*]
Then add the following program arguments: [input file/dir] [output_dir] [app_name]
e.g. ./src/main/resources/data/Crimes.csv ./target/output/ crime_analyzer
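To make the three arguments and the JVM parameter concrete, here is a minimal sketch of an entry point that consumes them. The names `CrimeAnalyzerArgsSketch`, `JobArgs`, and `parseArgs` are hypothetical and are not taken from the project source. The sketch deliberately does not call `.master(...)`: Spark reads `spark.*` Java system properties when it builds its configuration, so `-Dspark.master=local[*]` is picked up in the IDE, while a cluster launch supplies the master through its own submit settings.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point, for illustration only.
object CrimeAnalyzerArgsSketch {
  final case class JobArgs(input: String, outputDir: String, appName: String)

  /** Expects exactly: [input file/dir] [output_dir] [app_name]. */
  def parseArgs(args: Array[String]): Either[String, JobArgs] = args match {
    case Array(input, outputDir, appName) => Right(JobArgs(input, outputDir, appName))
    case _ => Left("Usage: [input file/dir] [output_dir] [app_name]")
  }

  def main(args: Array[String]): Unit = parseArgs(args) match {
    case Left(usage) =>
      System.err.println(usage)
      sys.exit(1)
    case Right(job) =>
      // The master is not hard-coded; it comes from -Dspark.master or the submit settings.
      val spark = SparkSession.builder().appName(job.appName).getOrCreate()
      try {
        println(s"Running '${job.appName}' on master ${spark.sparkContext.master}")
        println(s"Input: ${job.input}, output: ${job.outputDir}")
      } finally {
        spark.stop()
      }
  }
}
```

Returning an `Either` from `parseArgs` keeps the argument check separate from Spark start-up, so a wrong argument count fails fast with a usage message.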
You can find more visualization examples here