
Spark in TAP

Todd Lisonbee edited this page May 24, 2016 · 12 revisions

Apache Spark is a general-purpose engine for cluster-scale computing. It provides APIs for multiple languages, including Python, Scala, and SQL.

Getting Started with Spark

The easiest way to get started with Spark on TAP is within a Jupyter notebook.

  1. First, create a Jupyter notebook.

  2. Open Jupyter and navigate to examples/spark/README.ipynb

Accessing Readme files

The README notebook demonstrates how to create a SparkContext and run some simple Spark code. The other example notebooks show how to use Spark DataFrames, RDDs, streaming, SQL, and machine learning with KMeans and Linear Regression.
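As a rough illustration of what creating a SparkContext and running simple Spark code can look like in a notebook cell, here is a minimal PySpark sketch. It is not taken from the README notebook; the application name `tap-readme-sketch` and the helper names `square`, `sum_of_squares`, and `main` are hypothetical, and the master URL is assumed to be supplied by the notebook environment.

```python
def square(x):
    """Plain Python function that Spark applies to each element."""
    return x * x


def sum_of_squares(sc, numbers):
    """Distribute `numbers` as an RDD, square each element, and add them up."""
    return sc.parallelize(numbers).map(square).sum()


def main():
    # Imported here so the helpers above can be read without Spark installed.
    from pyspark import SparkConf, SparkContext

    # Hypothetical application name; the master URL is left to whatever the
    # notebook environment already configures (an assumption).
    conf = SparkConf().setAppName("tap-readme-sketch")

    # Reuse a running context if the notebook already has one,
    # otherwise start a new one.
    sc = SparkContext.getOrCreate(conf)

    print(sum_of_squares(sc, range(1, 6)))
```

Calling `main()` in a notebook cell runs the job. `SparkContext.getOrCreate` is used instead of the constructor because only one SparkContext may be active at a time, and a notebook kernel may already hold one.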

Readme files in Jupyter Sample

More information about Spark is available on the Spark website.

Accessing a Terminal from Jupyter

  • From the Jupyter dashboard, select the New button, located in the upper right.


  • Select Terminal from the submenu to open a new terminal within Jupyter.

Jupyter Terminal

Within the terminal, Spark commands such as spark-shell and spark-submit are available.
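To give a sense of what might be handed to spark-submit from that terminal, the following is a small word-count script sketch. The file name `wordcount_sketch.py`, the application name, and the helper names are all hypothetical, and the input path is whatever text file you choose to pass in.

```python
import sys


def split_words(line):
    """Lower-case a line and split it on whitespace."""
    return line.lower().split()


def count_words(lines_rdd):
    """Return a {word: count} dict for an RDD of text lines."""
    counts = (lines_rdd.flatMap(split_words)
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda a, b: a + b))
    return dict(counts.collect())


def main(argv):
    if len(argv) != 2:
        sys.stderr.write(
            "usage: spark-submit wordcount_sketch.py <input-path>\n")
        return

    # Imported late so the usage message works even without pyspark.
    from pyspark import SparkContext

    # Hypothetical application name.
    sc = SparkContext(appName="wordcount-sketch")
    try:
        result = count_words(sc.textFile(argv[1]))
        for word, n in sorted(result.items()):
            print("%s %d" % (word, n))
    finally:
        # Release cluster resources when the job finishes or fails.
        sc.stop()


if __name__ == "__main__":
    main(sys.argv)
```

Saved under the hypothetical name above, the script would be launched from the Jupyter terminal as `spark-submit wordcount_sketch.py <input-path>`. Collecting the counts to the driver is only reasonable for small vocabularies; for large outputs, writing the RDD back to storage would be the safer choice.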
