Spark in TAP

Apache Spark is a general-purpose engine for cluster-scale computing. It provides APIs for multiple languages, including Python, Scala, and SQL.

Getting started with Spark

The easiest way to get started with Spark on TAP is within a Jupyter notebook, as follows:

Accessing Readme files

The README notebook demonstrates how to create a SparkContext and run some simple Spark code.
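As a rough illustration of what such a notebook typically contains, the sketch below creates a SparkContext and runs a small RDD computation. It is not the README's actual code: the helper names `create_context` and `sum_even_squares`, the application name, and the `local[2]` master are all assumptions made for this example. On a cluster, the master is normally supplied by the platform configuration.

```python
def create_context(app_name="tap-readme-sketch", master="local[2]"):
    """Return a SparkContext, reusing one if the notebook already has it.

    app_name and master are illustrative assumptions, not TAP defaults.
    """
    # Imported inside the function so the helper below loads without Spark.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName(app_name).setMaster(master)
    return SparkContext.getOrCreate(conf)


def sum_even_squares(sc, n):
    """Square the even integers in [0, n) on the cluster and add them up."""
    numbers = sc.parallelize(range(n))  # distribute the data as an RDD
    evens = numbers.filter(lambda x: x % 2 == 0)
    return evens.map(lambda x: x * x).sum()
```

In a notebook cell, calling `sc = create_context()` followed by `sum_even_squares(sc, 10)` should return 120 (0 + 4 + 16 + 36 + 64). Call `sc.stop()` when finished to release the resources.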

Readme files in Jupyter Sample

The other example notebooks show how to use Spark DataFrames, RDDs, streaming, SQL, and machine learning with k-means and linear regression.
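To give a flavor of the DataFrame and SQL topics, here is a minimal sketch that is not taken from the example notebooks. It assumes Spark 2.0 or later, where `SparkSession` is the entry point (in Spark 1.x, `SQLContext` plays that role). The sample data and the names `READINGS`, `average_by_city`, and `run_demo` are hypothetical.

```python
# Hypothetical sample data: (city, temperature in degrees Celsius).
READINGS = [
    ("Portland", 12.0),
    ("Austin", 27.5),
    ("Portland", 14.0),
    ("Austin", 30.5),
]


def average_by_city(spark, readings):
    """Return {city: mean temperature} using a DataFrame and Spark SQL."""
    df = spark.createDataFrame(readings, ["city", "temp_c"])
    df.createOrReplaceTempView("readings")  # expose the DataFrame to SQL
    rows = spark.sql(
        "SELECT city, AVG(temp_c) AS avg_temp FROM readings GROUP BY city"
    ).collect()
    return {row["city"]: row["avg_temp"] for row in rows}


def run_demo():
    """Build a local session (an assumption for experimentation) and run the query."""
    # Imported here so the helper above loads without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("tap-dataframe-sketch")
        .master("local[2]")
        .getOrCreate()
    )
    try:
        return average_by_city(spark, READINGS)
    finally:
        spark.stop()
```

Calling `run_demo()` in a notebook cell should return a dictionary mapping Portland to 13.0 and Austin to 29.0.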


More information about Spark is available on the Apache Spark website.

### Accessing a terminal from Jupyter

1. From the Jupyter dashboard, select the New button in the upper right.

Accessing a Terminal from Jupyter

2. Select Terminal from the submenu to open a new terminal within Jupyter.

Jupyter Terminal

You can enter Spark commands (spark-shell, spark-submit, etc.) in the terminal window.
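For instance, a PySpark script can be run from that terminal with `spark-submit`. The following word-count script is a sketch under assumptions: the file name `wordcount_sketch.py`, the helper `tokenize`, and the input path argument are hypothetical, and the cluster master is expected to come from `spark-submit` or the platform configuration.

```python
import re
import sys


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def main(path):
    # Imported here so tokenize() can be reused without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="wordcount-sketch")
    try:
        counts = (
            sc.textFile(path)                 # one record per input line
            .flatMap(tokenize)                # one record per word
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)  # total occurrences per word
        )
        # Print the ten most frequent words, most frequent first.
        for word, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
            print("%s\t%d" % (word, n))
    finally:
        sc.stop()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        sys.stderr.write("usage: spark-submit wordcount_sketch.py <input-path>\n")
```

Saved under that name, the script would be submitted from the terminal as `spark-submit wordcount_sketch.py <input-path>`, where the input path is a text file readable by the cluster.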