
Spark in TAP

sharibenko edited this page May 24, 2016 · 12 revisions

Apache Spark is a general-purpose engine for cluster-scale computing. It provides APIs for multiple languages, including Python, Scala, and SQL.

Getting Started with Spark

The easiest way to get started with Spark on TAP is within a Jupyter notebook.

  1. First, create a Jupyter notebook.

  2. Open Jupyter and navigate to the examples/spark/README.ipynb notebook.

Accessing Readme files

The README notebook demonstrates how to create a SparkContext and run some simple Spark code. The other example notebooks show how to use Spark DataFrames, RDDs, streaming, SQL, and machine learning with KMeans and Linear Regression.
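As a rough illustration of what creating a SparkContext and running simple Spark code can look like, here is a minimal PySpark sketch. It is not copied from the README notebook: the function names `square` and `sum_of_squares` are hypothetical, and it assumes that `pyspark` is importable in the notebook's Python environment. It uses `SparkContext.getOrCreate()` so that a context already started by the notebook is reused rather than replaced.

```python
def square(x):
    """Plain Python function applied to each RDD element."""
    return x * x


def sum_of_squares(sc, n):
    """Distribute 1..n as an RDD, square each element, and add them up.

    `sc` is an existing SparkContext. This helper is illustrative only.
    """
    rdd = sc.parallelize(range(1, n + 1))
    return rdd.map(square).sum()


def demo():
    """Get (or create) a SparkContext and run the small job above."""
    # Imported here (an assumption about the environment) so that defining
    # these functions does not fail where pyspark is not installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    return sum_of_squares(sc, 5)
```

Calling `demo()` in a notebook cell runs the job on whatever cluster the context is configured for. The example notebooks remain the reference for how TAP itself sets up the context.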

Readme files in Jupyter Sample

More information about Spark is available on the Spark website.

Accessing a Spark CLI from a Jupyter notebook

  • From a Jupyter notebook, select the New button, located in the upper right of the notebook.

Creating a CLI from a Jupyter Notebook

  • Select Terminal from the submenu to open a new CLI terminal within Jupyter.

Jupyter Spark CLI
