Lawrence edited this page Feb 9, 2016 · 5 revisions

An Overview of possible underlying Notebook systems

Jupyter

Jupyter is a Python-based notebook that supports multiple languages through kernels; in this case it is used with Scala + Spark.

Installation

Jupyter is installed via pip. To install, run the following commands:

# May be required for underlying C code
sudo apt-get install build-essential python-dev   
pip install jupyter

To use Spark, Toree (currently an Apache Incubator project) needs to be installed.

sudo pip install toree
sudo jupyter toree install

Then adjust the kernel.json in /path/to/jupyter/kernels/toree (e.g. /usr/local/share/jupyter/kernels/toree). Set SPARK_HOME, SPARK_OPTS, etc. there; quite a bit of configuration can be done in this file.
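As a rough sketch of that adjustment, the kernel.json can also be edited with a few lines of Python instead of by hand. This assumes the variables live in the standard "env" block of the Jupyter kernel spec. The helper name set_kernel_env and the example values (/opt/spark, --master=local[2]) are made up for illustration and are not taken from the Toree documentation.

```python
import json
from pathlib import Path


def set_kernel_env(kernel_json, updates):
    """Merge `updates` into the "env" block of a Jupyter kernel.json file.

    Hypothetical helper for illustration; all other fields of the
    kernel spec (argv, display_name, ...) are left untouched.
    """
    path = Path(kernel_json)
    spec = json.loads(path.read_text())
    # "env" holds the environment variables passed to the kernel process.
    env = spec.setdefault("env", {})
    env.update(updates)
    path.write_text(json.dumps(spec, indent=2) + "\n")
    return spec


# Example call (path from the text above; values are assumptions):
# set_kernel_env(
#     "/usr/local/share/jupyter/kernels/toree/kernel.json",
#     {"SPARK_HOME": "/opt/spark", "SPARK_OPTS": "--master=local[2]"},
# )
```

Writing to a system-wide kernel directory will usually need root permissions, just like the install commands above.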

Finally, start the server on localhost:8888 with:

jupyter notebook 

From there, select Toree in the New menu to create a new Spark notebook. By default, this is a Scala notebook.

Opinion

Straightforward to install for Python. Additional languages require other kernels, which makes the setup a bit more complex. Using Toree for Spark seems fairly easy, but further testing is required for better insight. Jupyter has the biggest user base by far, which is a big plus.

Spark-Notebook

Spark Notebook is a notebook developed for Spark. Its main use is with Scala; support for other languages (Python, R, ...) is following.

Installation

The easiest way to install this notebook is to go to the website, configure the required version and download it. Then run the following commands:

mkdir spark-notebook
tar -xzf spark-notebook-<CONFIG>.tar.gz -C spark-notebook --strip-components=1  # Unpack into spark-notebook/
cd spark-notebook  # Required for relative conf paths in application
bin/spark-notebook  # This starts the notebook on localhost:9000

All configuration is done in spark-notebook/conf.

Opinion

This seems to be the easiest to install. It comes with Spark/Hadoop/Hive/Parquet/... as well as cluster support out of the box. It is well maintained and in active development. Spark-Notebook seems to be the best fit for this use case, as it addresses our needs exactly. (This, however, could become a problem in the future if our needs change.)

Zeppelin

view running instance

Opinion

?