
PySpark Workshop

Directions to start:

The easiest way to get a standalone Spark cluster running is with Docker. We will be using Spark 2.0.x and Jupyter for this example.

  • Install Docker
  • Find a Spark 2.0.x Docker image to pull (or build one yourself). I like produktion/jupyter-pyspark:
docker pull produktion/jupyter-pyspark:latest
  • Clone this repo somewhere memorable
git clone https://github.com/BendettaSD/ucsd-pyspark-workshop.git /path/to/ucsd-pyspark-workshop
  • Start your docker container
docker run -d -p 4040:4040 -p 7077:7077 -p 8080:8080 -p 6066:6066 -p 8888:8888 -v /path/to/ucsd-pyspark-workshop:/home/jovyan/work produktion/jupyter-pyspark:latest
  • Go to http://localhost:8888 to view your Jupyter notebook server. You should see pyspark_introduction.ipynb in the list of available files
  • Open http://localhost:4040 for the Spark UI page. Nothing will be visible until you run a Spark job (see the sketch after this list)
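
Once the notebook server is up, a quick smoke test in a new notebook confirms that Spark is wired up correctly. The snippet below is a minimal sketch using the standard Spark 2.0 SparkSession API, not code from pyspark_introduction.ipynb; depending on how the produktion image is configured, a SparkContext may already exist or a different master URL may apply. Running the action at the end should make a completed job show up in the Spark UI at http://localhost:4040.

from pyspark.sql import SparkSession

# Build (or reuse) a SparkSession; local[*] runs Spark inside the container using all cores
spark = (SparkSession.builder
         .master("local[*]")
         .appName("smoke-test")
         .getOrCreate())

# Create a tiny DataFrame and trigger a job with an action
df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "letter"])
print(df.count())   # count() is an action, so it appears as a job in the Spark UI
df.show()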

Follow along!
