The easiest way to get a standalone Spark cluster running is with Docker. We will be using Spark 2.0.x and Jupyter for this example.
- Install Docker
- Find a Spark 2.0.x Docker image to pull (or build one yourself). I like the one published by Produktion:

  ```bash
  docker pull produktion/jupyter-pyspark:latest
  ```
- Clone this repo somewhere memorable:

  ```bash
  git clone https://github.com/BendettaSD/ucsd-pyspark-workshop.git /path/to/ucsd-pyspark-workshop
  ```
- Start your Docker container (see the note after this list for what each port and flag does):

  ```bash
  docker run -d -p 4040:4040 -p 7077:7077 -p 8080:8080 -p 6066:6066 -p 8888:8888 \
      -v /path/to/ucsd-pyspark-workshop:/home/jovyan/work \
      produktion/jupyter-pyspark:latest
  ```
- Go to http://localhost:8888 to view your Jupyter notebook server. You should see `pyspark_introduction.ipynb` in the list of available files.
- Open http://localhost:4040 for the Spark UI. Nothing will be visible there yet: this UI only comes up once a Spark application is running (see the smoke test at the end of this page).
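
A quick rundown of the `docker run` flags above: the `-p` options map container ports to your machine, where 4040 is the Spark application UI, 7077 the standalone master port, 8080 the master's web UI, 6066 the REST job-submission endpoint, and 8888 the Jupyter server. The `-v` option mounts your clone of this repo at `/home/jovyan/work`, the directory the notebook server serves files from. You can confirm the container is up with `docker ps`, and if the notebook server asks for a token, it is usually printed in the output of `docker logs <container-id>`.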
Follow along!
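
Once you are in the notebook, a quick smoke test like the one below should make jobs appear in the Spark UI at http://localhost:4040. This is a minimal sketch that assumes the notebook kernel can import `pyspark` and that no context exists yet; if the image pre-creates one (many PySpark images expose it as `sc`), use that instead of building a new session.

```python
# Minimal PySpark smoke test for a Spark 2.0.x notebook.
# Assumes pyspark is importable in this kernel and no SparkContext
# has been created yet; adjust if the image pre-creates `sc`.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("pyspark-workshop-smoke-test")
         .getOrCreate())

# A tiny job: parallelize some numbers and sum them. Running this
# triggers a Spark job, so it will show up at http://localhost:4040.
rdd = spark.sparkContext.parallelize(range(1000))
print(rdd.sum())  # 499500

# A one-line DataFrame check, using the SparkSession API new in 2.0.x.
spark.range(5).show()
```

`getOrCreate()` is safe to re-run: it returns the existing session if one is already active, so evaluating the cell twice will not try to start a second context.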