GitHub - rsilvery/mapr_yelp_tutorial: The Zeppelin notebook used in the MapR blog, Modern Python & PySpark Application Development

MapR Yelp Tutorial

The Zeppelin notebook used in the MapR blog post Modern Python & PySpark Application Development (link to be added).

Goal: Peruse the Yelp Open Dataset and plot the probability of receiving a particular rating using Matplotlib, PySpark, Spark SQL, and MapR-DB. This tutorial assumes you’ve already uploaded the JSON dataset from here to your distributed file system and untarred it into the /user/mapr/ directory.
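The quantity being plotted is each star rating's share of all reviews. The following is a minimal plain-Python sketch of that arithmetic, not the notebook's PySpark code. It assumes the review data is one JSON object per line with a numeric `stars` field; the function name and the sample records are made up for illustration.

```python
import json
from collections import Counter


def rating_probabilities(json_lines):
    """Return {stars: fraction of reviews that received that rating}.

    Assumes one JSON object per line with a numeric "stars" field
    (an assumption about the file layout, not taken from the notebook).
    """
    counts = Counter(
        json.loads(line)["stars"] for line in json_lines if line.strip()
    )
    total = sum(counts.values())
    # float() keeps the division fractional on Python 2 as well as Python 3.
    return {stars: n / float(total) for stars, n in sorted(counts.items())}


# Hypothetical sample records, for illustration only.
sample = [
    '{"review_id": "a", "stars": 5}',
    '{"review_id": "b", "stars": 4}',
    '{"review_id": "c", "stars": 5}',
    '{"review_id": "d", "stars": 1}',
]
probs = rating_probabilities(sample)
print(probs)
```

On the full dataset the same count-then-normalize step would be done with PySpark or Spark SQL so that it runs on the cluster, and the resulting probabilities would be handed to Matplotlib for plotting.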

Step 1: Create a Python environment and store it to MapR-FS

Detailed steps for doing this with Conda can be found here. The overall process is:

Create a Python environment with Pandas and Matplotlib:

conda create -p mapr_yelp_tutorial/ python=2 pandas matplotlib

Zip up the environment from inside its directory:

cd mapr_yelp_tutorial/
zip -r mapr_yelp_tutorial.zip ./
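Zipping from inside the directory puts the environment's contents (`bin/`, `lib/`, and so on) at the top level of the archive instead of nesting them under `mapr_yelp_tutorial/`. As an optional sanity check, here is a small sketch that inspects an archive for a top-level `bin/python` entry. The helper name is hypothetical, and treating `bin/python` at the archive root as the expected layout is an assumption; the demo archives are tiny stand-ins built in a temporary directory.

```python
import os
import tempfile
import zipfile


def env_is_at_archive_root(zip_path):
    """Return True if the archive has bin/python at its top level.

    Hypothetical helper; assumes the interpreter is expected at bin/python
    relative to the archive root.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # normpath drops any leading "./" so both entry spellings match.
        names = {os.path.normpath(name) for name in archive.namelist()}
    return os.path.join("bin", "python") in names


# Build two stand-in archives to show the difference.
demo_dir = tempfile.mkdtemp()

good = os.path.join(demo_dir, "good.zip")
with zipfile.ZipFile(good, "w") as archive:
    archive.writestr("bin/python", "")  # zipped from inside the directory

bad = os.path.join(demo_dir, "bad.zip")
with zipfile.ZipFile(bad, "w") as archive:
    archive.writestr("mapr_yelp_tutorial/bin/python", "")  # zipped from the parent

print(env_is_at_archive_root(good), env_is_at_archive_root(bad))
```

To check the real archive, the same function could be pointed at `mapr_yelp_tutorial.zip` before uploading it.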

Store the archive in MapR-FS:

hadoop fs -put mapr_yelp_tutorial.zip /user/mapr/python_envs/

Step 2: Load the MapR Data Science Refinery with the Python archive created earlier

Set the following variable either in the docker run command or in the environment variable file you’re using:
```
ZEPPELIN_ARCHIVE_PYTHON=/user/mapr/python_envs/mapr_yelp_tutorial.zip
```
Log into Zeppelin on the specified host and port.
Download our demo notebook (MapR_Yelp_Tutorial.json) from this repo and import it into Zeppelin.
