For this assignment, we'll work through the First Python Notebook. which introduces the Jupyter interactive coding environment and basic data wrangling and analysis with pandas.
The tutorial, created by data journalist Ben Welsh of The Los Angeles Times, applies these tools to some California campaign finance data. Welsh provides both written narrative and video tutorials. If you have trouble understanding the written portions, please review the short videos for extra help.
For our class, we'll slightly diverge from the tutorial in terms of setting up a code directory and virtual environment.
Please carefully work through the Setup instructions below, and then follow the instructions for Starting Jupyter. You should ignore all instructions in the tutorial related to setting up/starting a virtual environment, as well as "pip install" of packages. We'll be using our standard DataKit and pipenv
workflow to handle these. Just focus on the core content related to working with Jupyter and pandas.
Here are the sections to work through:
- Chapter 2: Hello Jupyter through Chapter 11: Hello groupby
- Chapter 13: Hello Markdown
- Chapter 15: Hello charts
- Chapter 16: Hello cleaning
We'll depart from the tutorial's setup instructions for virtual environments and project setup. Instead please follow the below workflow. Please pay careful attention to the values entered for DataKit prompts below.
# Navigate to your code directory, e.g.
cd ~/Desktop/code
# Use datakit to create a new project
~/code> datakit project create
Creating project from template: /Users/tumgoren/.cookiecutters/cookiecutter-stanford-progj
Select repo_type:
1 - Assignment
2 - Exercise
3 - Project
Choose from 1, 2, 3 (1, 2, 3) [1]: 2
project_number_or_shortname []: First Python Notebook
project_name [COMM 177P Exercise First Python Notebook]: First Python Notebook
project_short_description [TK]: Tutorial on Jupyter and pandas.
repo_root [first-python-notebook]:
Once the project is created, navigate to the new directory and install all the libraries you'll need for the tutorial (this can take a few minutes).
cd first-python-notebook
pipenv install jupyter pandas matplotlib
Once you've created a project using the above Setup instructions, you can start up Jupyter using the following:
# Navigate to the project if you're not already there
cd ~/Desktop/code/first-python-notebook
# and activate the environment
pipenv shell
# Start jupyter (this should open a
# a new page in your web browser)
jupyter notebook
As always, activate your project environment and use invoke
to save your work to GitHub.
cd ~/Desktop/code/first-python-notebook
pipenv shell
invoke code.push