by Marylette Roa, Iris Uy, Isabelle Tingzon, Clau Yagyagan, and the WWCode Manila community
presented during PyConPH 2018 at iAcademy Plaza, Gil Puyat Ave., Makati City on February 25, 2018.
This workshop covers an introduction to data analysis using the programming language Python. It aims to introduce basic concepts, commonly used tools, and workflows to document, describe, analyze, and visualize data. Our target audience is those who are new to Python as a tool for data analysis, as well as beginners in data analysis who are looking for tools to get started.
Time: 3 hrs
Format: Follow-along demos & exercises
Reminders:
- Bring your own laptop (or share with a friend)
- Install needed tools
- Download the materials
- Optional: Familiarize yourself with Jupyter Notebook
- Recommended: Set up a Plotly account
Requirement: An understanding of the Python programming language is necessary to answer the exercises.
The workshop is divided into the following topics. Please download the materials ahead of the workshop.
- Introduction and Documentation with Jupyter notebook (15 mins)
- Data handling (45 mins)
- Analysis (45 mins)
- Visualization (45 mins)
- Today I Learned (15 mins)
- Q&A (15 mins)
The following tools will take a while to download and install, so make sure to set them up before the workshop starts.
- Python 3.x
- Jupyter notebook
- Pandas
- Scikit-learn
- Seaborn
- Plotly
The easiest way to obtain most of these is to install the latest version of Anaconda. Simply download the corresponding executable for your operating system. For this workshop, please choose the Python 3.6.x version. The installer should guide you through the process.
Once you have installed Anaconda, you also need to install the Python library plotly for the visualization part:
- Open Anaconda Prompt from the Start menu (Windows) or open the Terminal application (Mac/Linux).
- Type the following, which should guide you through the installation process:
conda install plotly
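To confirm the setup before the workshop, a short check like the one below may help. It is only a sketch: the list of import names is our assumption about how the tools listed above are imported (for example, Scikit-learn is imported as `sklearn` and Jupyter notebook as `notebook`), and `find_missing` is a hypothetical helper, not part of the workshop materials.

```python
import importlib.util
import sys

# Assumed import names for the tools listed above. The import name can
# differ from the package name (scikit-learn is imported as sklearn).
REQUIRED_MODULES = ["notebook", "pandas", "sklearn", "seaborn", "plotly"]


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show the interpreter version so you can confirm it is Python 3.x.
    print("Python version:", sys.version.split()[0])
    if sys.version_info.major < 3:
        print("This workshop needs Python 3.x.")

    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All workshop libraries were found.")
```

Run it with the Python that came with Anaconda (for example, from the Anaconda Prompt). The check only looks the modules up without importing them, so it finishes quickly.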
Most of the materials in this workshop are presented using follow-along Jupyter notebooks. These are interactive notebooks which can contain text, code, the output of that code, figures, and more. You can read more about Jupyter notebook here.
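To make the idea of a notebook holding both text and code concrete, here is a minimal sketch. It assumes the version 4 notebook format, in which an `.ipynb` file is a JSON document with a top-level `cells` list and each cell has a `cell_type`. The tiny notebook below is written by hand for illustration and is not one of the workshop files; `count_cell_types` is a hypothetical helper.

```python
import json
from collections import Counter


def count_cell_types(notebook_json):
    """Count cells by type in a notebook given as a JSON string."""
    notebook = json.loads(notebook_json)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


# A tiny hand-written notebook with one text cell and one code cell.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["1 + 1"]},
    ],
})

print(count_cell_types(example))
```

In practice you would rarely read a notebook this way, since Jupyter handles the file for you. The sketch is only meant to show that text cells and code cells live side by side in one file.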
You can also try Jupyter without installing.
However, Anaconda comes with Jupyter notebook! To launch this locally on your computer, simply look for the Jupyter Notebook icon in the Start Menu or Desktop (Windows) or type the following in the command prompt or terminal (Windows, Mac, Linux):
jupyter notebook
The notebook will be launched in a new browser window. You can go ahead and explore the dashboard. More things that you can do are covered in this guide.
At the start of the workshop, we'll also help you with navigating and using the notebook. But if you want a quick walkthrough on your own, here's a good YouTube resource. Another way to get the most out of this tool is to familiarize yourself with its keyboard shortcuts. We'll use the basic shortcuts during the workshop, and we'll let you explore more on your own.
To make finding these notebooks simpler, save the workshop materials on the same disk (e.g. C:) where you installed Anaconda and open Jupyter Notebook by searching for its icon in the Start menu (Windows), or launch Jupyter Notebook from within the folder containing the materials using the Anaconda Prompt (Windows) or Terminal (Mac/Linux):
cd /path/to/wwcodemanila/dataworkshop/materials
jupyter notebook
Using the Jupyter dashboard, simply click the folders leading up to the *.ipynb files.
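If you are not sure where the notebooks ended up, a short script can list them for you. This is a sketch only: `find_notebooks` is a hypothetical helper, and `"."` stands for whichever folder you saved the materials in.

```python
from pathlib import Path


def find_notebooks(folder):
    """Return all .ipynb files under `folder`, sorted, skipping Jupyter's checkpoint copies."""
    return sorted(
        path for path in Path(folder).rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


if __name__ == "__main__":
    # "." means the current folder; replace it with your own materials folder.
    for notebook in find_notebooks("."):
        print(notebook)
```

Jupyter keeps autosaved copies in hidden `.ipynb_checkpoints` folders, so the sketch filters those out to show each notebook only once.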
In order to plot with plotly, it is advisable to set up an account.
We based our materials and exercises, with permission, on Professor Jennifer Widom's short course on Big Data.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License