Welcome to the "Introduction to Programming, Web Scraping, and Data Cleaning in Python" workshop! The goal of this workshop is to increase the data fluency of the Brown community. In the first two weeks, we learn Python basics. In the third week, we use web scraping to collect data from the web. Web scraping allows us to automate data collection from websites of varying underlying formats. In the fourth week, we learn to clean and process the scraped data using the pandas library.
The live workshop runs for four weeks, four hours a day. Each day consists of a two-hour lecture and a two-hour lab session to practice coding.
We use JupyterHub during the workshop. The Brown JupyterHub is designed to provide an environment to run Jupyter Notebooks for Python, Julia, R, and other languages without the need to install any software or packages. The users interact with JupyterHub through a web browser. This service is a collaboration supported by various teams in CIS. To learn more about the hub, please check out the documentation here.
The table below contains links to the materials. The lecture link opens the notebook in the hub, and the video link points to the YouTube recording. If you follow the workshop asynchronously in June 2020, note that each video becomes available only a day or two after the lecture link is posted.
topic | lecture | video |
---|---|---|
day 1 - Variables part 1 | here | here |
day 2 - Variables part 2 | here | here |
day 3 - Container types part 1 | here | here |
day 4 - Container types part 2 | here | here |
day 5 - Control flow part 1 | here | here |
day 6 - Control flow part 2 | here | here |
day 7 - Control flow part 3 | here | here |
day 8 - Functions part 1 | here | here |
day 9 - Functions part 2 | here | here |
day 10 - Packages | here | here |
day 11 - Web Scraping, Basics | here | here |
day 12 - Web Scraping, Wikipedia | here | here |
day 13 - Web Scraping, Multi-page Queries | here | here |
day 14 - Pandas, part 1 | here | here |
day 15 - Pandas, part 2 | here | here |
day 16 - Visualization, part 1 | here | here |
day 17 - Visualization, part 2 | here | here |
The instructors are Ashley Lee and Andras Zsom.
We thank the Data Science Initiative at Brown University for sponsoring the workshop. We are grateful to Isabel Restrepo and Fernando Gelin for assisting with the JupyterHub, and to Matt Slivinski and Camilo Diaz for helping out during the lectures and afternoon sessions.