This is a one-hour beginner's introduction to web scraping using Python. We'll work through a complete example of scraping a university website containing course information, resulting in a dataset of almost 10,000 university courses. We'll focus on the concepts involved in web scraping rather than memorizing Python syntax.
- Why you'd want to scrape data from the web in the first place
- A high-level view of how the web works
- How to make an HTTP request in Python
- How to parse HTML in Python
- Why you need to read a website's Terms of Service before you scrape it
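
To give a feel for the two Python steps in the list above, here is a minimal sketch of making an HTTP request and parsing the HTML that comes back. It is an illustration under stated assumptions, not the workshop's actual code: it uses only the standard library (`urllib.request` and `html.parser`) rather than whichever libraries the workshop itself uses, and the names `fetch_html`, `CourseTitleParser`, `parse_course_titles`, the `course-title` class and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Make an HTTP GET request and return the response body as text."""
    # The User-Agent string is a made-up example value.
    request = Request(url, headers={"User-Agent": "workshop-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class CourseTitleParser(HTMLParser):
    """Collect the text inside every <h2 class="course-title"> element.

    The tag and class name are assumptions for illustration; a real
    site will use its own markup.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "course-title" in classes:
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        # Accumulate text only while inside a matching <h2>.
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_title:
            self._in_title = False
            self.titles[-1] = self.titles[-1].strip()


def parse_course_titles(html):
    """Return the list of course titles found in an HTML string."""
    parser = CourseTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical HTML standing in for a page returned by fetch_html().
SAMPLE_HTML = """
<html><body>
  <h2 class="course-title">Intro to Linguistics</h2>
  <p>Some description.</p>
  <h2 class="course-title">Data Structures</h2>
  <h2 class="other">Not a course</h2>
</body></html>
"""

if __name__ == "__main__":
    print(parse_course_titles(SAMPLE_HTML))
```

The parsing step is kept separate from the request step so that it can be tried on a saved or hand-written HTML string without contacting any website. Before pointing `fetch_html` at a real site, check that site's Terms of Service.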
Anyone is welcome at this workshop, whatever their level of programming experience, because we'll focus on the concepts behind web scraping more than the specific syntax. The workshop will be most useful to people who have some familiarity with Python but have never done web scraping before.
It's OK Not To Know! That's our motto at D-Lab. D-Lab is open to researchers and professionals from all disciplines and levels of experience. Feel free to ask any questions.
If you spot a problem with these materials, please open an issue describing the problem.
- Geoff Bacon
- Chris Hench