"Introduction to Data Science" workshop for the Expanding Your Horizons 2017 Conference: http://www.eyhsandiego.org/eyh_conference.aspx
Lynn Waterhouse (https://github.com/WaterLynn)
Jessica Carriere-Garwood (https://github.com/JessCG)
Hao Ye (https://github.com/ha0ye)
Have you ever “liked” something on Facebook, tweeted something on Twitter, or rated a movie on IMDb? By doing that, you are creating data. Now imagine people all over the world doing the same thing for years – that’s a whole lot of data! Those data are used to answer questions like “Are dog-owners or cat-owners happier?” or “Is Jar Jar Binks the most hated character in Star Wars?” (Spoiler alert: yes.) In this workshop, you’ll learn some cool, cutting-edge skills for using computers and data to answer questions like these. You will also learn how to create impressive visualizations and produce reports that can be shared on the web.
The goal of this workshop is to introduce the principles of programming and data science through hands-on activities. Students will use the R programming language to explore real datasets.
The specific aims of the workshop are to train students in the following tasks:
- Issue commands to the computer by writing code and running it within the RStudio editor.
- Produce basic reports containing code and output using the R Markdown format.
- Load and view simple datasets.
- Construct basic plots and graphs of the data.
- Introduce presenters (what we do, fun facts?)
- Overview of workshop objectives and logistics
- Split girls into pairs on computers
- Explain how students can signal that they have questions or need assistance using a Post-it note
- Activity 1 - do basic computations
- Introduce the RStudio interface and open the R markdown demo file
- Students fill out basic info
- Students learn how to execute code and view output
- Students try adding their own calculations
- Activity 2 - examine real datasets
- Introduce loading datasets into R and examining them
- Students repeat for sample datasets
- Students compute summary statistics for datasets
- Activity 3 - make plots and visualizations
- Introduce simple plots using ggplot2
- Students learn about extending functionality by loading packages
- Students produce basic scatterplots and barplots from the data
- Activity 4 - create reports
- Students learn about how R markdown produces a final report
- Students can customize with title and other information
- Free time / challenge activities
- Students try making their own novel plots
- Plotting different variables
- Splitting data points or coloring data points by other variables
- Producing different types of plots (give sample visual gallery?)
- Closing
- Provide link to online repository (for more info and where final reports will be deposited)
- Ask students if they would like to submit reports
- Solicit feedback from students
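As a rough illustration of what Activities 1 and 2 might look like in practice, here is a minimal sketch in base R. The `iris` dataset ships with R and is used here only as a stand-in (an assumption); the workshop itself uses the datasets listed under the data sources, and the variable names below are made up for this example.

```r
# Activity 1: basic computations
total <- 3 + 4 * 2                 # arithmetic follows the usual order of operations
average <- mean(c(2, 4, 6, 8))     # mean of a small vector of numbers
print(total)
print(average)

# Activity 2: examine a dataset
# `iris` is a built-in dataset, used here as a stand-in for the workshop data.
data(iris)
head(iris)                         # show the first six rows
str(iris)                          # list the columns and their types
summary(iris$Sepal.Length)         # min, quartiles, mean, and max of one column

# Summary statistic per group: mean sepal length for each species
mean_by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(mean_by_species)
```

Running the lines one at a time in RStudio shows how each command produces its own output, which is the pattern students repeat with the real datasets.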
We would like to thank Ali Freibott, Sue Lowery, Jenny Prairie, and Darcy Taniguchi for their helpful input on this event.
The "Thanksgiving Dinner" and "Star Wars" survey data come from the FiveThirtyEight repo.
The "Gapminder" data are included in the gapminder R package.
CO₂ data come from NOAA ESRL. Temperature data come from NASA GISS, specifically, the Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies (Global monthly means).
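For example, the Gapminder data can be loaded and plotted along the lines of Activity 3. This is only a sketch, assuming the `gapminder` and `ggplot2` packages are installed; the choice of year, variables, and axis labels is illustrative and not necessarily what the workshop materials use.

```r
# One-time setup, if the packages are not installed yet:
# install.packages(c("gapminder", "ggplot2"))
library(gapminder)   # provides the `gapminder` data frame
library(ggplot2)     # provides the plotting functions

# Keep only the rows for a single year (2007 is an arbitrary choice)
gap_2007 <- subset(gapminder, year == 2007)

# Scatterplot: one point per country, colored by continent
p <- ggplot(gap_2007, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")
print(p)
```

Swapping `continent` for another column, or `geom_point()` for a different geom, is one way to approach the challenge activities.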
{links to software downloads and other tutorials}
Analysis of Pokemon Go characters: http://blog.revolutionanalytics.com/2016/07/an-analysis-of-pok%C3%A9mon-go-types-created-with-r.html
Analysis of Colors of Flags using R: https://gist.github.com/dsparks/3927280 https://www.r-bloggers.com/distribution-of-colors-by-flag/
Tracking Hurricanes and Mapping with R: https://www.r-bloggers.com/tracking-hurricane-sandy-with-open-data-and-r/
Analysis of words spoken by Simpsons characters: http://toddwschneider.com/posts/the-simpsons-by-the-data/