-
Notifications
You must be signed in to change notification settings - Fork 0
/
week01.qmd
272 lines (178 loc) · 13 KB
/
week01.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
---
title: 'BIO 540: Data Wrangling and Visualization'
author: "Jeremy Van Cleve"
date: 27 08 2024
format:
html:
self-contained: true
---
[github site]: https://github.com/vancleve/BIO540-DWV
[canvas site]: https://uk.instructure.com/courses/2122860
0. I was an R newbie too!
1. Go over syllabus
2. Install R and RStudio
3. What is R?
4. What are Markdown and R Markdown?
# Syllabus
- The syllabus will be located here: <https://github.com/vancleve/BIO540-DWV>.
- This is also where R and R Markdown / Quarto documents will be located. The only files that will be kept on the [Canvas site][canvas site] are the PDF copies of the reference books and the completed labs and figures that you submit.
- I'll update the syllabus and schedule of topics as necessary.
# Using R, RStudio, and Quarto
## Local installation
We will use R, RStudio, and Quarto via a server and will not have to install them locally on our computers. However, if you would like to do so, here are some brief instructions.
First step, download R, RStudio, and Quarto
- To download R, go here: <https://cran.r-project.org/>.
- To download RStudio, go here: <https://posit.co/download/rstudio-desktop/>
- To download Quarto, go here: <https://quarto.org/docs/get-started/>.
If you need directions, this may help: <https://quarto.org/docs/get-started/hello/rstudio.html>. If you already have R and need to update your packages, go here: <https://stat545.com/install.html> (these are a little dated now since the release of Quarto, but should still be helpful).
## Server-based R, RStudio, and Quarto via Posit Cloud
We have access to R, RStudio, and Quarto via the Posit Cloud. Go to the [Canvas site][canvas site] and find the Announcement where I welcome you to the course. There will be a link there to the course content on the Posit Cloud. You should then arrive at a page that looks like this,
![Login screen for Posit Cloud](assets/posit_cloud_login.jpg){width=50%}
If you already have a Posit Cloud account, then login with that. Otherwise, select "Sign Up" and create an account. Once you've created your account, you may have to click on the link to the course content again. Then, you should be able to join the workspace and see this:
![Workspace for the course](assets/welcome_posit_cloud.jpg){width=50%}
Finally, you can create a new project based on the files in the course github repository by selecting "New Project" -> "New Project From Template" and then "BIO540-DWV" as seen below.
![New Project From Template](assets/posit_cloud_new_project_from_template.jpg){width=50%}
You can rename your project "Week 1". For each week, you should repeat this process of creating a new project from the template and then giving the project the requisite week name.
# Running RStudio
## What is RStudio?
RStudio is an Integrated Development Environment (IDE) for the R programming language.
While you can write R code in any text editor you like and then run that code with the R interpreter, there are many things that an IDE can do that help you be more efficient when programming.
@. Syntax highlighting
@. **New** visual editor (WYSIWYM editing: <https://en.wikipedia.org/wiki/WYSIWYM>)
@. File/project organization (see "Files" pane)
@. Examining variables that you've set (see the "Environment" pane)
@. Easily execute code and examine its text output (see "Console") or graphical output (see "Viewer")
@. View help files (see "Help")
@. Installing/updating R packages (see "Tools" menu)
@. Debugging (see "Debug" menu)
@. Projects (see "File" and "Tools")
@. Version control (see "Tools" menu)
When using RStudio, I encourage you to:
1. Play around with different arrangements for the windows/panes
2. **LEARN KEYBOARD SHORTCUTS**. They can make you much, much more efficient.
- Shift+Alt+K (MAC/PC): Keyboard shortcut quick reference.
- Shift+Command/Control+K (MAC/PC): Render (or "knit") current document (i.e., turn in HTML/DOCX/PDF)
- Command/Control+Enter (MAC/PC): Run selected lines
- Shift+Command/Control+Enter: Run current "chunk" (R Notebook only)
- etc
3. Use a Project for all assignments in the course (save them in a single directory or its subdirectories).
## Course files as a project
The lecture notes and problems that are stored on the GitHub site and are already downloaded as part of the template.
#### Updating the BIO540-DWV project from the GitHub repository
If there is additional content uploaded to the GitHub repository that was not in the project when you created it from the template, then you can update the files by going to "Tools"->"Version Control"->"Pull Branches".
## Tidyverse
Many of the packages that packages that we will use have been helpfully collected together into a metapackage called `tidyverse`. Lucky for us, this is already installed in the project template. If we needed to install it, we'd type the following at the console:
```{r}
#| eval: false
install.packages("tidyverse")
```
## RStudio cheatsheet
For a cheatsheet on RStudio, go here:
<https://github.com/rstudio/cheatsheets/raw/main/rstudio-ide.pdf>
# What is R?
## First, there was "S".
- S was designed by John Chambers and others at Bell Labs in the 1970s specifically for data analysis and statistics.
- R was developed in 1991 by **R**oss Ihaka and **R**obert Gentleman and re-implemented much of S after S became licensed software.
## Then there was "R"
- R was open-sourced in 1995 and John Chambers and other statisticians are part of its core development team.
- R is object-oriented (i.e., you can build containers of variables and the like), like Java.
- R is interpreted (i.e., the interpreter turns interprets your text code immediately and runs it), like Python.
- R has a tonne of packages for statistics and biology.
- More on the basics of R next time.
# What are Markdown, R Markdown, and Quarto?
## Markdown: a plain text (i.e., just characters) way of "formatting" text
- Markdown is a kind of "markup" language (e.g., HTML).
- Designed for simplicity and readability
- No need to "view" Markdown to read it easily
- Often used in code documentation
- Increasingly used in full document preparation (journal article, books, etc)
Ok, let's get Markdown!\
![](assets/dancing.gif)
First, this whole document is Markdown, so you can quickly see examples of the following:
- Headings are created with hash symbols `#`
- `#` is the "first" heading
- `##` is the "second" heading, which is smaller (how much smaller? set by the "stylesheet")
- etc
- *Italic text* in encapsulated by one \* or asterisk, **Bold text** by two asterisks. Code can be added in line with backticks, `code`.
- Lists can start with -, \*, or +. (Quarto requires a blank line before the list starts)
- Sublists are indented at minimum to line up with the content of the enclosing list. Use with 4 spaces (tab twice in RStudio) for syntax highlighting.
- Numbered lists can be used too.
1. This is the first element.
2. This is the second.
- Numbered lists can be automatically numbered by using `(@)` as the list marker. In-fact, these lists will be numbered continuously across multiple lists
(@) This is the eleventh element since it continues the `(@)` from above.
(@) This is the twelfth element.
- New paragraphs are separated by blank lines.
Line breaks, without a new paragraph, need two spaces\
in order to be recognized. These can be used in lists too.
- Links use angle brackets \<\>: <https://www.r-project.org/>
- Links with different text use `[text](http://link.to.something)`
- Images use `![image caption](path.to.image)`. Add `{width=50%}` after the `(path.to.image)` to have the image use 50% of its possible size.
E.g., here is an elephant that only takes up 25% of the space.
![elephant](assets/elephant.png){width=25%}
- Tables can be created too:
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
| 12 | 12 | 12 | 12 |
| 123 | 123 | 123 | 123 |
| 1 | 1 | 1 | 1 |
- For more details, see the [Quarto docs](https://quarto.org/docs/authoring/markdown-basics.html). Quarto uses a ["flavor" of Markdown](https://pandoc.org/MANUAL.html#pandocs-markdown) created for [Pandoc](https://pandoc.org/), which is the tool Quarto uses to convert Markdown into all the different document types (e.g., PDF, DOCX, HTML, etc).
# R Markdown and Quarto: mixing text (Markdown) and R code (R)
Ok, lets put some R in this thing.
Make a code "**chunk**"" with three back ticks followed by an r in braces. End the chunk with three back ticks:
```{r}
paste("Hello", "World!")
```
Place code inline with a single back ticks. The first back tick must be followed by an R, like this: `r paste("Hello", "World!")`.
You can control how the **chunks** of R code work in the rendered document by adding options like `#| echo: false` to hide the code when you create the HTML
```{r}
#| echo: false
runif(10)
```
or you can add the option `eval: FALSE` so that the code isn't evaluated (see the `install.packages` lines in this `.qmd`)
You can do plots too. Let's do one with a caption ("This is a cool_plot", which you can see Quarto output, which is useful for debugging) that is just a bunch (1000) of normally distributed numbers (mean=0, std=1):
```{r}
#| fig-cap: This is a cool plot
plot(rnorm(1000, 0, 1))
```
Now, lets do something a little more interesting. We can pull in data from all kinds of places including websites and actively updated databases. For example, here are COVID-19 death numbers for the United States for Kentucky and Tennessee from the [CDC](https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Week-Ending-D/r8kw-7aab/about_data):
```{r}
#| warning: false
library(tidyverse)
library(RSocrata)
# https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Week-Ending-D/r8kw-7aab/about_data
us = read.socrata("https://data.cdc.gov/api/odata/v4/r8kw-7aab")
us |>
filter(state == "Kentucky" | state == "Tennessee") |>
filter(group == "By Week") |>
drop_na(covid_19_deaths) |>
ggplot(aes(x=week_ending_date, y=covid_19_deaths, color=state)) +
geom_line() +
scale_color_manual(values=c("#0033A0", "#FF8200")) +
labs(x="Date", y="COVID-19 Weekly Deaths", color="State") +
theme_classic()
```
More about using R with Markdown via Quarto can be found in the here: <https://quarto.org/docs/computations/r.html>.
### Why Quarto?
You may be asking yourself how this plain text gets mixed or "knitted" together with the R code and output and converted in HTML (or another document format). This is where Quarto comes in. Quarto is also a program that reads the `.qmd` first. It looks for R code blocks, runs them, and then takes that output and knits it together into a Markdown document. Quarto gives that Markdown document to Pandoc, which converts it to the output format of choice (e.g., HTML). This knitting process used to be done by an R package called `rmarkdown`, which is actually still used by Quarto. R Markdown however was focused on R whereas Quarto can be used with other languages such as Python and Julia. Hence, learning how to interact with Quarto can be helpful even if you need to switch to compiling documents with calculations done using those languages. In fact, you can use Quarto to build a document that uses each of those languages in different section if that is useful.
## Reproducible Research
![Table 1 from Alston and Rick (2021) Bull Ecol Soc Am 102(2):e01801. <https://doi.org/10.1002/bes2.1801>](assets/tab1_alston_rick_2021.png)
# Lab ![](assets/beaker.png)
- Create your own Quarto Markdown document ("File"->"New File"->"Quarto Document")
- Give the document a title, author, and date (see [here](https://quarto.org/docs/reference/formats/html.html)).
- Write a paragraph about what kind of data you are thinking of analyzing and visualizing. Include an image with a caption to help with the description
(you may need to upload the image to via the "Upload" button in the "Files" pane).
- Create a list of your top 5 favorite TV shows.
- Take the list and create a table that include the following columns: show name, year it premiered, country it premiered in, your ranking
- As you do the above, use some Markdown elements to structure your document. Include the following:
- headings
- a link
- some bold or italic text
- Make sure you can get the `.qmd` file to render properly in HTML in RStudio. (Select the "Preview in Viewer Pane" option under the gear icon for seeing the HTML)
- Download your files from the server and upload them to Canvas under Week 1. To this, follow the instructions below.
- If you didn't download any images (e.g., only linked to images on the web),
select your `qmd` file in the "Files" pane, select the gear icon "More",
and select "Export". Then upload the `qmd` file to Canvas.
- If you did download an image to the server, you'll need to select the image
and the `qmd` file in the "Files" pane, select the gear icon "More",
and select "Export". This will download a zip file. Upload that zip file to canvas.