-
Notifications
You must be signed in to change notification settings - Fork 10
/
lab_vectors.Rmd
372 lines (267 loc) · 12.4 KB
/
lab_vectors.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
---
title: "Working with Vectors"
subtitle: "R Programming Foundation for Life Scientists"
output:
bookdown::html_document2:
highlight: textmate
toc: true
toc_float:
collapsed: true
smooth_scroll: true
print: false
toc_depth: 4
number_sections: true
df_print: default
code_folding: none
self_contained: false
keep_md: false
encoding: 'UTF-8'
css: "assets/lab.css"
include:
after_body: assets/footer-lab.html
---
```{r,child="assets/header-lab.Rmd"}
```
# Introduction
There are several different data structures that are commonly used in R. The different data structures can be seen as different ways to organise data. In this exercise we will focus on vectors, which are the base data structure in R, and will also get an overview of the key data types (modes) that are found in R. At the end of this exercise you should know:
- What are the data types commonly used in R.
- What is a vector.
- How to create vectors in an interactive R session.
- How one can use R functions to determine the structure and mode of a vector.
- What basic operators you can find in R
- How to subset vector using both indexes and operators.
- Try some of the built-in functions in R.
## Data types
From the lecture you might remember that all elements in any data structures found in R will be of a certain type (or have a certain mode). The four most commonly used data types in R are: logical, integer and double (collectively called numeric), and character. The names hints at what they are.
- **Logical** = `TRUE` or `FALSE` (or `NA`), can also be abbreviated `T` or `F`
- **Integer** = Numbers that can be represented without fractional component
- **Double** = Numbers with a fractional component, and special values like `Inf`, `-Inf`, and `NaN`
- **Character** = Any string of text. Special characters must be escaped with `\`
Often, the mode of an entry is automatically determined by R. For example if you save the value 5.1 as a variable, R will recognize it as numeric (double). If you instead have a text string like "hello world" it will have the mode character. Below you will also see examples of how you can specify the mode yourself, and not rely on R guessing it the right.
> If you are used to tools like Microsoft Excel or Google Sheets you might think "why do I have to think of data types? Excel does everything for me!" But remember, Excel is also doing guesswork behind the scenes, and they can get it wrong. For e.g. Excel will often automatically remove leading zeros from numerical values, if your values represent post codes or other IDs this could corrupt your data. It's always best to know for sure.
## Vectors in R
Depending on the type of data one needs to store in R different data structures can be used. The four most commonly used data structures in R are: vectors, matrices, data frames, and lists. We will in this exercise work only with vectors.
The most basic data structure in R are vectors. Vectors are 1-dimensional data structures that contain only one type of data (e.g. all entries must have the same mode). To create a vector in R one can use the function `c()` (combine) as seen below. This example will create a vector named `example.vector` with 3 elements in it.
```{r}
example.vector <- c(10, 20, 30)
```
<i class="fas fa-lightbulb"></i> If you need more information about the function `c()` you can always use the built-in manual in R. Typing `?c()` will bring up the documentation for the function `c()`.
Once you have created this vector in R, you can access it by simply typing its name in an interactive session.
```{r}
example.vector
```
The output generate on screen shows the entries in your vector and the 1 in squared brackets indicates what position in the vector the entry to the right of it has. In this case 10 is the first entry of the vector.
If we for some reason only wanted to extract the value 10 from this vector we can use the fact that we know it is the first position to do so.
```{r}
example.vector[1]
```
Since a vector can only contain one data type, all members need to be of the same type. If you try to combine data of different types into the same vector, R will not warn you, but instead coerce (turn) it to the most flexible type (From least to most flexible: Logical, integer, double, character). Hence, adding a number to a logical vector will turn the whole vector to a numeric vector.
To check what data type an object is, run the R built-in function `class()`, with the object as the only parameter.
```{r}
class(example.vector)
```
If you for any reason want to have more information about any object you have stored in your R session the command `str()` is very helpful.
```{r}
str(example.vector)
```
## Basic R operators
As in other programming languages there are a set of basic operators in R.
| Operation | Description | Example | Example Result |
|---|---|---|---|
|`x + y`|Addition|`1 + 3`|`4`|
|`x - y`|Subtraction|`1 - 3`|`-2`|
|`x * y`|Multiplication|`2 * 3`|`6`|
|`x / y`|Division|`1 / 2`|`0.5`|
|`x ^ y`|Exponent|`2 ^ 2`|`4`|
|`x %% y`|Modular arithmetic|`1 %% 2`|`1`|
|`x %/% y`|Integer division|`1 %/% 2`|`0`|
|`x == y`|Test for equality|`1 == 1`|`TRUE`|
|`x <= y`|Test less or equal|`1 <= 1`|`TRUE`|
|`x >= y`|Test for greater or equal|`1 >= 2`|`FALSE`|
|`x && y`|Non-vectorized boolean AND|`T && T`|`TRUE`|
|`x & y`|Vectorized boolean AND|`c(T,F) & c(T,T)`|`TRUE FALSE`|
|`x || y`| Non-vectorized boolean OR|`T || F`|`TRUE`|
|`x | y`|Vectorized boolean OR|`c(T,F) | c(T,T)`|`TRUE TRUE`|
|`!x`|Boolean not|`1 != 2`|`TRUE`|
Besides these, there of course numerous more or less simple functions available in any R session. For example, if we want to add all values in our example.vector that we discussed earlier, we can do that using addition:
```{r}
example.vector[1] + example.vector[2] + example.vector[3]
```
But we can also use the function `sum()` that adds all numeric values inside the vector.
```{r}
sum(example.vector)
```
To learn more about a function use the built in R manual as described earlier. If you do not know the name of a function that you believe should be found in R, use the function `help.search()` or use Google to try and identify the name of the command.
# Exercise
In all exercises on this course it is important that you try to figure out what you expect the result to be before you run the commands in R. You should then verify that this will indeed be the result by running the command in an R session. In case there is a discrepancy between your expectations and the actual output make sure you understand why before you move forward. If you can not figure out how to, or which command to run you can click the button to reveal example code including expected output. Also note that in many cases there are multiple solutions that solve the problem equally well.
## Create and modify vectors
Open R-studio and create two numeric vectors named x and y that are of equal length. Use these vectors to answer the questions below.
```{r}
x <- c(2, 4 ,7)
y <- c(1, 5, 11)
```
1. How many numbers are there in the vector x?
```{r,accordion=TRUE}
length(x)
```
2. How many numbers will x + y generate?
```{r,accordion=TRUE}
length(x + y)
```
3. What is the sum of all values in x?
```{r,accordion=TRUE}
sum(x)
```
4. What is the sum of y times y?
```{r,accordion=TRUE}
sum(y*y)
```
5. What do you get if you add x and y?
```{r,accordion=TRUE}
x + y
```
6. Assign x times 2 to a new vector named z
```{r,accordion=TRUE}
z <- x * 2
```
7. How many numbers will z have, why?
```{r,accordion=TRUE}
length(z)
```
8. Assign the mean of z to a new vector named z.mean and determine the length of z.mean
```{r,accordion=TRUE}
z.mean <- mean(z)
length(z.mean)
```
9. Create a numeric vector with all integers from 5 to 107
```{r,accordion=TRUE}
vec.tmp <- 5:107
vec.tmp
```
10. Create a numeric vector with the same length as the previous one, but only containing the number 3
```{r,accordion=TRUE}
vec.tmp2 <- rep(3, length(vec.tmp))
vec.tmp2
```
11. Create a vector that contain all numbers from 1 to 17, where each number occurs the the same number of times as the number itself e.g. 1, 2, 2, 3, 3, 3...
```{r,accordion=TRUE}
rep(1:17, 1:17)
```
12. What will be the result of the following calculations?
- `c(1, 3, 5) + c(2, 4, 6)`
- `c(1, 3, 5) + c(2, 4, 6, 8)`
- `c(1, 3) - c(2, 4, 6 ,8)`
13. Create two numeric vectors of length 4 and test run all the basic operators (as seen in the table earlier) with these two as arguments. Make sure you understand the output generated by R.
## Modify and subset vectors
Create a new character vector that contains the following words and save it using a suitable name:
`apple, banana, orange, kiwi, potato`.
```{r,accordion=TRUE}
veggies <- c("apple", "banana", "orange", "kiwi", "potato")
```
Do the following on your newly created vector.
1. Select orange from the vector.
```{r,accordion=TRUE}
veggies[3]
```
2. Select all fruits from the vector.
```{r,accordion=TRUE}
veggies[-5]
# or
veggies[1:4]
```
3. Do the same selection as in question 2 without using index positions.
```{r,accordion=TRUE}
veggies[veggies=="apple" | veggies == "banana" | veggies == "orange" | veggies == "kiwi"]
# or
veggies[veggies!="potato"]
# or
veggies[!(veggies %in% c("potato"))]
```
4. Convert the character string to a numeric vector.
```{r,accordion=TRUE}
as.numeric(veggies)
```
5. Create a vector of logic values that can be used to extract every second value from your character vector.
```{r,accordion=TRUE}
selection <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
veggies[selection]
```
Alternative solution, why do this work?
```{r,accordion=TRUE}
selection2 <- c(FALSE, TRUE)
veggies[selection2]
```
6. Add the names a, b, o, k and p to the vector.
```{r,accordion=TRUE}
names(veggies) <- c("a", "b", "o", "k", "p")
```
7. Create a vector containing all the letters in the alphabet (NB! this can be done without having to type all letters). Google is your friend.
```{r,accordion=TRUE}
letters
```
8. Sample 30 values randomly with replacement from your letter vector and convert the character vector to factors. Which of the levels have most entries in the vector?
```{r,accordion=TRUE}
letter.sample <- sample(letters, size = 30, replace = TRUE)
letter.sample <- factor(letter.sample)
summary(letter.sample)
```
9. Extract the letter 14 to 19 from the alphabet vector created previously.
```{r,accordion=TRUE}
letters[14:19]
```
10. Extract all but the last letter.
```{r,accordion=TRUE}
letters[1:length(letters)-1]
# or
letters[-length(letters)]
```
11. Which is the index position of the letter u in the vector?
```{r,accordion=TRUE}
which(letters=="u")
```
12. Create a new vector of length one that holds all the alphabet in a single entry.
```{r,accordion=TRUE}
paste(letters, sep = "", collapse = "")
```
13. Create a numeric vector by sampling 100 numbers from a normal distribution with mean 2 and standard deviation 4. Hint! Check the function `rnorm()`.
```{r,accordion=TRUE}
norm.rand <- rnorm(100, mean = 2, sd = 4)
```
14. How many of the generated values are negative?
```{r,accordion=TRUE}
length(norm.rand[norm.rand<0])
# or
sum(norm.rand<0)
```
15. Calculate the standard deviation, mean, median of your random numbers.
```{r,accordion=TRUE}
sd(norm.rand)
mean(norm.rand)
median(norm.rand)
```
16. Replace the 11th value in your random number vector with NA and calculate the same summary statistics again.
```{r,accordion=TRUE}
norm.rand[11] <- NA
sd(norm.rand, na.rm = TRUE)
mean(norm.rand, na.rm = TRUE)
median(norm.rand, na.rm = TRUE)
```
17. Replace the last position in the vector with the letter L and calculate the same summary statistics.
```{r,accordion=TRUE}
norm.rand[100] <- "L"
sd(norm.rand, na.rm = TRUE)
mean(norm.rand, na.rm = TRUE)
median(norm.rand, na.rm = TRUE)
```
18. In many cases one has data from multiple replicates and different treatments in such cases it can be useful to have names of the type: Geno\_a\_1, Geno\_a\_2, Geno\_a\_3, Geno\_b\_1, Geno\_b\_2…, Geno\_s\_3. Try to create a vector with such names without manually typing it all in.
```{r,accordion=TRUE}
geno <- rep("Geno", 57)
needed.letters <- rep(letters[1:19], 3)
needed.numbers <- rep(1:3, 19)
temp <- paste(geno, needed.letters, needed.numbers, sep = "_")
sort(temp)
# One line solution that avoids need of knowing length(geno) and sorting
# Find s position in alphabet
which(letters == "s")
paste("Geno",rep(letters[1:19],rep(3,19)),1:3,sep="_")
```