forked from isi-avbulimen/R-cookbook
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path03-table-manipulation.Rmd
728 lines (446 loc) · 23.1 KB
/
03-table-manipulation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
# Table/Data Frame manipulation {#tables}
```{r include=FALSE}
library(dplyr)
library(readr)
library(reshape2)
library(tidyr)
library(janitor)
# Read in required data using public data.gov extract
road_accidents <- readr::read_rds("data/road_accidents_2017.RDS")
```
This chapter provides an overview of code examples for table or data frame manipulation (a tidyverse data frame is referred to as a tibble).
One of the main things you will have to do in any R project or RAP project will be manipulating the data that you are using in order to get it into the format you require.
One of the main packages used to manipulate data is the {dplyr} package which we recommend and use throughout this book. The {dplyr} package (and others e.g. {tidyr}) are all part of the tidyverse. The tidyverse is a group of packages developed by Hadley Wickham and others and are all designed to work with each other. See https://www.tidyverse.org/ for more info.
**Tidyverse packages and functions can be combined and layered using the pipe operator `%>%`.**
{dplyr} is built to work with **tidy data**. To find out more about tidy data please look at the following link https://r4ds.had.co.nz/tidy-data.html but the general principles are:
1. Each variables must have its own column
2. Each observation must have its own row
3. Each value must have its own cell
## Pivot and reshape tables
There will be two examples for pivoting tables provided:
* The {tidyr} package uses the gather/spread functions and is often used to create tidy data
* The {reshape2} package is also a useful package to pivot tables and has added functionality such as providing totals of columns etc.
We want to have the day of the week variable running along the top so each day of the week is its own column.
```{r, echo = FALSE, results='asis'}
# Create smaller dataset for example
road_accidents_small <- road_accidents %>%
dplyr::group_by(Accident_Severity, Day_of_Week) %>%
dplyr::tally()
knitr::kable(head(road_accidents_small),
caption = "Number of road accidents by accident severity and weekday")
```
**{tidyr} package**
Using the {tidyr} package, gather and spread functions can be used to pivot the table views:
- **gather** makes wide data longer i.e. variables running along the top can be "gathered" into rows running down.
- **spread** makes long data wider i.e. one variable can be spread and run along the top with each value being a variable.
**Note: Hadley Wickham is bringing out new versions of these packages called pivot_longer and pivot_wider which are not yet available on DfT laptops.**
```{r}
# Pivot table using tidyr package
library(tidyr)
road_accidents_weekdays <- road_accidents_small %>%
tidyr::spread(Day_of_Week, n)
```
With the spread function above you need to first specify the variable you want to spread, in this case `Day_of_Week`, and then the variable that will be used to populate the columns (`n`).
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_weekdays),
caption = "Number of road accidents by accident severity and weekday, tidyr::spread")
```
The opposite can also be done using the gather function:
```{r}
# Pivot table using tidyr package
library(tidyr)
road_accidents_gather <- road_accidents_weekdays %>%
tidyr::gather(`1`, `2`, `3`, `4`, `5`, `6`, `7`, key = "weekday", value = "n")
```
To use gather, specify which columns you want to be gathered into one column (in this case the individual weekday columns). The `key` is the name of the gathered column, and the `value` is the name of the values.
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_gather),
caption = "Number of road accidents by accident severity and weekday, tidyr::gather")
```
**{reshape2} package**
Again, this has two functions which can be used to pivot tables:
- **melt** makes wide data longer
- **dcast** makes long data wider
```{r}
# Pivot table using reshape2 package
library(reshape2)
road_accidents_weekdays2 <-
reshape2::dcast(road_accidents_small, Accident_Severity ~
Day_of_Week, value.var = "n")
```
With the `dcast` function above, after stating the name of the data frame, you need to specify the variable(s) you want in long format (multiple variables seperated by "+"), in this case Accident_Severity, and then the wide format variable(s) are put after the tilda (again multiple seperated by "+"). The `value.var` argument specifies which column will be used to populate the new columns, in this case it is n.
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_weekdays2),
caption = "Number of road accidents by accident severity and weekday, reshape2::dcast")
```
If you want to create sums and totals of the tables this can also be done using {reshape2}. For example, taking the original table, we want to pivot it and sum each severity to get the total number of accidents per day.
```{r}
# Pivot table using reshape2 package
library(reshape2)
road_accidents_weekdays3 <-
reshape2::dcast(road_accidents_small, Accident_Severity ~ Day_of_Week,
value.var = "n", sum, margins = "Accident_Severity")
```
In this example, we use the `margins` argument to specify what we want to combine to create totals. So we want to add all the accident severity figures up for each weekday. Before using `margins` you need to specify how the margins are calculated, in this case we want a `sum`. Alternative options are to calculate the length, i.e. the number of rows.
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_weekdays3),
caption = "Number of road accidents by accident severity and weekday plus totals, reshape2::dcast")
```
The opposite can also be done using the `melt` function.
```{r}
# Pivot table using reshape2 package
library(reshape2)
road_accidents_melt <- reshape2::melt(road_accidents_weekdays2, id.vars = "Accident_Severity",
measure.vars = c("1", "2", "3", "4", "5", "6", "7"),
variable.name = "Day_of_Week", value.name = "n")
```
For the `melt` function you need to specify:
`id.vars` = "variables to be kept as columns"
`measure.vars` = c("variables to be created as one column")
`variable.name` = "name of created column using the measure.vars"
`value.name` = "name of value column"
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_melt),
caption = "Number of road accidents by accident severity and weekday, reshape2::melt")
```
## Dropping and selecting columns
Use the {dplyr} select function to both select and drop columns.
**Select columns**
```{r}
road_accidents_4_cols <- road_accidents %>%
dplyr::select(acc_index, Accident_Severity, Date, Police_Force)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_4_cols),
caption = "Four columns from road accidents 2017")
```
**Drop columns**
Note that to drop columns the difference is putting a "-" in front of the variable name.
```{r}
road_accidents_3_cols <- road_accidents_4_cols %>%
dplyr::select(-Police_Force)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_3_cols),
caption = "Three columns from road accidents 2017")
```
## Rename variables
Use the **rename** function from {dplyr} to rename variables where the new variable name is on the left hand side of the **=** equals sign, and the old variable name is on the right hand.
```{r}
road_accidents_rename <- road_accidents_4_cols %>%
dplyr::rename(Date_of_Accident = Date)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_rename),
caption = "Rename date column to Date_of_Accident")
```
## Filtering data
Use the {dplyr} filter function to filter data.
This example filters the data for slight severity accidents (accident severity 3).
```{r}
road_accidents_slight <- road_accidents_4_cols %>%
dplyr::filter(Accident_Severity == 3)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_slight),
caption = "Slight severity road accidents 2017")
```
To filter multiple conditions:
And operator
```{r}
road_accidents_filter <- road_accidents_4_cols %>%
dplyr::filter(Accident_Severity == 3 & Police_Force == 4)
```
Or operator
```{r}
road_accidents_filter2 <- road_accidents_4_cols %>%
dplyr::filter(Accident_Severity == 3 | Accident_Severity == 2)
```
**Note: filtering with characters must be wrapped in "quotation marks" e.g:**
```{r, eval = FALSE}
road_accidents_filter3 <- road_accidents %>%
dplyr::filter(`Local_Authority_(Highway)` == "E09000010")
```
Also note that in the above example the variable is quoted in back ticks (`). This is because some variable names confuse R due to brackets and numbers and need to be wrapped in back ticks so R knows that everything inside the back ticks is a variable name.
## Group data
Use the {dplyr} group_by function to group data. This works in a similar manner to "GROUP BY" in SQL.
The below example groups the data by accident severity and weekday, and creates totals for each group using the "tally" function.
```{r}
# Create grouped data set with counts
road_accidents_small <- road_accidents %>%
dplyr::group_by(Accident_Severity, Day_of_Week) %>%
dplyr::tally()
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_small),
caption = "Road accidents 2017 by accident severity and weekday")
```
## Order data
Use the {dplyr} arrange function to order data. This works in a similar manner to "ORDER BY" in SQL.
This example orders the data by date and number of casualties.
```{r include=FALSE}
road_acc_7 <- road_accidents %>%
dplyr::sample_n(7)
```
```{r}
# Order data by date and number of casualties
road_accidents_ordered <- road_acc_7 %>%
dplyr::select(acc_index, Accident_Severity, Police_Force, Number_of_Casualties, Date) %>%
dplyr::arrange(Date, Number_of_Casualties)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_ordered),
caption = "Road accidents 2017 ordered by date and number of casualties")
```
## Get counts of data
To get counts for groups of data, the {dplyr} tally function can be used in conjunction with the {dplyr} group by function. This groups the data into the required groups and then tallys how many records are in each group.
```{r}
# Create grouped data set with counts
road_accidents_small <- road_accidents %>%
dplyr::group_by(Accident_Severity, Day_of_Week) %>%
dplyr::tally()
```
The above example creates groups by accident severity and weekday and counts how many accidents are in each group (one row equals one accident therefore the tally is counting accidents).
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_small),
caption = "Road accidents 2017 by accident severity and weekday")
```
## Combine tables
When combining data from two tables there are two ways to do this in R:
* Bind the tables by basically either appending the tables on the rows or columns
* Join the tables using the {dplyr} version of SQL joins
**Binding tables**
Binding tables is mainly done to append tables by creating more rows, however tables can also be bound by adding more columns. Although it is recommended to use the {dplyr} join functions to combine columns (see 5.6).
```{r include=FALSE}
library(dplyr)
# create three tables for example
accidents_1 <- dplyr::filter(road_accidents_small, Accident_Severity == 1)
accidents_2 <- dplyr::filter(road_accidents_small, Accident_Severity == 2)
accidents_3 <- dplyr::filter(road_accidents_small, Accident_Severity == 3)
```
Here are three tables, one shows data for accident severity of 1, one for accident severity of 2, and one for accident severity of 3.
```{r, echo = FALSE, results='asis'}
knitr::kable(accidents_1, caption = "Number of fatal road accidents in 2017, by weekday")
knitr::kable(accidents_2, caption = "Number of serious injury road accidents in 2017, by weekday")
knitr::kable(accidents_3, caption = "Number of slight injury road accidents in 2017, by weekday")
```
To combine these tables we can use the bind_rows function from the {dplyr} package. Use bind_rows when you want to append the tables underneath one another to make one longer table, i.e. you want to add more rows.
**Ensure that the column names for each table are exactly the same in each table.**
```{r}
# combine tables using bind_rows
library(dplyr)
all_accidents <- accidents_1 %>%
dplyr::bind_rows(accidents_2, accidents_3)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(all_accidents, caption = "Road accident data 2017, bind_rows")
```
```{r include=FALSE}
library(dplyr)
# create two tables for example
road_acc_1 <- head(dplyr::select(road_accidents, acc_index, Police_Force, Accident_Severity))
road_acc_2 <- head(dplyr::select(road_accidents, acc_index, Date, Day_of_Week))
```
## Joining tables
Joins in R can be done using {dplyr}. This is generally to combine columns of data from two tables:
```{r}
# combine tables using left join
library(dplyr)
all_accidents_cols_join <- road_acc_1 %>%
dplyr::left_join(road_acc_2, by = "acc_index")
```
This uses the same principles as SQL, by specifying what the tables should be joined on using the **by =** argument.
{dplyr} has all the usual SQL joins for example, `inner_join`, `full_join`, `right_join`. All of these are used in the same way as the left join example above.
Another useful join for data manipulation is an `anti_join`. This provides all the data that is not in the joined table. For example, the below snapshot of a table displays road accident totals broken down by accident severity and weekday:
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_small, caption = "Road accident data 2017 by accident severity and weekday"))
```
I am interested in creating two sub-groups of this data, a table for all accidents on a Monday (weekday 2), and all other accidents.
First, I get the **Monday** data using the {dplyr} filter function (see 5.3).
```{r include=FALSE}
library(dplyr)
# create filtered Monday table for example
accidents_monday <- dplyr::filter(road_accidents_small, Day_of_Week == 2)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(accidents_monday, caption = "Road accident data 2017 on a Monday by accident severity"))
```
Then, I can use an `anti-join` to create a table which has all of the data that is not in the above table:
```{r}
# create table of all rows not in the joined table
library(dplyr)
all_accidents_not_monday <- road_accidents_small %>%
dplyr::anti_join(accidents_monday, by = c("Accident_Severity", "Day_of_Week"))
```
The above code takes the initial table we want to get our data from (road_accidents_small) and anti joins accidents_monday. This says, "get all the rows from road_accidents_small that are not in accidents_monday". Again, note the need to specify what the join rows would be joined and compared by.
```{r, echo = FALSE, results='asis'}
knitr::kable(all_accidents_not_monday, caption = "Road accident data 2017 not on a Monday by accident severity")
```
## Select specific columns in a join
Doing a join with {dplyr} will join all columns from both tables, however sometimes not all columns from each table are needed.
Let's look at some previous tables again:
```{r, echo = FALSE, results='asis'}
knitr::kable(road_acc_1, caption = "Police force and accident severity information for accidents")
knitr::kable(road_acc_2, caption = "Date and weekday information for accidents")
```
Let's say we want **acc_index** and **Police_Force** from the first table, and **Date** from the second table.
```{r}
# select specific columns from each table and left join
library(dplyr)
road_acc_3 <- road_acc_1 %>%
dplyr::select(acc_index, Police_Force) %>%
dplyr::left_join(select(road_acc_2, acc_index, Date), by = "acc_index")
```
The above code takes the first table and uses the `select` statement to select the required columns from the first table.
Then within the `left_join` command, to select the data from the second table, you again add the `select` statement.
**Note: you will need to select the joining variable in both tables but this will only appear once**
```{r, echo = FALSE, results='asis'}
knitr::kable(road_acc_3, caption = "Police force and Date information for specific accidents")
```
## Sum rows or columns
These solutions use the base R functions rather than {dplyr}.
### Sum rows
To sum across a row:
```{r}
# sum across a row
road_accidents_weekdays$rowsum <- rowSums(road_accidents_weekdays, na.rm = TRUE)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(road_accidents_weekdays, caption = "Road accidents 2017 by accident severity and weekday")
```
To sum across specific rows:
```{r}
# sum across specific rows
road_accidents_weekdays$alldays <- road_accidents_weekdays$`1` + road_accidents_weekdays$`2`+
road_accidents_weekdays$`3`+ road_accidents_weekdays$`4`+
road_accidents_weekdays$`5`+ road_accidents_weekdays$`6`+
road_accidents_weekdays$`7`
```
```{r, echo = FALSE, results='asis'}
knitr::kable(road_accidents_weekdays[,-9], caption = "Road accidents 2017 by accident severity and weekday")
```
### Sum columns
To sum columns to get totals of each column, ***note this will appear as a console output not in a data object***:
```{r, eval = FALSE}
# sum columns
colSums(road_accidents_weekdays, na.rm = TRUE)
```
To get the totals of each column as a row in the data:
```{r}
# create total column
road_accidents_weekdays <- road_accidents_weekdays %>%
janitor::adorn_totals("row")
```
```{r, echo = FALSE, results='asis'}
knitr::kable(road_accidents_weekdays[,-9:-10], caption = "Road accidents 2017 by accident severity and weekday")
```
{reshape2} can also be used to get column totals when pivoting a table (See 5.1).
## Replace NAs or other values
```{r, include = FALSE}
# Create dataset for example with nas (need to change -1 value to na as this is how NAs are represented in the road accident open data)
# create nas
road_accidents_na <- road_accidents %>%
dplyr::na_if(-1)
# get smaller data set for example
road_accidents_na <- road_accidents_na %>%
head(n = 7) %>%
dplyr::select(acc_index, `1st_Road_Class`, `2nd_Road_Class`, Junction_Control)
```
To replace all NAs in one column (Junction Control column) with a specific value:
```{r}
library (tidyr)
# replace all NAs with value -1
road_accidents_na$Junction_Control <- road_accidents_na$Junction_Control %>%
tidyr::replace_na(-1)
```
**Note: To replace NA with a character the character replacement must be wrapped in "quotation marks"**
To replace all NAs in a data frame or tibble:
```{r}
# replace all NAs with value -1
road_accidents_na <- road_accidents_na %>%
replace(is.na(.), -1)
```
To replace values with NA, specify what value you want to be replaced with NA using the na_if function:
```{r}
# create nas
road_accidents_na <- road_accidents_na %>%
dplyr::na_if(-1)
```
**Note: to only create NAs in a specific column specify the column name in a similar manner to the first example in this section.**
To replace values:
```{r}
# replace 1st_road_class
road_accidents_na <- road_accidents_na %>%
dplyr::mutate(`1st_Road_Class` = dplyr::case_when(`1st_Road_Class` == 3 ~ "A Road",
TRUE ~ as.character(`1st_Road_Class`)))
```
The case_when function is similar to using CASE WHEN in SQL.
The TRUE argument indicates that if the values aren't included in the case_when then they should be whatever is after the tilda (~) i.e. the equivalent of the ELSE statement in SQL.
The "as.character" function says that everything that in `1st_Road_Class` isn't 3 should be kept as it is, this could be replaced by an arbitrary character or value e.g. "Other". This would make everything that is not a 3, coded as "Other".
You can have multiple case_when arguments for multiple values, they just need to be seperated with a comma. Multiple case_when statements for different variables can be layered using the pipe operator `%>%`.
## Reordering rows/columns
### Reordering rows
Rows can be reordered by certain variables using the {dplyr} arrange function with examples in the **4.5 Order data** sub-chapter of this book. This will order the data in ascending order by the variables quoted. To order rows in descending order the ``desc()`` command can be used within the arrange function.
```{r}
# Order data by date and number of casualties
road_accidents_ordered_desc <- road_acc_7 %>%
dplyr::select(acc_index, Accident_Severity, Police_Force, Number_of_Casualties, Date) %>%
dplyr::arrange(desc(Date), Number_of_Casualties)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_ordered_desc),
caption = "Road accidents 2017 ordered by date (descending) and number of casualties")
```
### Reordering columns
Use the {dplyr} select statement to reorder columns, where the order of the variables quoted represents the order of the columns in the table.
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_4_cols),
caption = "Four columns from road accidents 2017")
```
To reorder this table we do:
```{r}
table_reordered <- road_accidents_4_cols %>%
dplyr::select(Accident_Severity, Date, acc_index, Police_Force)
```
## Creating new variables
The {dplyr} mutate function can be used to create new variables based on current variables or other additional information.
For example, to create a new variable which is speed limit in km:
```{r}
road_acc_km <- road_acc_7 %>%
dplyr::mutate(speed_km = Speed_limit * 1.6)
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_acc_km <- dplyr::select(road_acc_km, acc_index, Police_Force, Speed_limit, speed_km)),
caption = "Road accidents by km/h")
```
## Summarising data
The {dplyr} summarise function can be used to summarise data (mean, median, sd, min, max, n_distinct). See https://dplyr.tidyverse.org/reference/summarise.html for more examples.
For example, to get the mean number of accidents for each weekday:
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_small),
caption = "Road accidents 2017, by severity and weekday")
```
The group by function is used with the summarise function to specify what groups the mean will be applied to, in this case weekday.
```{r}
road_acc_mean <- road_accidents_small %>%
dplyr::group_by(Day_of_Week) %>%
dplyr::summarise(mean = mean(n))
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_acc_mean),
caption = "Mean number of accidents in 2017, by weekday")
```
## Look up tables
Aside from importing a separate lookup data file into R, named vectors can be used as lookup tables.
For example, to assign accident severity values with labels, named vectors can be used (**note: numbers must also be in quotation marks**):
```{r}
lookup_severity <- c("1" = "Fatal", "2" = "Serious", "3" = "Slight")
```
To convert the data and create a label variable (**note: the Accident_Severity variable values can be replaced with the lookup values by changing the name of the variable on the left to Accident_Severity**):
```{r}
road_accidents_small$Accident_Severity_label <- lookup_severity[road_accidents_small$Accident_Severity]
```
```{r, echo = FALSE, results='asis'}
knitr::kable(head(road_accidents_small),
caption = "Road accidents 2017, by severity and weekday")
```