missing_values.Rmd

# Handling missing values

```{r, include=FALSE}
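# tidyverse provides tibble(), drop_na(), replace_na(), complete(), na_if() and starwars
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, eval = TRUE, warning = FALSE)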
```

* `drop_na`: drop rows containing missing values.

Create a tibble that contains missing (NA) values:

```{r}
dfwithNA <- tibble(x = c(1, 2, NA, 5), 
                   y = c("a", NA, "b", "c"))
```

Remove rows that contain NA values with `drop_na`:

```{r}
drop_na(dfwithNA)
```
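
`drop_na` can also be restricted to selected columns, so that only NA values in those columns cause a row to be dropped. A minimal sketch, re-creating the tibble above so that the chunk runs on its own (the name `x_complete` is only used for illustration):

```{r}
library(tibble)
library(tidyr)

dfwithNA <- tibble(x = c(1, 2, NA, 5),
                   y = c("a", NA, "b", "c"))

# only check column "x": the row where "y" is NA is kept
x_complete <- drop_na(dfwithNA, x)
x_complete
```

This is useful when NA values in some columns are acceptable for the analysis at hand.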

***

* `replace_na`: change NA values to a selected value (one per column):

```{r}
# replace NAs by 0s in column "x", and by "k" in column "y"
replace_na(dfwithNA, 
           list(x=0, y="k"))
```
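
`replace_na` also accepts a single vector, which can be convenient inside `mutate` when only one column needs to change. A sketch under that assumption, re-creating the tibble so that the chunk is self-contained (`x_filled` is an illustrative name):

```{r}
library(tibble)
library(dplyr)
library(tidyr)

dfwithNA <- tibble(x = c(1, 2, NA, 5),
                   y = c("a", NA, "b", "c"))

# replace NAs in column "x" only; column "y" keeps its NA
x_filled <- dfwithNA %>%
  mutate(x = replace_na(x, 0))
x_filled
```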

***

* `complete`: turns implicit missing values into explicit missing values:

```{r}
df <- tibble(
  patient = c("Patient1", "Patient1", "Patient2", "Patient3", "Patient3"),
  treatment = c("A", "B", "A", "A", "B"),
  value1 = 1:5,
  value2 = 4:8
)
```

Here, the row for **Patient2 / Treatment B** is missing. `complete` adds it and fills the remaining columns with NA values:

```{r}
complete(df, 
         patient, treatment) # columns to expand
```

If you want implicit missing values to be filled with something other than NA, use the **fill** parameter:

```{r}
# we will fill in missing values with 0s in column "value1", and with NAs in column "value2"
complete(df, 
         patient, treatment, 
         fill=list(value1=0, value2=NA))
```
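
To see which combinations are implicitly missing before completing the data, one possible approach is to build all combinations with `expand` and keep those that are absent from the data with `anti_join`. This is only a sketch: `missing_combos` is an illustrative name, and the tibble is re-created so that the chunk runs on its own.

```{r}
library(tibble)
library(dplyr)
library(tidyr)

df <- tibble(
  patient = c("Patient1", "Patient1", "Patient2", "Patient3", "Patient3"),
  treatment = c("A", "B", "A", "A", "B"),
  value1 = 1:5,
  value2 = 4:8
)

# all patient / treatment combinations, minus those already present in df
missing_combos <- expand(df, patient, treatment) %>%
  anti_join(df, by = c("patient", "treatment"))
missing_combos
```

The result lists the rows that `complete` would add.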

***

* In practice: what if you have `NA` values, along with <u>empty cells</u> and "customized" missing values?

```{r}
# mixing numbers with "" makes col1 a character column
dfwithNA2 <- tibble(col1=c(1, 2, NA, 5, "", 4), 
                    col2=c("a", NA, "b", "c", "d", "missing"))
```

Replace empty cells and "customized" missing values with `NA` using `na_if`:

```{r}
dfwithNA2 %>% 
   mutate(col1=na_if(col1, ""), col2=na_if(col2, "missing"))
```
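
Because `col1` mixes numbers with an empty string, it is stored as a character column, and it stays character after `na_if`. A possible follow-up, sketched here with illustrative names (`cleaned`, `na_counts`), is to convert it back to numeric and count the NA values per column:

```{r}
library(tibble)
library(dplyr)

dfwithNA2 <- tibble(col1 = c(1, 2, NA, 5, "", 4),
                    col2 = c("a", NA, "b", "c", "d", "missing"))

cleaned <- dfwithNA2 %>%
  mutate(col1 = as.numeric(na_if(col1, "")),  # "" becomes NA, then character becomes numeric
         col2 = na_if(col2, "missing"))

# number of NA values in each column
na_counts <- summarise(cleaned, across(everything(), ~ sum(is.na(.x))))
na_counts
```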


<center><h4 style="background-color: #a4edff; display: inline-block;">**HANDS-ON**</h4></center>

Back to the `starwars` data set:

* Replace **NA** in `hair_color` with "unknown".
* Remove rows that still contain NA values.

<details>
<summary>
<h5 style="background-color: #a4edff; display: inline-block;">*Answer*</h5>
</summary>

```{r, eval=FALSE}
# Replace NA in `hair_color` with "unknown".
# Remove rows that still contain NA values.
replace_na(starwars, list(hair_color="unknown")) %>%
  drop_na()
```

</details>