
# FDA Adverse Event Reporting System (FAERS) {-}

[![Build Status](https://travis-ci.org/asdfree/faers.svg?branch=master)](https://travis-ci.org/asdfree/faers) [![Build status](https://ci.appveyor.com/api/projects/status/github/asdfree/faers?svg=TRUE)](https://ci.appveyor.com/project/ajdamico/faers)

The FDA Adverse Event Reporting System (FAERS) compiles all prescription drug-related side effects reported by physicians or patients in the United States. Either party can voluntarily submit a report to the FDA or to the manufacturer, which must then report that event. FAERS is the post-marketing safety surveillance program for drug and therapeutic biological products.

* Multiple tables, linkable by the `primaryid` field, with patient demographics, drug/biologic information, patient outcomes, reporting source, and drug start and end dates.

* Published quarterly since 2004 with the latest events reported to the FDA, using a revised system that began in the fourth quarter of 2012.

* Maintained by the United States [Food and Drug Administration (FDA)](http://www.fda.gov/).

## Simplified Download and Importation {-}

The R `lodown` package downloads and imports all available FAERS microdata when you specify `"faers"` with an `output_dir =` parameter in the `lodown()` function. Depending on your internet connection and computer processing speed, you might prefer to run this step overnight.

```{r eval = FALSE }
library(lodown)
lodown( "faers" , output_dir = file.path( path.expand( "~" ) , "FAERS" ) )
```

## Analysis Examples with base R \ {-}

Load the drug, outcome, and demographics tables for a single quarter, then merge them into one data frame:

```{r eval = FALSE }
faers_drug_df <- 
	readRDS( file.path( path.expand( "~" ) , "FAERS" , "2016 q4/drug16q4.rds" ) )

faers_outcome_df <- 
	readRDS( file.path( path.expand( "~" ) , "FAERS" , "2016 q4/outc16q4.rds" ) )

faers_demo_df <- 
	readRDS( file.path( path.expand( "~" ) , "FAERS" , "2016 q4/demo16q4.rds" ) )

faers_df <- merge( faers_drug_df , faers_outcome_df )

faers_df <- merge( faers_df , faers_demo_df , all.x = TRUE )
```
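
Because the drug and outcome tables can each hold several rows per `primaryid`, merging them yields one row per drug-outcome combination within a report, not one row per report. A minimal sketch with made-up toy data (the `toy_` objects and their values are hypothetical, not actual FAERS records) illustrates the effect on record counts:

```{r eval = FALSE , results = "hide" }
# hypothetical toy data: report 1 lists two drugs and two outcomes
toy_drug_df <- 
	data.frame( 
		primaryid = c( 1 , 1 , 2 ) , 
		drugname = c( "A" , "B" , "A" ) 
	)

toy_outcome_df <- 
	data.frame( 
		primaryid = c( 1 , 1 , 2 ) , 
		outc_code = c( "HO" , "DE" , "HO" ) 
	)

# merge() joins on the only shared column name, `primaryid`
toy_df <- merge( toy_drug_df , toy_outcome_df )

# report 1 contributes 2 x 2 = 4 rows, report 2 contributes 1 row
nrow( toy_df )

# number of distinct reports in the merged table
length( unique( toy_df[ , "primaryid" ] ) )
```

In this toy case the merged table has five rows but only two distinct reports, so row counts on a merged table describe drug-outcome pairs.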

### Variable Recoding {-}

Add new columns to the data set:
```{r eval = FALSE }
faers_df <- 
	transform( 
		faers_df , 
		
		physician_reported = as.numeric( occp_cod == "MD" ) ,
		
		init_fda_year = as.numeric( substr( init_fda_dt , 1 , 4 ) )
		
	)
	
```
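
The same recodes can be checked on a small example. This sketch assumes that `init_fda_dt` holds dates in `YYYYMMDD` form and uses hypothetical toy values. A missing `occp_cod` stays missing after the recode:

```{r eval = FALSE , results = "hide" }
# hypothetical toy data, assuming YYYYMMDD dates stored as integers
toy_df <- 
	data.frame( 
		occp_cod = c( "MD" , "CN" , NA ) , 
		init_fda_dt = c( 20161005L , 20151231L , 20161115L ) 
	)

toy_df <- 
	transform( 
		toy_df , 
		
		# 1 if a physician reported, 0 otherwise, NA if unknown
		physician_reported = as.numeric( occp_cod == "MD" ) ,
		
		# substr() coerces the integer to text, then keeps the first four digits
		init_fda_year = as.numeric( substr( init_fda_dt , 1 , 4 ) )
		
	)

toy_df[ , c( "physician_reported" , "init_fda_year" ) ]
```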

### Unweighted Counts {-}

Count the unweighted number of records in the table, overall and by groups:
```{r eval = FALSE , results = "hide" }
nrow( faers_df )

table( faers_df[ , "outc_code" ] , useNA = "always" )
```

### Descriptive Statistics {-}

Calculate the mean (average) of a linear variable, overall and by groups:
```{r eval = FALSE , results = "hide" }
mean( faers_df[ , "init_fda_year" ] , na.rm = TRUE )

tapply(
	faers_df[ , "init_fda_year" ] ,
	faers_df[ , "outc_code" ] ,
	mean ,
	na.rm = TRUE 
)
```

Calculate the distribution of a categorical variable, overall and by groups:
```{r eval = FALSE , results = "hide" }
prop.table( table( faers_df[ , "sex" ] ) )

prop.table(
	table( faers_df[ , c( "sex" , "outc_code" ) ] ) ,
	margin = 2
)
```
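
The `margin =` argument controls which dimension sums to one. A minimal sketch on hypothetical toy values (not actual FAERS records) shows that `margin = 2` gives shares within each column, here each outcome code, while `margin = 1` gives shares within each row:

```{r eval = FALSE , results = "hide" }
# hypothetical toy data
toy_df <- 
	data.frame( 
		sex = c( "F" , "F" , "M" , "M" , "F" ) , 
		outc_code = c( "HO" , "DE" , "HO" , "HO" , "HO" ) 
	)

toy_table <- table( toy_df[ , c( "sex" , "outc_code" ) ] )

# shares of each sex within each outcome code: columns sum to one
col_shares <- prop.table( toy_table , margin = 2 )
colSums( col_shares )

# shares of each outcome code within each sex: rows sum to one
row_shares <- prop.table( toy_table , margin = 1 )
rowSums( row_shares )
```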

Calculate the sum of a linear variable, overall and by groups:
```{r eval = FALSE , results = "hide" }
sum( faers_df[ , "init_fda_year" ] , na.rm = TRUE )

tapply(
	faers_df[ , "init_fda_year" ] ,
	faers_df[ , "outc_code" ] ,
	sum ,
	na.rm = TRUE 
)
```

Calculate the median (50th percentile) of a linear variable, overall and by groups:
```{r eval = FALSE , results = "hide" }
quantile( faers_df[ , "init_fda_year" ] , 0.5 , na.rm = TRUE )

tapply(
	faers_df[ , "init_fda_year" ] ,
	faers_df[ , "outc_code" ] ,
	quantile ,
	0.5 ,
	na.rm = TRUE 
)
```
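
In the grouped call, `tapply()` forwards the extra arguments (`0.5` and `na.rm = TRUE`) to `quantile()`. As a quick check on hypothetical toy values, the result should match the grouped `median()`:

```{r eval = FALSE , results = "hide" }
# hypothetical toy vectors
toy_year <- c( 2014 , 2015 , 2016 , 2016 , NA )
toy_outc <- c( "DE" , "DE" , "HO" , "HO" , "HO" )

# arguments after the function name are passed along to quantile()
by_quantile <- tapply( toy_year , toy_outc , quantile , 0.5 , na.rm = TRUE )

# the same statistic computed with median()
by_median <- tapply( toy_year , toy_outc , median , na.rm = TRUE )

all.equal( as.numeric( by_quantile ) , as.numeric( by_median ) )
```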

### Subsetting {-}

Limit your `data.frame` to elderly persons:
```{r eval = FALSE , results = "hide" }
sub_faers_df <- subset( faers_df , age_grp == "E" )
```
Calculate the mean (average) of this subset:
```{r eval = FALSE , results = "hide" }
mean( sub_faers_df[ , "init_fda_year" ] , na.rm = TRUE )
```
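
Note that `subset()` treats a missing condition as `FALSE`, so records without an `age_grp` are dropped, whereas logical bracket indexing would keep an all-`NA` row for each of them. A minimal sketch on hypothetical toy values:

```{r eval = FALSE , results = "hide" }
# hypothetical toy data with one missing age group
toy_df <- 
	data.frame( 
		age_grp = c( "E" , "A" , NA , "E" ) , 
		init_fda_year = c( 2016 , 2015 , 2016 , 2014 ) 
	)

# subset() drops the record with the missing age group
sub_toy_df <- subset( toy_df , age_grp == "E" )
nrow( sub_toy_df )

# bracket indexing returns an extra all-NA row for the missing value
bracket_toy_df <- toy_df[ toy_df[ , "age_grp" ] == "E" , ]
nrow( bracket_toy_df )

mean( sub_toy_df[ , "init_fda_year" ] , na.rm = TRUE )
```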

### Measures of Uncertainty {-}

Calculate the variance, overall and by groups:
```{r eval = FALSE , results = "hide" }
var( faers_df[ , "init_fda_year" ] , na.rm = TRUE )

tapply(
	faers_df[ , "init_fda_year" ] ,
	faers_df[ , "outc_code" ] ,
	var ,
	na.rm = TRUE 
)
```

### Regression Models and Tests of Association {-}

Perform a t-test:
```{r eval = FALSE , results = "hide" }
t.test( init_fda_year ~ physician_reported , faers_df )
```

Perform a chi-squared test of association:
```{r eval = FALSE , results = "hide" }
this_table <- table( faers_df[ , c( "physician_reported" , "sex" ) ] )

chisq.test( this_table )
```

Fit a generalized linear model:
```{r eval = FALSE , results = "hide" }
glm_result <- 
	glm( 
		init_fda_year ~ physician_reported + sex , 
		data = faers_df
	)

summary( glm_result )
```
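
With no `family =` argument, `glm()` defaults to the gaussian family with an identity link, so the call above should estimate the same coefficients as an ordinary least squares fit with `lm()`. A minimal sketch on hypothetical toy values (not actual FAERS records):

```{r eval = FALSE , results = "hide" }
# hypothetical toy data
toy_df <- 
	data.frame( 
		init_fda_year = c( 2014 , 2015 , 2016 , 2016 , 2015 , 2014 ) , 
		physician_reported = c( 1 , 0 , 1 , 0 , 1 , 0 ) , 
		sex = c( "F" , "M" , "F" , "M" , "M" , "F" ) 
	)

# default family is gaussian
glm_toy <- glm( init_fda_year ~ physician_reported + sex , data = toy_df )

# ordinary least squares on the same formula
lm_toy <- lm( init_fda_year ~ physician_reported + sex , data = toy_df )

family( glm_toy )$family

all.equal( coef( glm_toy ) , coef( lm_toy ) )
```

For a binary outcome such as `physician_reported`, passing `family = binomial()` would fit a logistic regression instead.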

## Analysis Examples with `dplyr` \ {-}

The R `dplyr` package offers an alternative grammar of data manipulation to base R and SQL syntax. [dplyr](https://github.com/tidyverse/dplyr/) offers many verbs, such as `summarize`, `group_by`, and `mutate`, along with pipe-able functions and the `tidyverse` style of non-standard evaluation. [This vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html) details the available features. As a starting point for FAERS users, this code replicates previously-presented examples:

```{r eval = FALSE , results = "hide" }
library(dplyr)
# as_tibble() replaces tbl_df(), which is deprecated in current dplyr releases
faers_tbl <- as_tibble( faers_df )
```
```
Calculate the mean (average) of a linear variable, overall and by groups:
```{r eval = FALSE , results = "hide" }
faers_tbl %>%
	summarize( mean = mean( init_fda_year , na.rm = TRUE ) )

faers_tbl %>%
	group_by( outc_code ) %>%
	summarize( mean = mean( init_fda_year , na.rm = TRUE ) )
```
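
The unweighted counts and subsetting examples can be written with `dplyr` verbs in the same way. This is a minimal sketch on hypothetical toy values (not actual FAERS records), using `count()` for the grouped counts and `filter()` in place of `subset()`:

```{r eval = FALSE , results = "hide" }
library(dplyr)

# hypothetical toy data
toy_tbl <- 
	as_tibble( 
		data.frame( 
			age_grp = c( "E" , "A" , "E" , NA ) , 
			outc_code = c( "HO" , "HO" , "DE" , "HO" ) , 
			init_fda_year = c( 2016 , 2015 , 2014 , 2016 ) 
		) 
	)

# number of records by outcome code, stored in a column named `n`
toy_counts <- 
	toy_tbl %>%
	count( outc_code )

# mean among elderly persons; filter() drops rows where the condition is NA
toy_elderly <- 
	toy_tbl %>%
	filter( age_grp == "E" ) %>%
	summarize( mean = mean( init_fda_year , na.rm = TRUE ) )
```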