---
title: "Power Calculation"
description: |
  Power calculation for EXPLICA using fake-data simulations.
author: "Sergio Olmos"
date: "`r Sys.Date()`"
output: distill::distill_article
---
## Introduction
We are interested in comparing the levels of three biomarkers across the expotype categories. I used simulated datasets to repeatedly fit linear regression models under different data-generating scenarios. Power was estimated as the proportion of fitted models in which a statistically significant difference was detected in at least one category.
The following summary statistics were used for generating the simulated data sets:
| Biomarker    | Units  | Male mean (SD) | Female mean (SD) | Sample size | Reference |
|--------------|--------|----------------|------------------|-------------|-----------|
| Homocysteine | µmol/L | 14.6 (6.1)     | 13.1 (4.6)       | 3,025       |           |
| ApoB         | mg/dL  | 113.9 (31.0)   | 107.0 (32.1)     | 1,501       |           |
| hs-CRP       | mg/L   | 3.19 (5.28)    | 3.35 (5.37)      | 5,072       |           |
## Simulated datasets
Simulated data were generated from the following linear model:
$$y_i \sim \text{Normal}(\mu_i, \sigma^2)$$
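$$\mu_i = \beta_0 + \beta_1 E_{1i} + \beta_2 E_{2i} + \ldots + \beta_{K-1} E_{K-1,i}$$
where $E_{ki}$ is an indicator variable equal to 1 if individual $i$ belongs to expotype category $k$ and 0 otherwise, for $k \in \{1, 2, \ldots, K - 1\}$, and $K$ is the number of expotype categories. The remaining category serves as the reference, with mean $\beta_0$.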
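
As an illustration of the model above, the following is a minimal sketch of how one dataset could be simulated in R. The function name `simulate_biomarker`, its arguments, the equal allocation of individuals to categories, and the numeric values are all assumptions made for this example, not details taken from the analysis.

```r
# Hypothetical helper: simulate one dataset from the linear model above.
# n_obs, beta0, beta and sigma are illustrative arguments, not study values.
simulate_biomarker <- function(n_obs, beta0, beta, sigma) {
  n_categories <- length(beta) + 1  # K categories; the first is the reference
  # Assumption: individuals fall into each category with equal probability.
  expotype <- factor(
    sample(seq_len(n_categories), size = n_obs, replace = TRUE),
    levels = seq_len(n_categories)
  )
  # model.matrix() builds the intercept plus the K - 1 indicator columns.
  design <- model.matrix(~ expotype)
  mu <- as.vector(design %*% c(beta0, beta))
  data.frame(expotype = expotype, y = rnorm(n_obs, mean = mu, sd = sigma))
}

set.seed(1)
sim <- simulate_biomarker(n_obs = 1000, beta0 = 14, beta = c(1, 0.5, 0.5, 0), sigma = 5)
head(sim)
```

Building $\mu_i$ through the design matrix keeps the code close to the notation: each column after the intercept is one indicator $E_{k}$.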
## Power calculation
Statistical power was estimated by simulating 1,000 datasets for each combination of the following parameters:
* Sample size: 1,000, 900, 800
* Number of expotype categories: 5-8
* Maximum true difference in mean biomarker level
The maximum true difference is set by giving one of the coefficients $\beta_1, \ldots, \beta_{K-1}$ the chosen value and giving the others an equal or smaller value.
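
The parameter combinations and the coefficient pattern described above could be set up roughly as follows. This is only a sketch: the `max_diff` values are placeholders, and the helper `make_beta` with its `fraction` argument (the other coefficients set to half the maximum) is a hypothetical choice, since the exact values given to the smaller coefficients are not stated here.

```r
# Scenario grid: every combination of sample size, number of categories,
# and maximum true difference. The max_diff values are placeholders.
scenarios <- expand.grid(
  n_obs = c(1000, 900, 800),
  n_categories = 5:8,
  max_diff = c(0.5, 1, 1.5)
)

# Hypothetical helper: one coefficient takes the maximum difference and the
# remaining K - 2 coefficients take a fraction of it (half, by assumption).
make_beta <- function(n_categories, max_diff, fraction = 0.5) {
  c(max_diff, rep(fraction * max_diff, n_categories - 2))
}

nrow(scenarios)
make_beta(n_categories = 5, max_diff = 1)
```

Each row of `scenarios` then defines one simulation setting, and `make_beta()` returns the $K - 1$ coefficients for that setting.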
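The power to detect a significant difference in at least one of the expotype categories is estimated by fitting the data-generating model to each dataset and testing the null hypothesis that all category coefficients are zero,
$$H_0: \beta_1 = \beta_2 = \ldots = \beta_{K-1} = 0$$
that is, $\mu_i = \beta_0$ for every individual $i$. This is the overall F-test reported by `summary()` for a model fitted with R's `lm()` function.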
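
For reference, the p-value of this overall F-test is not stored directly in the fitted object, but it can be computed from the `fstatistic` component of `summary()`. The sketch below uses a toy dataset and a hypothetical helper name, `overall_f_pvalue`. It also checks the result against an explicit comparison with the intercept-only model.

```r
# Hypothetical helper: p-value of the overall F-test from a fitted lm object.
overall_f_pvalue <- function(fit) {
  f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

# Toy data (illustrative only): five categories, one with a shifted mean.
set.seed(2)
toy <- data.frame(
  expotype = factor(rep(1:5, each = 40)),
  y = rnorm(200, mean = rep(c(0, 0, 0, 0, 1), each = 40), sd = 1)
)
fit <- lm(y ~ expotype, data = toy)
p_value <- overall_f_pvalue(fit)

# The same test, written as a comparison against the intercept-only model.
p_anova <- anova(lm(y ~ 1, data = toy), fit)[["Pr(>F)"]][2]
all.equal(p_value, p_anova)
```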
The proportion of rejected null hypotheses for each sample size and maximum true difference is reported for each of the three biomarkers.
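
Putting the pieces together, the proportion of rejections for a single scenario could be estimated along these lines. This is a sketch under stated assumptions rather than the code used for the analysis: the function name `estimate_power`, the significance level `alpha = 0.05`, the equal allocation to categories, and the example coefficient values are all illustrative.

```r
# Hypothetical helper: estimate power for one scenario by repeated simulation.
estimate_power <- function(n_obs, beta0, beta, sigma, n_sims = 1000, alpha = 0.05) {
  n_categories <- length(beta) + 1
  group_means <- c(beta0, beta0 + beta)  # reference category first
  p_values <- replicate(n_sims, {
    # Assumption: equal allocation probability across categories.
    category <- sample(seq_len(n_categories), n_obs, replace = TRUE)
    y <- rnorm(n_obs, mean = group_means[category], sd = sigma)
    f <- summary(lm(y ~ factor(category)))$fstatistic
    pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
  })
  mean(p_values < alpha)  # proportion of rejected null hypotheses
}

set.seed(3)
# Fewer replicates than the 1,000 used in the text, to keep the example quick.
power_alt <- estimate_power(n_obs = 800, beta0 = 14, beta = c(2, 1, 1, 1),
                            sigma = 5, n_sims = 200)
power_null <- estimate_power(n_obs = 800, beta0 = 14, beta = rep(0, 4),
                             sigma = 5, n_sims = 200)
c(alternative = power_alt, null = power_null)
```

Running the same function with all coefficients set to zero is a useful sanity check: the rejection rate should then be close to `alpha`.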