---
title: "Rqtl_SARS-CoV-2"
author: "Ellen Risemberg"
date: "9/14/2023"
output: 
  html_document:
    keep_md: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Environment setup

```{r message=FALSE, warning=FALSE}
library(knitr)
library(qtl)
library(MESS)
library(MASS)
library(plyr)
library(ggplot2)
library(extRemes)
source('code-dependencies/qtl_functions.R')
source('code-dependencies/plot_noX.R')
```

Load ggplot2 themes:

```{r}
load_themes()
```

## Load data 

```{r}
SARS2 <- read.cross(format="csv", file='derived_data/rqtl_files/SARS2_CC006xCC044_rqtl.csv', 
                    na.strings=c("-","NA","na","no record"), genotypes=c("AA","AB","BB"), 
                    alleles=c("A","B")) # A = CC006, B = CC044
SARS2 <- jittermap(SARS2)
summary(SARS2)
plotMap(SARS2) 

SARS2$pheno$batch <- as.factor(SARS2$pheno$batch)
```

## Calculate derived measures 

**Calculate percentage of starting weight**

```{r}
SARS2$pheno$pd0 <- 100
SARS2$pheno$pd1 <- (SARS2$pheno$d1/SARS2$pheno$d0)*100
SARS2$pheno$pd2 <- (SARS2$pheno$d2/SARS2$pheno$d0)*100
SARS2$pheno$pd3 <- (SARS2$pheno$d3/SARS2$pheno$d0)*100
SARS2$pheno$pd4 <- (SARS2$pheno$d4/SARS2$pheno$d0)*100
```

**Area under the curve**

Use the `MESS::auc` function to calculate the area under the curve (AUC) of each weight trajectory, then subtract it from 400 (the total area beneath the 100% line from day 0 to day 4) to get the area above the curve (AAC).

Any weight above 100% increases the AUC and decreases the AAC, so some AACs will be negative.
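
As a minimal sketch of the arithmetic (not the `calc_auc()` helper itself, whose internals are not shown here), the AAC can be reproduced in base R with the trapezoidal rule. The trajectories and the `trapezoid_auc()` function below are made up for illustration; `MESS::auc` with its default linear interpolation should give the same AUC.

```r
# Toy trajectories (hypothetical values): % of starting weight on days 0-4
days     <- 0:4
pct_loss <- c(100, 98, 95, 93, 96)      # mouse that loses weight
pct_gain <- c(100, 101, 102, 103, 104)  # mouse that gains weight

# Trapezoidal rule: interval width times the mean of the two adjacent heights
trapezoid_auc <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Area above the curve = area under the 100% line minus area under the curve
aac <- function(x, y) 100 * (max(x) - min(x)) - trapezoid_auc(x, y)

aac_loss <- aac(days, pct_loss)  # positive: trajectory sits below 100%
aac_gain <- aac(days, pct_gain)  # negative: trajectory sits above 100%
```

A larger positive AAC therefore means more cumulative weight loss over the four days.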

Example: 

```{r}
plot(c(0,1,2,3,4), SARS2$pheno[1,c('pd0', 'pd1', 'pd2', 'pd3', 'pd4')], 
     xlab = "Day post-infection", ylab = "% of starting weight")
lines(c(0,1,2,3,4), SARS2$pheno[1,c('pd0', 'pd1', 'pd2', 'pd3', 'pd4')])
abline(h=100)
```

Metric calculation:

```{r}
SARS2 <- calc_auc(cross = SARS2, steps = c(0,1,2,3,4), col.name = "weight_aac", 
                  phenos = c('pd0', 'pd1', 'pd2', 'pd3', 'pd4'))

plotPheno(SARS2, pheno.col = 'weight_aac')
```

## Exploratory data analysis 

How much weight do SARS-CoV-2-infected mice lose on average by day 4?

```{r}
pctloss <- (1-(SARS2$pheno$d4/SARS2$pheno$d0))*100
mean(pctloss)
range(pctloss)
```

Plot weight trajectories with the average CC044 and CC006 trajectories:

```{r}
wtloss <- cov_trajectory_plot(SARS2, phenos = c('pd0', 'pd1', 'pd2', 'pd3', 'pd4'), 
                              title = "", parent.lty = 2)
wtloss
```

### Remove outliers

The trajectory plot shows some questionable trajectories.

One mouse appears to lose a lot of weight from day 0 to day 1, so that its % of starting weight on day 1 is < 85%.

```{r}
SARS2$pheno[which(SARS2$pheno[,'pd1'] < 85),]
```

This is clearly a data-entry error, since the weight goes right back up on day 2. Remove the d1 measurement for this mouse (CR_RB05_F_1136):

```{r}
SARS2$pheno$d1[which(SARS2$pheno$mouse_ID=='CR_RB05_F_1136')] <- NA
SARS2$pheno$pd1[which(SARS2$pheno$mouse_ID=='CR_RB05_F_1136')] <- NA
SARS2$pheno$weight_aac[which(SARS2$pheno$mouse_ID=='CR_RB05_F_1136')] <- NA # unreliable AAC too 
```

Two more mice have a large dip on day 2, so that their % of starting weight on d2 is < 80%.

```{r}
SARS2$pheno[which(SARS2$pheno[,'pd2'] < 80),]
```

Remove d2 measurements for these mice (CR_RB05_M_1105 and CR_RB05_M_0889):

```{r}
SARS2$pheno$d2[which(SARS2$pheno$mouse_ID=='CR_RB05_M_1105')] <- NA
SARS2$pheno$pd2[which(SARS2$pheno$mouse_ID=='CR_RB05_M_1105')] <- NA
SARS2$pheno$weight_aac[which(SARS2$pheno$mouse_ID=='CR_RB05_M_1105')] <- NA # unreliable AAC too 

SARS2$pheno$d2[which(SARS2$pheno$mouse_ID=='CR_RB05_M_0889')] <- NA
SARS2$pheno$pd2[which(SARS2$pheno$mouse_ID=='CR_RB05_M_0889')] <- NA
SARS2$pheno$weight_aac[which(SARS2$pheno$mouse_ID=='CR_RB05_M_0889')] <- NA # unreliable AAC too 
```

Two mice have a very steep increase from d3 to d4: their % of starting weight is < 90% on d3 and > 100% on d4.

```{r}
SARS2$pheno[which((SARS2$pheno[,'pd4'] > 100) & SARS2$pheno[,'pd3'] < 90),]
```

Remove d4 measurements for these mice (CR_RB05_M_0936 and CR_RB05_M_0946):

```{r}
SARS2$pheno$d4[which(SARS2$pheno$mouse_ID=='CR_RB05_M_0936')] <- NA
SARS2$pheno$pd4[which(SARS2$pheno$mouse_ID=='CR_RB05_M_0936')] <- NA
SARS2$pheno$weight_aac[which(SARS2$pheno$mouse_ID=='CR_RB05_M_0936')] <- NA # unreliable AAC too 

SARS2$pheno$d4[which(SARS2$pheno$mouse_ID=='CR_RB05_M_0946')] <- NA
SARS2$pheno$pd4[which(SARS2$pheno$mouse_ID=='CR_RB05_M_0946')] <- NA
SARS2$pheno$weight_aac[which(SARS2$pheno$mouse_ID=='CR_RB05_M_0946')] <- NA # unreliable AAC too 
```

One mouse has a very large increase from d3-d4:

```{r}
SARS2$pheno[which(SARS2$pheno[,'pd4'] > 120),]
```

Remove the d3 and d4 measurements for that mouse:

```{r}
SARS2$pheno$d4[which(SARS2$pheno$mouse_ID=='CR_RB05_F_0977')] <- NA
SARS2$pheno$pd4[which(SARS2$pheno$mouse_ID=='CR_RB05_F_0977')] <- NA

SARS2$pheno$d3[which(SARS2$pheno$mouse_ID=='CR_RB05_F_0977')] <- NA
SARS2$pheno$pd3[which(SARS2$pheno$mouse_ID=='CR_RB05_F_0977')] <- NA
SARS2$pheno$weight_aac[which(SARS2$pheno$mouse_ID=='CR_RB05_F_0977')] <- NA # unreliable AAC too 
```
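
The removal steps above repeat the same pattern for each mouse. As a sketch only, that pattern could be wrapped in a small helper. `blank_measurements()` is a hypothetical function, not part of the sourced scripts, and the toy table below uses made-up IDs and values.

```r
# Hypothetical helper: set the given phenotype columns to NA for one mouse
blank_measurements <- function(pheno, id, cols, id_col = "mouse_ID") {
  rows <- which(pheno[[id_col]] == id)
  if (length(rows) == 0) stop("no mouse with ID ", id)
  pheno[rows, cols] <- NA
  pheno
}

# Toy phenotype table (made-up IDs and values)
toy <- data.frame(mouse_ID   = c("m1", "m2"),
                  d1         = c(20.1, 14.0),
                  pd1        = c(99, 70),
                  weight_aac = c(5, 40))

# Drop the suspicious day-1 values for mouse "m2", plus the AAC derived from them
toy <- blank_measurements(toy, "m2", c("d1", "pd1", "weight_aac"))
```

On the cross object this would be applied to `SARS2$pheno` and assigned back, so that each outlier needs one call instead of three assignments.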


Try trajectory plot again without outliers:

```{r}
wtloss <- cov_trajectory_plot(SARS2, phenos = c('pd0', 'pd1', 'pd2', 'pd3', 'pd4'), 
                              title = "SARS-CoV-2 MA10", ylab = "% of starting weight", parent.lty = 2, ylim = c(72,114))
wtloss
```

Save plot:

```{r}
ensure_directory("figures/SARS2")
png(filename = "figures/SARS2/wt_loss.png", width = 700)
wtloss + bw_big_theme
dev.off()
```

Save for combining with other figures:

```{r}
wtloss <- wtloss + bw_big_theme
saveRDS(wtloss, file = "derived_data/otherRobjects/sars2wtloss.Rdata")
```


How much weight do SARS-CoV-2-infected mice lose on average once the outliers are removed?

```{r}
pctloss <- (1-(SARS2$pheno$d4/SARS2$pheno$d0))*100
mean(pctloss, na.rm = TRUE)
range(pctloss, na.rm = TRUE)
```

### Hemorrhage score

Average HS:

```{r}
mean(SARS2$pheno$HS, na.rm = TRUE)
range(SARS2$pheno$HS, na.rm = TRUE)
median(SARS2$pheno$HS, na.rm = TRUE)
```

Plot HS data:

```{r}
plotPheno(SARS2, pheno.col = 'HS')
```

## Load models metadata 

```{r}
models <- read.csv(file = "source_data/SARS2-models.csv", na.strings = "")
```

## Transform data 

Before transforming data, save raw phenotypes for plotting later:

```{r}
raw_phenos <- SARS2$pheno
```

Log-transform weight on d1-d4. Don't transform d0 (covariate only), AAC (already normal), or HS (categorical).

```{r}
SARS2$pheno$d1 <- log(SARS2$pheno$d1)
SARS2$pheno$d2 <- log(SARS2$pheno$d2)
SARS2$pheno$d3 <- log(SARS2$pheno$d3)
SARS2$pheno$d4 <- log(SARS2$pheno$d4)
```

## Covariate analysis 

```{r}
anova(aov(d1 ~ d0 + batch + sex, data = SARS2$pheno))
anova(aov(d2 ~ d0 + batch + sex, data = SARS2$pheno))
anova(aov(d3 ~ d0 + batch + sex, data = SARS2$pheno))
anova(aov(d4 ~ d0 + batch + sex, data = SARS2$pheno))
anova(aov(weight_aac ~ batch + sex, data = SARS2$pheno))
anova(aov(HS ~ batch + sex, data = SARS2$pheno))
```

Batch, sex and baseline weight should all be covariates. 

## Define covariates 

```{r}
covar <- cbind(as.numeric(pull.pheno(SARS2, "sex") == "M"),
               as.numeric(pull.pheno(SARS2, "batch") == 15), # batch 14 is baseline
               as.numeric(pull.pheno(SARS2, "batch") == 16),
               as.numeric(pull.pheno(SARS2, "batch") == 17),
               as.numeric(pull.pheno(SARS2, "batch") == 18),
               pull.pheno(SARS2, 'd0'))
colnames(covar) <- c('sex', 'batch15', 'batch16', 'batch17', 'batch18', 'd0')
```
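
The covariate matrix above is a hand-built set of 0/1 indicator columns with females and batch 14 as the baseline. As a sketch of an equivalent construction, `model.matrix()` produces the same kind of coding from a formula. The toy data frame is made up, and the resulting column is named `sexM` rather than `sex`.

```r
# Toy phenotype data (made-up values), one mouse per batch
toy <- data.frame(
  sex   = c("F", "M", "M", "F", "M"),
  batch = factor(c(14, 15, 16, 17, 18)),
  d0    = c(18.2, 22.5, 21.0, 17.9, 23.1)
)

# model.matrix() expands factors into indicator columns, using the first
# level (F, batch 14) as baseline; [, -1] drops the intercept column
covar_toy <- model.matrix(~ sex + batch + d0, data = toy)[, -1]
colnames(covar_toy)
```

Note that `model.matrix()` drops rows with missing values by default, so on real data the row count should be checked against the number of individuals in the cross.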

## Single QTL Analysis 

First, run `calc.genoprob()`:

```{r}
SARS2 <- calc.genoprob(SARS2)
```

The `models` data frame has the following info about each phenotype:

* `obj`: name of `scanone` object 
* `perm.obj`: name of `scanoneperm` object 
* `type`: model type (normal, np, 2-part)
* `colname`: column name in input spreadsheet 
* `name`: name for genome scan title
* `abbr`: name for axis labels 

Model types:

* Lung HS: nonparametric model (categorical variable)
* Weight loss (% weight lost): normal model 
* Weight loss (derived measures): normal model 
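
R/qtl's documentation describes the nonparametric (`np`) model as an extension of the Kruskal-Wallis test. As a rough single-marker illustration (not what `create_models()` does internally, which is not shown here), the Kruskal-Wallis statistic at a fully genotyped marker can be put on the LOD scale by dividing by 2 ln 10. The genotypes and scores below are simulated.

```r
set.seed(1)

# Simulated marker genotypes and an ordinal score (made-up range 0-4)
geno  <- factor(sample(c("AA", "AB", "BB"), 60, replace = TRUE))
score <- sample(0:4, 60, replace = TRUE)

# Rank-based test of whether the score distribution differs among genotypes
kw <- kruskal.test(score ~ geno)

# Convert the chi-square-scale statistic to the LOD scale
lod_np <- unname(kw$statistic) / (2 * log(10))
```

Because the test works on ranks, it makes no normality assumption, which is why it suits a categorical score such as HS. It also has no place for covariates.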

```{r}
kable(models)
```

Create models (covariates are ignored for the non-parametric and two-part models):

```{r}
ensure_directory("derived_data/SARS2")
ensure_directory("derived_data/SARS2/mods")
create_models(cross.obj = SARS2, models = models, covar = covar, mod.dir = "derived_data/SARS2/mods/")
```

Load/create permutations:

```{r}
ensure_directory("derived_data/SARS2/perms")
create_perms(cross.obj = SARS2, models = models, perm.dir = "derived_data/SARS2/perms/", perm.Xsp = TRUE)
```
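
The idea behind the permutation step can be sketched in base R: shuffle the phenotypes relative to the genotypes, record the largest LOD across all markers, repeat, and take upper quantiles of those maxima as significance thresholds. This is a toy illustration under simplifying assumptions (independent, fully genotyped markers, no covariates, simulated data), not the `create_perms()` implementation.

```r
set.seed(42)
n     <- 100   # individuals
n_mar <- 10    # markers

# Simulated genotypes coded 1/2/3 (AA/AB/BB) and a phenotype with no QTL
geno <- matrix(sample(1:3, n * n_mar, replace = TRUE), nrow = n)
y    <- rnorm(n)

# LOD at one marker under a normal model: (n/2) * log10(RSS0 / RSS1)
marker_lod <- function(y, g) {
  rss0 <- sum((y - mean(y))^2)             # null model: a single mean
  rss1 <- sum(resid(lm(y ~ factor(g)))^2)  # one mean per genotype
  (length(y) / 2) * log10(rss0 / rss1)
}

# Largest LOD across all markers
max_lod <- function(y, geno) max(apply(geno, 2, function(g) marker_lod(y, g)))

# Null distribution of the maximum LOD from 100 phenotype permutations
perm_max   <- replicate(100, max_lod(sample(y), geno))
thresholds <- quantile(perm_max, c(0.95, 0.90))  # 5% and 10% thresholds
```

An observed peak is then called significant at a given level if its LOD exceeds the corresponding threshold.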


Plot genome scans: 

```{r}
plot_scans(models = models)
```

Save genome scans:

```{r}
ensure_directory("figures")
ensure_directory("figures/SARS2")
ensure_directory("figures/SARS2/scans")
plot_scans(models = models, save = TRUE, save.dir = 'figures/SARS2/scans/')

ensure_directory('figures/SARS2/scans/same_ylim/')
plot_scans(models = models, save = TRUE, ylim = c(0,8.3), save.dir = 'figures/SARS2/scans/same_ylim/')
```


HS scan with modified x-axis for **Fig. 3**:

```{r}
png('figures/SARS2/scans/hs_modX.png', width = 750)
plot_modX(hs, alternate.chrid = TRUE, xlab = "", ylab = "", main = "Congestion score", 
          cex.main = 2, cex.axis = 2, bandcol = "gray90", ylim = c(0, 8.3))
title(ylab = "LOD", line = 2.5, cex.lab = 2)
title(xlab = "Chromosome", cex.lab = 2.3, line = 3.7) # cex.lab was 2
abline(h = summary(hsp)$A, lty=1:2)
dev.off()
```

d3 scan with modified x-axis for **Fig. 3**:

```{r}
png('figures/SARS2/scans/d3_modX.png', width = 750)
plot_modX(d3, alternate.chrid = TRUE, xlab = "", ylab = "", main = "Weight - day 3", 
          cex.main = 2, cex.axis = 2, bandcol = "gray90", ylim = c(0,8.3))
title(ylab = "LOD", line = 2.5, cex.lab = 2)
title(xlab = "Chromosome", cex.lab = 2.3, line = 3.7) # cex.lab was 2
abline(h = summary(d3p)$A, lty=1:2)
dev.off()
```


## Significant QTLs 

Calculate 95% Bayes credible intervals for LOD peaks significant at the 0.10 level (`sig.level = 0.10`), and create a summary table with one row per LOD peak: marker, chromosome, position, LOD, and the bounds of the Bayes credible interval.

```{r}
peaks <- doc_peaks(models, sig.level = 0.10)
kable(peaks)
```
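
For intuition, a Bayes credible interval treats 10^LOD along a chromosome as an unnormalized posterior for the QTL position and keeps the highest-density positions until they cover the requested probability. The sketch below is a simplified version on an evenly spaced grid with a made-up LOD profile. `bayes_interval()` is a hypothetical name, and `doc_peaks()` may compute its intervals differently (for example via `qtl::bayesint`).

```r
# Simplified Bayes credible interval on an evenly spaced grid
bayes_interval <- function(pos, lod, prob = 0.95) {
  post <- 10^lod / sum(10^lod)           # normalized posterior over positions
  ord  <- order(lod, decreasing = TRUE)  # visit positions from highest LOD down
  n_keep <- which(cumsum(post[ord]) >= prob)[1]
  keep <- ord[seq_len(n_keep)]
  c(lower = min(pos[keep]), upper = max(pos[keep]))
}

# Made-up LOD profile along one chromosome (positions in arbitrary units)
pos <- 0:10
lod <- c(0.2, 0.5, 1.0, 1.8, 2.5, 3.0, 2.8, 2.0, 1.0, 0.4, 0.1)

ci <- bayes_interval(pos, lod)
```

A sharper LOD peak concentrates the posterior and gives a narrower interval.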


### Table for manuscript 

Turn into tibble for easier processing:

```{r}
peakstbl <- as_tibble(peaks) %>% arrange(factor(chr, levels = c('9', '7', '12', '15')))
```


### Adjusted P-values

```{r message=FALSE}
peakstbl$adj_pval <- rep(NA, nrow(peakstbl))

for (i in 1:nrow(peakstbl)){
  modname <- peakstbl$model[i]
  permname <- models$perm.obj[which(models$obj == modname)]
  chr <- peakstbl$chr[i]

  # Don't use this method as it can result in p = 0 if no permuted LODs 
  # are greater than the observed LOD 
  # p <- mean(get(permname)$A > summary(get(modname))[as.integer(chr),'lod'])
  
  fitgev <- fevd(as.numeric(get(permname)$A), type = "GEV")
  fitgevsum <- summary.fevd(fitgev)
  
  p <- pevd(q = as.numeric(peakstbl$lod[i]), 
            loc = fitgevsum$par[1], 
            scale = fitgevsum$par[2], 
            shape = fitgevsum$par[3], 
            lower.tail = FALSE)
  
  peakstbl$adj_pval[i] <- p
}

peakstbl$adj_pval <- format(peakstbl$adj_pval, digits = 2, scientific = TRUE)
```
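
To see why the fitted-GEV tail is preferred over the empirical proportion, the sketch below writes the GEV upper-tail probability in closed form and compares it with the empirical estimate. The parameter values and permutation LODs are made up, not fitted to any data, and `gev_upper_tail()` is a hypothetical stand-in for the `pevd(..., lower.tail = FALSE)` call above.

```r
# Upper-tail probability P(X > x) of a GEV(loc, scale, shape) distribution
gev_upper_tail <- function(x, loc, scale, shape) {
  z <- (x - loc) / scale
  if (abs(shape) < 1e-12) {
    t <- exp(-z)                              # Gumbel limit (shape = 0)
  } else {
    t <- pmax(1 + shape * z, 0)^(-1 / shape)  # clamp outside the support
  }
  -expm1(-t)                                  # 1 - exp(-t), stable for tiny t
}

# Made-up permutation maxima and an observed LOD larger than all of them
perm_lods <- c(1.8, 2.1, 2.4, 2.9, 3.6)
obs_lod   <- 7

p_emp <- mean(perm_lods > obs_lod)  # empirical p-value collapses to zero
p_gev <- gev_upper_tail(obs_lod, loc = 2, scale = 0.5, shape = 0.1)  # small but nonzero
```

The empirical estimate cannot go below 1 / (number of permutations) without hitting zero, while the parametric tail extrapolates beyond the largest permuted LOD.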


### Genomic location 

Create a chr:position (interval) column. First, split the Bayes CI column into lower and upper limits, then concatenate the pieces to produce the column:

```{r}
peakstbl$BayesCIlower <- rep(NA, nrow(peakstbl))
peakstbl$BayesCIupper <- rep(NA, nrow(peakstbl))

for (i in 1:nrow(peakstbl)){
  peakstbl$BayesCIlower[i] <- strsplit(peakstbl$`Bayes CI`[i], split = ' - ')[[1]][1]
  peakstbl$BayesCIupper[i] <- strsplit(peakstbl$`Bayes CI`[i], split = ' - ')[[1]][2]
}

# Then concatenate  
peakstbl$chrint <- paste0(rep('Chr ',nrow(peakstbl)), 
                 peakstbl$chr, 
                 rep(': ',nrow(peakstbl)),
                 trimws(format(round(as.double(peakstbl$pos), 2), nsmall=2)),
                 rep(' (',nrow(peakstbl)),
                 trimws(format(round(as.double(peakstbl$BayesCIlower), 2), nsmall=2)),
                 rep('-',nrow(peakstbl)),
                 trimws(format(round(as.double(peakstbl$BayesCIupper), 2), nsmall=2)),
                 rep(')',nrow(peakstbl)))
```
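
The same location string can be built more compactly with `sprintf()`, which formats each value to two decimals without the width padding that makes `trimws()` necessary above. This is an alternative sketch on a made-up table, assuming the interval is stored as text in the form `"lower - upper"`.

```r
# Toy peak table (made-up values); ci mimics a "lower - upper" text column
toy <- data.frame(chr = c("9", "7"),
                  pos = c(45.3, 12.75),
                  ci  = c("40.1 - 50", "10.25 - 15.5"),
                  stringsAsFactors = FALSE)

# Split each interval string into numeric lower and upper limits
limits <- do.call(rbind, lapply(strsplit(toy$ci, " - ", fixed = TRUE), as.numeric))

# sprintf() is vectorized, so one call formats every row
toy$chrint <- sprintf("Chr %s: %.2f (%.2f-%.2f)",
                      toy$chr, toy$pos, limits[, 1], limits[, 2])
```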

Use phenotype name instead of model name: 

```{r}
peakstbl$Phenotype <- models$name[match(peakstbl$model, models$obj)]
```


### Unadjusted P-values 

```{r}
peakstbl$unadj_pval <- rep(NA, nrow(peakstbl))

for (i in 1:nrow(peakstbl)){
  marker = peakstbl$marker[i]
  mod = peakstbl$model[i]
  pheno.col = models$colname[models$obj == mod]
  covar_names <- models$cov[models$obj == mod]
  # `models` is read with na.strings = "", so an empty cell arrives as NA
  if (is.na(covar_names) || covar_names == "") {covar_names <- NULL}
  
  unadj.pval <- get_unadj_pval(SARS2, pheno.col, marker, covar_df = covar, covar_names = covar_names)
  peakstbl$unadj_pval[i] <- unadj.pval
}

peakstbl$unadj_pval <- format(peakstbl$unadj_pval, digits = 3)
```


### QTL names 

```{r}
peakstbl$QTL <- rep(NA, nrow(peakstbl))

qtl_pheno_counts <- peakstbl %>% group_by(chr) %>% dplyr::summarize(n = n())

peakstbl$QTL <- c(rep('HrS43', qtl_pheno_counts$n[qtl_pheno_counts$chr==9]),
                  rep('HrS44', qtl_pheno_counts$n[qtl_pheno_counts$chr==7]),
                  rep('HrS45', qtl_pheno_counts$n[qtl_pheno_counts$chr==12]),
                  rep('HrS47', qtl_pheno_counts$n[qtl_pheno_counts$chr==15]))
```


### Phenotypic variance explained

Also known as the heritability due to a QTL, calculated using the formula described in the R/qtl guide (p. 122):

$$h^2 = \frac{\text{var}\{E(y|g)\}}{\text{var}\{y\}}$$

For F2 cross: 

$$h^2 = \frac{2a^2 + d^2}{2a^2 + d^2+4\sigma^2}$$

where $a = (\mu_{BB} - \mu_{AA})/2$ is the additive effect, $d=\mu_{AB} - (\mu_{AA}+\mu_{BB})/2$ is the dominance deviation, and $\sigma^2$ is the residual variance.

```{r}
peakstbl$phenotypic.var.expl <- rep(NA, nrow(peakstbl))

for (i in 1:nrow(peakstbl)){
  marker = peakstbl$marker[i]
  mod = peakstbl$model[i]
  pheno.col = models$colname[models$obj == mod]
  covar_names <- models$cov[models$obj == mod]
  # `models` is read with na.strings = "", so an empty cell arrives as NA
  if (is.na(covar_names) || covar_names == "") {covar_names <- NULL}
  
  h2 <- get_var_expl(SARS2, pheno.col, marker, covar_df = covar, covar_names = covar_names)
  
  peakstbl$phenotypic.var.expl[i] <- h2
}

peakstbl$phenotypic.var.expl <- paste0(as.numeric(format(peakstbl$phenotypic.var.expl, digits=3))*100, '%')
```
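
As a worked example of the F2 formula, the sketch below computes the variance explained from three genotype means and a residual variance. The numbers are made up, and `f2_var_explained()` is a hypothetical function written for this illustration. It does not reproduce `get_var_expl()`, which also handles covariates.

```r
# Variance explained by a QTL in an F2, from genotype means and residual variance
f2_var_explained <- function(mu_AA, mu_AB, mu_BB, sigma2) {
  a <- (mu_BB - mu_AA) / 2          # additive effect
  d <- mu_AB - (mu_AA + mu_BB) / 2  # dominance deviation
  (2 * a^2 + d^2) / (2 * a^2 + d^2 + 4 * sigma2)
}

# Made-up means: a = 3, d = -1, so h2 = (18 + 1) / (18 + 1 + 36) = 19 / 55
h2 <- f2_var_explained(mu_AA = 10, mu_AB = 12, mu_BB = 16, sigma2 = 9)
```

With no genotype effect (all three means equal) the numerator is zero, and as the residual variance shrinks the ratio approaches 1.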


### Analysis done, format and print

Remove redundant data, re-order: 

```{r}
table1 <- peakstbl %>% 
  arrange(match(chr, c('9', '7', '12', '15')), as.numeric(adj_pval))

# now that it's sorted by adjusted p-value, add */**/*** (makes it a string so can't sort);
# p-values >= 0.10 get no marker rather than a literal "NA"
table1$adj_pval <- trimws(paste(table1$adj_pval, ifelse(as.numeric(table1$adj_pval) < 0.005, '(***)', 
                                                 ifelse(as.numeric(table1$adj_pval) < 0.05, '(**)', 
                                                        ifelse(as.numeric(table1$adj_pval) < 0.10, '(*)', '')))))

col_order <- c('QTL', 'Phenotype', 'chrint', 'adj_pval', 'phenotypic.var.expl')
table1 <- table1[,col_order]
```
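
The significance markers can also be assigned with `cut()`, which avoids nesting `ifelse()` calls. This is a sketch on made-up p-values using the same cut-offs (0.005, 0.05, 0.10), with an empty label for anything at or above 0.10.

```r
p <- c(0.001, 0.005, 0.03, 0.07, 0.2)  # made-up adjusted p-values

# right = FALSE makes each bin [lower, upper), matching the strict "<" tests
stars <- as.character(cut(p,
                          breaks = c(-Inf, 0.005, 0.05, 0.10, Inf),
                          labels = c("(***)", "(**)", "(*)", ""),
                          right  = FALSE))

# Attach the marker to the formatted p-value, dropping any trailing space
labelled <- trimws(paste(format(p, digits = 2, scientific = TRUE), stars))
```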

Save to csv: 

```{r}
ensure_directory("results")
write.csv(table1, file = "results/SARS2-QTLsummary.csv", row.names = FALSE)
```


## Multiple QTL analysis 

Are the chr7, chr12 and chr15 QTL significant after controlling for chr9?

```{r}
geno7 <- pull.geno(SARS2, 7)[,'SBT072729469']
geno9 <- pull.geno(SARS2, 9)[,'gUNC17242574']
geno12 <- pull.geno(SARS2, 12)[,'S6R121129283']
geno15 <- pull.geno(SARS2, 15)[,'gUNC26048180']

# significant together: 
lmodfull <- lm(d3 ~ d0 + geno9 + geno7 + geno12 + geno15, data = SARS2$pheno)
summary(lmodfull)
```
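
One way to test a single locus after controlling for another is to compare nested linear models with `anova()`. The sketch below uses simulated data with genotypes coded 1/2/3, so each marker enters as an additive term, as in the model above. The variable names and effect sizes are made up.

```r
set.seed(7)
n <- 200

# Simulated genotypes at two unlinked markers and a phenotype affected by both
toy <- data.frame(g9 = sample(1:3, n, replace = TRUE),
                  g7 = sample(1:3, n, replace = TRUE))
toy$y <- 1.0 * toy$g9 + 1.0 * toy$g7 + rnorm(n)

reduced <- lm(y ~ g9, data = toy)       # chr9 marker only
full    <- lm(y ~ g9 + g7, data = toy)  # chr9 marker plus chr7 marker

# F test for the improvement from adding g7 to a model that already has g9
cmp  <- anova(reduced, full)
p_g7 <- cmp[["Pr(>F)"]][2]
```

For a single added term this F test agrees with the t test for that coefficient in `summary(full)`, so the per-marker p-values in the full model already answer the conditional question.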

## Phenotype x genotype plots for markers at significant LOD peaks 

Define QTL names:

```{r}
qtl.map <- data.frame(chr = c(9, 7, 12, 15), 
                      qtl_name = c('HrS43', 'HrS44', 'HrS45', 'HrS47'))
```

Plot PxG:

```{r}
plot_pxg(cross.obj = SARS2, cross.type = 'f2', raw.data = raw_phenos, 
         geno.map = list(A = "CC006", B = "CC044"), qtl.map = qtl.map, peaks = peaks, 
         plot.type = 'pxg', theme = rmd_theme)
```


Save PxG plots:

```{r}
ensure_directory("figures/SARS2/pxg")
plot_pxg(SARS2, cross.type = 'f2', raw.data = raw_phenos, 
         geno.map = list(A = "CC006", B = "CC044"), qtl.map = qtl.map, 
         peaks = peaks, plot.type = 'pxg', theme = big_theme, save = TRUE, 
         save.dir = 'figures/SARS2/pxg/')
```

PxG plot for Figure 4:

```{r}
HrS43m <- 'S3N094839317'
p <- pxg(cross = SARS2, pheno = SARS2$pheno$weight_aac, marker = HrS43m,
     geno.map = list(A = "CC006", B = "CC044"), qtl.map = qtl.map,
     title = "SARS-CoV-2 MA10",
     theme = sbs_pxg_theme, ylim = c(-20,60))
```

Save plot:

```{r}
ensure_directory("figures/Fig4")
png("figures/Fig4/sars2-HrS43-aac.png", width = 600)
p
dev.off()
```

Save R object for combining with other plots:

```{r}
saveRDS(p, file = "derived_data/otherRobjects/sars2-pxg-HrS43-aac.Rdata")
```