FounderDietStudy_Data.Rmd

---
title: "Founder Diet Study Data"
author: "Brian Yandell"
date: "`r format(Sys.time(), '%d %B %Y')`"
output: html_document
---

```{r setup, include=FALSE, echo = FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE,
                      comment = "", fig.width = 7, fig.height = 7)
```
 
```{r}
library(tidyverse)
library(readxl)
library(readr)
library(foundr) # github/byandell/foundr
```

Data entry and harmonizing is now done through [DataHarmony.Rmd](https://github.com/byandell/FounderDietStudy/blob/main/DataHarmony.Rmd).

# Physiological trait counts by strain, sex, diet

```{r}
PhysioData <- readRDS("PhysioData.rds")
```

There are `r nrow(distinct(PhysioData, trait))` distinct traits.

Count of mice per strain, sex and condition. All combinations have at least some measurements.

```{r}
PhysioData %>%
  filter(!is.na(value)) %>%
  distinct(strain, animal, sex, condition) %>%
  count(strain, sex, condition) %>%
  unite("sex_condition", sex, condition) %>%
  pivot_wider(names_from = "sex_condition", values_from = "n")
```

Percent missing values across traits

```{r}
PhysioData %>%
  filter(!is.na(value)) %>%
  count(strain, sex, condition) %>%
  unite("sex_condition", sex, condition) %>%
  mutate(n = round(100 * (1 - n / max(n)), 2)) %>%
  pivot_wider(names_from = "sex_condition", values_from = "n")
```

# Plasma metabolite abundance data

```{r}
PlaMet0Data <- readRDS("PlaMet0Data.rds")
```

There are missing data for B6 mice across 3 of the 4 `sex:condition` combinations.

```{r}
PlaMet0Data %>%
  filter(!is.na(value)) %>%
  distinct(strain, animal, sex, condition) %>%
  count(strain, sex, condition) %>%
  unite("sex_condition", sex, condition) %>%
  pivot_wider(names_from = "sex_condition", values_from = "n")
```

# Mouse annotations

The experiment annotation file relates the `number` (identifier for `animal`) to the `sex` and `diet`.

```{r}
links <- read.csv(file.path("data", "RawData", "source.csv"), fill = TRUE)
```

```{r}
annotfile <- linkpath("annot", links)
excel_sheets(annotfile)
annot <- read_excel(annotfile) %>%
  mutate(diet = ifelse(as.character(diet_no) == "200339", "HC_LF", "HF_LC"))
```

```{r}
annot %>%
  distinct(diet, diet_no, diet_description)
```

```{r}
annot %>%
  count(strain, sex, diet) %>%
  unite("sex_diet", sex, diet) %>%
  pivot_wider(names_from = "sex_diet", values_from = "n")
```

There is a separate liver annotation file, which relates `ENTREZID` to `SYMBOL`.

```{r}
liverAnnot <- linkpath("liver_annot", links)
liverAnnot <- read_csv(liverAnnot)
```

# Liver RNAseq

There are two liver-specific files, for the data and the annotation. 
The liver data file contains the the liver mRNA files tied to the `ENTREZID` (which is in the first column but does not have a column name).
The `liverAnnot` table relates `ENTREZID` to `SYMBOL`; these are combined to create `trait`.
In addition, the experiment annotation file (`annot`) relates the `number` (identifier for `animal`) to the `sex` and `diet` (which is changed to `condition` for harmonization).
Note also that the liver data refers to strain `129` as `A129` to not begin with a number;
this has to be corrected.

```{r}
LivRnaData <- readRDS("LivRnaData.rds")
```

Percent missing values

```{r}
LivRnaData %>%
  filter(!is.na(value)) %>%
  count(strain, sex, condition) %>%
  unite("sex_condition", sex, condition) %>%
  mutate(n = round(100 * (1 - n / max(n)), 2)) %>%
  pivot_wider(names_from = "sex_condition", values_from = "n")
```