-
Notifications
You must be signed in to change notification settings - Fork 2
/
FounderDietStudy_Data.Rmd
125 lines (98 loc) · 3.42 KB
/
FounderDietStudy_Data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
---
title: "Founder Diet Study Data"
author: "Brian Yandell"
date: "`r format(Sys.time(), '%d %B %Y')`"
output: html_document
---
```{r setup, include=FALSE, echo = FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE,
comment = "", fig.width = 7, fig.height = 7)
```
```{r}
library(tidyverse)
library(readxl)
library(readr)
library(foundr) # github/byandell/foundr
```
Data entry and harmonizing is now done through [DataHarmony.Rmd](https://github.com/byandell/FounderDietStudy/blob/main/DataHarmony.Rmd).
# Physiological trait counts by strain, sex, diet
```{r}
PhysioData <- readRDS("PhysioData.rds")
```
There are `r nrow(distinct(PhysioData, trait))` distinct traits.
Count of mice per strain, sex and condition. All combinations have at least some measurements.
```{r}
PhysioData %>%
filter(!is.na(value)) %>%
distinct(strain, animal, sex, condition) %>%
count(strain, sex, condition) %>%
unite("sex_condition", sex, condition) %>%
pivot_wider(names_from = "sex_condition", values_from = "n")
```
Percent missing values across traits
```{r}
PhysioData %>%
filter(!is.na(value)) %>%
count(strain, sex, condition) %>%
unite("sex_condition", sex, condition) %>%
mutate(n = round(100 * (1 - n / max(n)), 2)) %>%
pivot_wider(names_from = "sex_condition", values_from = "n")
```
# Plasma metabolite abundance data
```{r}
PlaMet0Data <- readRDS("PlaMet0Data.rds")
```
There are missing data for B6 mice across 3 of the 4 `sex:condition` combinations.
```{r}
PlaMet0Data %>%
filter(!is.na(value)) %>%
distinct(strain, animal, sex, condition) %>%
count(strain, sex, condition) %>%
unite("sex_condition", sex, condition) %>%
pivot_wider(names_from = "sex_condition", values_from = "n")
```
# Mouse annotations
The experiment annotation file relates the `number` (identifier for `animal`) to the `sex` and `diet`.
```{r}
links <- read.csv(file.path("data", "RawData", "source.csv"), fill = TRUE)
```
```{r}
annotfile <- linkpath("annot", links)
excel_sheets(annotfile)
annot <- read_excel(annotfile) %>%
mutate(diet = ifelse(as.character(diet_no) == "200339", "HC_LF", "HF_LC"))
```
```{r}
annot %>%
distinct(diet, diet_no, diet_description)
```
```{r}
annot %>%
count(strain, sex, diet) %>%
unite("sex_diet", sex, diet) %>%
pivot_wider(names_from = "sex_diet", values_from = "n")
```
There is a separate liver annotation file, which relates `ENTREZID` to `SYMBOL`.
```{r}
liverAnnot <- linkpath("liver_annot", links)
liverAnnot <- read_csv(liverAnnot)
```
# Liver RNAseq
There are two liver-specific files, for the data and the annotation.
The liver data file contains the the liver mRNA files tied to the `ENTREZID` (which is in the first column but does not have a column name).
The `liverAnnot` table relates `ENTREZID` to `SYMBOL`; these are combined to create `trait`.
In addition, the experiment annotation file (`annot`) relates the `number` (identifier for `animal`) to the `sex` and `diet` (which is changed to `condition` for harmonization).
Note also that the liver data refers to strain `129` as `A129` to not begin with a number;
this has to be corrected.
```{r}
LivRnaData <- readRDS("LivRnaData.rds")
```
Percent missing values
```{r}
LivRnaData %>%
filter(!is.na(value)) %>%
count(strain, sex, condition) %>%
unite("sex_condition", sex, condition) %>%
mutate(n = round(100 * (1 - n / max(n)), 2)) %>%
pivot_wider(names_from = "sex_condition", values_from = "n")
```