-
Notifications
You must be signed in to change notification settings - Fork 20
/
Copy path05-MAW-PIV.Rmd
183 lines (107 loc) · 6.62 KB
/
05-MAW-PIV.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
---
title: "OPEN & REPRODUCIBLE MICROBIOME DATA ANALYSIS SPRING SCHOOL 2018"
author: "Sudarshan"
date: "`r Sys.Date()`"
output: bookdown::gitbook
site: bookdown::bookdown_site
---
# Beta diversity metrics
Beta-diversity: Measures for differences between samples from different groups to identify if there are differences in the overall community composition and structure.
**Load packages and data**
```{r, warning=FALSE, message=FALSE}
library(microbiome) # data analysis and visualisation
library(phyloseq) # also the basis of data object. Data analysis and visualisation
library(RColorBrewer) # nice color options
library(ggpubr) # publication quality figures, based on ggplot2
library(dplyr) # data handling
```
For more information:
[Ordination](http://ordination.okstate.edu/overview.htm).
[Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531).
[Normalisation and data transformation](https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-017-0237-y).
[What is Constrained and Unconstrained Ordination](http://www.davidzeleny.net/anadat-r/doku.php/en:ordination).
[Microbiome Datasets Are Compositional: And This Is Not Optional](https://www.frontiersin.org/articles/10.3389/fmicb.2017.02224/full)
[Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data](http://www.nrcresearchpress.com/doi/full/10.1139/cjm-2015-0821#.WvIJZIiFM2w)
```{r}
# read non rarefied data
ps1 <- readRDS("./phyobjects/ps.ng.tax.rds")
# read rarefied data
ps0.rar.rds <- readRDS("./phyobjects/ps0.rar.rds")
# use print option to see the data saved as phyloseq object.
```
## Phylogenetic beta-diversity metrics
### Unweighted Unifrac
Unweighted Unifrac is based on presence/absence of different taxa and abundance is not important. However, it is sensitive to the sequencing depth. If a sample is sequenced more than the others then it may have many OTUs (most of them unique) consequently affecting the unifrac dissimilarity estimation.
Usually, using subOTU/ASV approaches many singletons/OTUs with very low reads are discarded. If you have you won data you try the following code. For data from NG-tax we will skip this step as we have no singletons.
```{r, eval=FALSE}
# if we remove OTUs that are detected atleast 10 times in 5% of the samples
ps0.rar.filtered <- core(ps0.rar.rds, detection = 10, prevalence = 0.05)
summarize_phyloseq(ps0.rar.filtered)
# we reduce the sparsity considerably.
```
*For data from OTU picking*
Since the data used here consists of different body sites with distinct biological properties, the results of ordination do not change a lot by filtering "rare" OTUs. Once again, knowing the biology of your samples and making choices rationally and documenting them is crucial.
Feel free to use the OTU-picking strategy phyloseq object to investigate yourself.
```{r}
ordu.unwt.uni <- ordinate(ps0.rar.rds, "PCoA", "unifrac", weighted=F)
# check for Eigen values
# barplot(ordu.unwt.uni$values$Eigenvalues[1:10])
unwt.unifrac <- plot_ordination(ps0.rar.rds,
ordu.unwt.uni, color="scientific_name")
unwt.unifrac <- unwt.unifrac + ggtitle("Unweighted UniFrac") + geom_point(size = 2)
unwt.unifrac <- unwt.unifrac + theme_classic() + scale_color_brewer("Location", palette = "Set2")
print(unwt.unifrac)
```
Try repeating the above ordination using non-filtered phyloseq object.
### Weighted Unifrac
Weighted Unifrac will consider the abundances of different taxa.
```{r}
ps1.rel <- microbiome::transform(ps1, "compositional")
ordu.wt.uni <- ordinate(ps1.rel , "PCoA", "unifrac", weighted=T)
# check for Eigen values
# barplot(ordu.unwt.uni$values$Eigenvalues[1:10])
wt.unifrac <- plot_ordination(ps1.rel,
ordu.wt.uni, color="scientific_name")
wt.unifrac <- wt.unifrac + ggtitle("Weighted UniFrac") + geom_point(size = 2)
wt.unifrac <- wt.unifrac + theme_classic() + scale_color_brewer("Location", palette = "Set2")
print(wt.unifrac)
print(wt.unifrac + stat_ellipse())
```
The figure brings forward an important characteristics of microbiome data called the 'Horse-shoe effect'. An investigation and explaination for this can be found in the article by Morton JT., et al. 2017 [Uncovering the Horseshoe Effect in Microbial Analyses](http://msystems.asm.org/content/2/1/e00166-16).
You can repeat this analysis with phyloseq object from OTU-picking approach.
Another important aspect regarding weighted unifrac is its property of having heavier weights for abundant taxa. To detect changes in moderately abundant lineages, an extenstion called generalized (UniFrac distance)(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3413390/) has been developed. In this test data, we expect sufficient biological variation in composition between sites and hence, we do not apply GUniFrac.
**To reiterate:** It is crucial to understand the biological features of the samples. Although these are exploratory approaches, it is important to differentiate between biological signal and technical artifacts.
## Population-level Density landscapes
```{r}
p <- plot_landscape(ps1.rel,
"NMDS",
"bray",
col = "scientific_name") +
labs(title = paste("NMDS / Bray-Curtis"))
p <- p + scale_color_brewer(palette = "Dark2")+ scale_fill_gradient(low = "#e0ecf4", high = "#6e016b")
p
```
Bray-Curtis dissimilarity does not consider phylogenetic relationships between ASVs. There are several distance methods and a list can be obtained by typing `?distanceMethodList` in the console pane.
Section on multivariate analysis will be discussed on Day3.
## PERMANOVA
Permutational multivariate analysis of variance [further reading](https://onlinelibrary.wiley.com/doi/10.1002/9781118445112.stat07841)
```{r}
library(vegan)
metadf <- data.frame(sample_data(ps1.rel))
unifrac.dist <- UniFrac(ps1.rel,
weighted = TRUE,
normalized = TRUE,
parallel = FALSE,
fast = TRUE)
permanova <- adonis(unifrac.dist ~ scientific_name, data = metadf)
permanova
```
## Checking the homogeneity condition
Type `?betadisper` in R console for more information.
```{r}
ps.disper <- betadisper(unifrac.dist, metadf$scientific_name)
permutest(ps.disper, pairwise = TRUE)
```
```{r}
sessionInfo()
```