Bethany's Roadmap #16

Open
3 of 6 tasks
ha0ye opened this issue Apr 18, 2017 · 0 comments

ha0ye commented Apr 18, 2017

Objectives

1. Replicate figure 5 A

figure 5
"Fig. 5
Environmental drivers of surface microbial community composition.

(A) Principal coordinate (PC) analysis of surface samples shows that samples are not clearly grouped by their regional origin (top), but rather separated by the local temperatures as shown by the strong correlation (R²: 0.76) between the first PC and temperature (bottom). "

2. Replicate figure 5 B
"(B) Pairwise comparisons of environmental factors are shown, with a color gradient denoting Spearman’s correlation coefficients. Taxonomic [based on two independent methods: mitags (12) and mOTUs (13)] and functional (based on biochemical KEGG modules) community composition was related to each environmental factor by partial (geographic distance–corrected) Mantel tests. Edge width corresponds to the Mantel’s r statistic for the corresponding distance correlations, and edge color denotes the statistical significance based on 9,999 permutations."
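A partial Mantel test of the kind the caption describes could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes a Pearson correlation between distance-matrix entries, and the six-site values, the helper names (`upper_triangle`, `partial_r`, `partial_mantel`) and the absolute-difference distances are all made up for the example. In R, `mantel.partial()` from the vegan package covers the same ground.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(dx, dy, dz, permutations=999, seed=0):
    """Partial Mantel r between dx and dy given dz, with a permutation p-value."""
    rng = random.Random(seed)
    n = len(dx)
    y, z = upper_triangle(dy), upper_triangle(dz)
    observed = partial_r(upper_triangle(dx), y, z)
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of dx together (i.e. relabel the samples).
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[dx[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if partial_r(upper_triangle(permuted), y, z) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


def abs_distance(values):
    """Toy distance matrix: absolute difference between per-site values."""
    return [[abs(a - b) for b in values] for a in values]


# Made-up values for six hypothetical sites (NOT Tara Oceans data).
community_dist = abs_distance([1.0, 2.1, 2.9, 4.2, 5.1, 5.8])
temperature_dist = abs_distance([2, 5, 9, 14, 20, 27])
geographic_dist = abs_distance([0, 3, 1, 7, 2, 5])

r, p = partial_mantel(community_dist, temperature_dist, geographic_dist)
print(f"partial Mantel r = {r:.3f}, p = {p:.3f}")
```

The example runs 999 permutations to stay quick; the caption reports 9,999.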

3. Replicate figure 6

Figure 6
"Fig. 6
Temperature as main environmental driver for microbial community composition in the epipelagic layer.
(A) The strength of association between (meta)genomic and environmental data was tested by statistical models that were first generated with a subset of data for training and then validated on the remaining data. The prediction accuracy was used as a measure for the strength of association. Models that were trained on subsets of taxonomic data from surface water (SRF) samples could predict with high accuracy temperature and dissolved oxygen of samples used for validation (left). Models trained with subsets of taxonomic data from deep chlorophyll maximum (DCM) samples could predict temperature with high accuracy, but could predict dissolved oxygen with only moderate accuracy (middle). To demonstrate across-depth conservation of associations, we show that models trained on data from SRF samples could highly predict temperature, but failed to predict dissolved oxygen in DCM samples.
(B) To illustrate prediction accuracy, and thus, strength of association between taxonomic composition (using 16S mitag abundances) and temperature, we show that in situ measured temperature could be predicted with 86% explained variance. The red diagonal shows the theoretical curve for perfect predictions. Sanger sequencing reads from the GOS project were used to calculate relative genus abundance tables. Using temperature prediction models trained at genus level using Tara Oceans data, we show (inset) that the results could be validated at relatively high accuracy given the large differences in sampling and sequencing methods between these two studies."
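The train-then-validate idea in the caption, with explained variance as the accuracy measure, might look something like this in outline. It is a sketch under stated assumptions: a one-feature least-squares line stands in for whatever model the paper actually used, the abundance and temperature values are invented, and `fit_line` and `explained_variance` are hypothetical helper names.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def explained_variance(measured, predicted):
    """1 - SS_res / SS_tot, i.e. agreement with the perfect-prediction diagonal."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


# Invented toy data (NOT from the paper): relative abundance of one taxon
# and the in situ temperature of the same hypothetical samples.
abundance = [0.10, 0.15, 0.22, 0.30, 0.35, 0.41, 0.50, 0.58]
temperature = [4.0, 6.5, 9.0, 13.5, 15.0, 18.5, 22.0, 26.5]

# Train on every other sample, validate on the samples held out.
train_x, train_y = abundance[0::2], temperature[0::2]
valid_x, valid_y = abundance[1::2], temperature[1::2]

slope, intercept = fit_line(train_x, train_y)
predicted = [slope * x + intercept for x in valid_x]
score = explained_variance(valid_y, predicted)
print(f"explained variance on held-out samples: {score:.2f}")
```

Scoring only the held-out samples is what makes the number a measure of association strength rather than of how well the model memorised its training data.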

Tasks

Focus on Figure 5A to start

  • Get environmental metadata (associated with paper)
  • Get OTU data (not easily accessible)
    We need mOTU relative abundances
  • Revisit PCA notes from stats class
  • Make R script to replicate PCA methods as described in Sunagawa et al.
  • Make R script to correlate PC1 with temperature
  • Compare my results to published results
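As a rough outline of the PCA and PC1-versus-temperature tasks, the logic could be sketched as below. It is written in standard-library Python purely to show the steps; the R scripts the tasks call for would more likely lean on `prcomp()` and `cor()`. The abundance table and temperatures are made-up toy values, the helper names are hypothetical, and plain PCA on the covariance matrix is assumed here, whereas the figure caption describes a principal coordinate analysis.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every variable is centered on zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_pc_scores(rows, iterations=500):
    """Scores of each sample on PC1, found by power iteration on the covariance."""
    x = center_columns(rows)
    n, p = len(x), len(x[0])
    cov = [
        [sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Non-uniform start vector: for relative abundances that sum to 1, the
    # all-ones vector lies in the null space of the covariance matrix.
    v = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy table: 6 hypothetical samples x 3 taxa (relative abundances, NOT real data).
abundances = [
    [0.70, 0.20, 0.10],
    [0.60, 0.25, 0.15],
    [0.50, 0.30, 0.20],
    [0.35, 0.35, 0.30],
    [0.25, 0.35, 0.40],
    [0.10, 0.40, 0.50],
]
temperature = [3.0, 7.0, 12.0, 17.0, 22.0, 28.0]

scores = first_pc_scores(abundances)
# The sign of a principal component is arbitrary, so report R squared.
r_squared = pearson(scores, temperature) ** 2
print(f"R^2 between PC1 and temperature: {r_squared:.2f}")
```

The resulting R² is the quantity to set against the published 0.76 in the final comparison task.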
@ha0ye ha0ye closed this as completed Apr 19, 2017
@ha0ye ha0ye reopened this Apr 19, 2017