Summary

In this short white paper we used ONT fastq data obtained after sequencing the ZymoBIOMICS™ Microbial Community Standard ¹.

The corresponding Oxford nanopore gDNA reads data Zymo-PromethION-EVEN-BB-SN was downloaded from the Nanopore GridION and PromethION Mock Microbial Community Data Community Release²

We used the gDNA reads as input to extract In-Silico either the 'full-length' 16S amplicon corresponding to the PCR 27F-U1492R ³, the shorter V3V4 amplicon corresponding to the primer combination 337F-805R ³, or a 'universal' amplicon corresponding to the combination 515FB-U1492Rw ³.

We emit the hypothesis that the gDNA sequencing done on the Zymo standard is unbiased as a matter of 16S content and therefore represents the ideal material to assess the efficiency and specificity of the ONT 16S analysis pipeline⁴.

In-Silico capture is not biased by primer mismatches as a real PCR would be, the captured subsets are therefore probably more diverse than real amplicons would be and should constitute a superseed of the in-vitro truth.

The next figure represents the 16s region with variable domains as coloured blocks and a 2D model from the Microbiome.

taken from ⁵

taken from theMicrobiomeViewer

The next figure shows arrows indicating the location of the 16S primers used here in the context of the E. coli reference genome (NC000913).

Method

The read sets produced by our code were submitted to the ONT 16S analysis pipeline to be classified and allow direct comparison of the three amplicon options at different levels (only Genus is shown but the full data is provided in the 'results' folder).

For comparison, 10% of the reads in each set were used for a second analysis using MetONTIIME ⁶

epi2me results

REM: results shown below were obtained with a minimum abundance cutoff of 1% set on the epi2me interface. The 'counts' are therefore only coming from 'classes' with >1% read support in the data.

27F-U1492R in-silico amplicon (~1.4kb)

Epi2ME genus results for the 27F-U1492R in-silico amplicon: (link)
- 27F: "AGAGTTTGATCMTGGCTCAG"
- 1492Rw: "CGGTWACCTTGTTACGACTT"
- epi2me results

337F-805R in-silico amplicon (~400bps)

Epi2ME genus results for the 337F-805R in-silico amplicon: (link)
- 337F: "GACTCCTACGGGAGGCWGCAG"
- 805R: "GACTACHVGGGTATCTAATCC"
- epi2me results

515FB-U1492Rw in-silico amplicon (~850bps)

Epi2ME genus results for the 515FB-U1492Rw in-silico amplicon: (link)
- 515FB: "GTGYCAGCMGCCGCGGTAA"
- U1492Rw: "CGGTWACCTTGTTACGACTT"
- epi2me results

Comparing the results

ONT results vs Zymo community

The amount of data extracted by the different InSilico PCR runs shows different size distributions. This is probably due to 'reads' that are extracted by BBMap and are either not true 'amplicons' or are not clipped at one end, thereby carrying over genomic sequence outside of the amplicon region.

When plotting the count of classified reads in each run, we see that the quantities are closer to each other with an apparent higher yield for the 337F-805R dataset (~2x more reads). This is likely due to the better match of the primer set with the diversity of targets present in the Zymo population (my simple explanation, not tested by re-extracting data with a different primer-pair).

The expected species composition (%) obtained from the Zymo documentation and our results are as follows (sorted alphabetically and two yeast genomes removed):

Please note that we counted here only classifications contributed by >1% of the reads (epi2me interface default). Since we are studying a defined community, keeping lower classifications would mainly add noise to the counts. This also implies that we loose a substancial number of reads in the process (close to 30% in some sets).

species	Zymo	27F-U1492R	337F-805R	515FB_1492Rw
Bacillus halotolerans	.	8,4%	1,8%	5,2%
Bacillus mojavensis	.	10,1%	24,9%	12,0%
Bacillus subtilis	17,4%	11,6%	2,1%	15,7%
Bacillus vallismortis	.	2,9%	.	3,2%
Enterococcus faecalis	9,9%	10,6%	11,2%	9,6%
Escherichia coli	10,1%	.	1,1%	.
Escherichia fergusonii	.	.	1,8%	.
Lactobacillus fermentum	18,4%	15,3%	15,8%	15,7%
Lactobacillus gastricus	.	1,7%	.	.
Lactobacillus suebicus	.	.	.	3,1%
Listeria innocua	.	2,7%	.	10,7%
Listeria monocytogenes	14,1%	.	.	.
Listeria welshimeri	.	14,9%	16,0%	5,4%
Pseudomonas aeruginosa	4,2%	5,6%	4,1%	4,5%
Salmonella enterica	10,4%	5,2%	6,5%	7,2%
Staphylococcus aureus	15,5%	10,9%	13,9%	7,7%
Staphylococcus petrasii	.	.	0,9%	.

('.' absent; in bold, the closest species hit(s) from this anaysis)

Results at Genus level were obtained by adding up all related species and shown next

genus	Zymo	27F-U1492R	337F-805R	515FB_1492Rw
Bacillus	17,4%	33,1%	28,8%	36,1%
Enterococcus	9,9%	10,6%	11,2%	9,6%
Escherichia	10,1%	.	2,9%	.
Lactobacillus	18,4%	17,0%	15,8%	18,8%
Listeria	14,1%	17,6%	16,0%	16,2%
Pseudomonas	4,2%	5,6%	4,1%	4,5%
Salmonella	10,4%	5,2%	6,5%	7,2%
Staphylococcus	15,5%	10,9%	14,7%	7,7%

('.' absent; in bold, the closest genus hit(s) from this anaysis)

The second PCR (V4) shows most similarity with the expected ratio and Escherichia is still lagging behind and is the main responsible for the difference between theoretical Zymo numbers and numbers from this experiment. Interestingly, such a broad difference is not apparent in the recent paper by Karst et al ⁷

MetONTIIME results vs Zymo

MetONTIIME was recently created to offer an alternative to the ONT epi2me 'black-box' analysis solution. We show here a resuilt from MetONTIIME using the fastq data produced above in order to compare its results to those of ONT. Due to the size of the data, only the first 10% of each read set was used to classify the three amplicons.

We did not invest enough time in using MetONTIIME to allow full comparison to the ONT epi2me method and only did a quick run in order to get comparable data. It is very likely that the tool can produce better results and the fact that it its code is accessible makes it more suited for development and research applications including the possibility to change reference database and adapt the parameters to a given metagenomic environment.

The results of a default classification using out of the box parameters (and the PRJNA33175 reference database as described in the MetONTIIME doculentation) are reported in the table below, sorted by the first amplicon counts (27F-U1492R). (§) Due to time/server constrains, only the first 10% of the date were used to classify with MetONTIIME.

metontiime results derived from the file species_counts.txt.

OTU ID	27F-U1492R (§)	337F-805R (§)	515FB-U1492Rw (§)	Zymo	27F-U1492R (%)	337F-805R (%)	515FB-U1492Rw (%)
Bacillus halotolerans	1335	718	1263	.	15,8%	5,9%	11,9%
Lactobacillus fermentum	1074	1775	1229	18,4%	12,7%	14,6%	11,6%
Salmonella enterica	938	884	1105	10,4%	11,1%	7,3%	10,4%
Enterococcus faecalis	817	1281	887	9,9%	9,6%	10,6%	8,4%
Staphylococcus aureus	650	368	736	15,5%	7,7%	3,0%	6,9%
Escherichia fergusonii	682	322	711	.	8,0%	2,7%	6,7%
Listeria innocua	435	258	665	15,5%	5,1%	2,1%	6,3%
Bacillus mojavensis	331	2165	548	.	3,9%	17,8%	5,2%
Listeria welshimeri	671	333	543	.	7,9%	2,7%	5,1%
Bacillus vallismortis	247	145	501	.	2,9%	1,2%	4,7%
Pseudomonas aeruginosa	319	422	312	4,2%	3,8%	3,5%	2,9%
Bacillus subtilis	185	178	180	17,4%	2,2%	1,5%	1,7%
Listeria seeligeri	132	73	169	.	1,6%	0,6%	1,6%
Bacillus sporothermodurans	1	2	127	.	0,0%	0,0%	1,2%
Escherichia marmotae	105	75	119	.	1,2%	0,6%	1,1%
Staphylococcus petrasii	50	47	116	.	0,6%	0,4%	1,1%
Lactobacillus suebicus	4	1	112	.	0,0%	0,0%	1,1%
Bacillus atrophaeus	4	12	105	.	0,0%	0,1%	1,0%
Shigella sonnei	80	30	99	.	0,9%	0,2%	0,9%
Shigella flexneri	68	370	96	.	0,8%	3,0%	0,9%
Enterococcus italicus	1	1	91	.	0,0%	0,0%	0,9%
Lactobacillus equigenerosi	5	4	67	.	0,1%	0,0%	0,6%
Lactobacillus pentosus	0	0	58	.	0,0%	0,0%	0,5%
Lactobacillus gastricus	1	8	56	.	0,0%	0,1%	0,5%
Enterococcus avium	34	22	47	.	0,4%	0,2%	0,4%
Staphylococcus haemolyticus	28	20	47	.	0,3%	0,2%	0,4%
Staphylococcus hominis	27	25	45	.	0,3%	0,2%	0,4%
Staphylococcus simiae	26	1085	45	.	0,3%	8,9%	0,4%
Escherichia albertii	26	24	44	.	0,3%	0,2%	0,4%
Shigella boydii	19	16	41	.	0,2%	0,1%	0,4%
Bacillus hisashii	0	0	34	.	0,0%	0,0%	0,3%
Bacillus licheniformis	23	25	29	.	0,3%	0,2%	0,3%
Escherichia coli	8	96	26	10,1%	0,1%	0,8%	0,2%
Enterococcus pseudoavium	0	0	24	.	0,0%	0,0%	0,2%
Anaerobacillus macyae	0	0	23	.	0,0%	0,0%	0,2%
Listeria ivanovii	14	1240	23	.	0,2%	10,2%	0,2%
Enterococcus wangshanyuanii	16	9	19	.	0,2%	0,1%	0,2%
Staphylococcus saccharolyticus	19	9	19	.	0,2%	0,1%	0,2%
Bacillus decolorationis	0	0	18	.	0,0%	0,0%	0,2%
Enterococcus mundtii	0	3	18	.	0,0%	0,0%	0,2%
Bacillus berkeleyi	1	0	17	.	0,0%	0,0%	0,2%
Kosakonia sacchari	17	13	17	.	0,2%	0,1%	0,2%
Bacillus humi	1	2	16	.	0,0%	0,0%	0,2%
Bacillus xiamenensis	8	4	15	.	0,1%	0,0%	0,1%
Bacillus isabeliae	10	6	14	.	0,1%	0,0%	0,1%
Enterobacter cloacae	14	31	14	.	0,2%	0,3%	0,1%
Enterococcus sulfureus	3	4	14	.	0,0%	0,0%	0,1%
Listeria marthii	9	3	14	.	0,1%	0,0%	0,1%
Listeria monocytogenes	13	16	14	14,1%	0,2%	0,1%	0,1%
Streptococcus gallinaceus	0	0	14	.	0,0%	0,0%	0,1%
Streptococcus urinalis	0	0	14	.	0,0%	0,0%	0,1%
Enterococcus saccharolyticus	4	4	13	.	0,0%	0,0%	0,1%
Isobaculum melis	9	3	13	.	0,1%	0,0%	0,1%
Bacillus nematocida	10	5	12	.	0,1%	0,0%	0,1%
Planococcus maritimus	2	0	12	.	0,0%	0,0%	0,1%
Streptococcus parasanguinis	0	0	11	.	0,0%	0,0%	0,1%
total	8476	12137	10621	100%	100% (sorted)	100%	100%

('.' absent; in bold, the closest species hit(s) from this anaysis)

The genus table was obtained by summing all species within each genus in the table above

Genus	Sum of 27F-U1492R (§)	Sum of 337F-805R (§)	Sum of 515FB-U1492Rw (§)		27F-U1492R (%)	337F-805R (%)	515FB-U1492Rw (%)
Bacillus	2156	3262	2879	17,4%	25,4%	26,9%	27,1%
Listeria	1274	1923	1428	14,1%	15,0%	15,8%	13,4%
Lactobacillus	1084	1788	1522	18,4%	12,8%	14,7%	14,3%
Salmonella	938	884	1105	10,4%	11,1%	7,3%	10,4%
Enterococcus	875	1324	1113	9,9%	10,3%	10,9%	10,5%
Escherichia	821	517	900	10,1%	9,7%	4,3%	8,5%
Staphylococcus	800	1554	1008	15,5%	9,4%	12,8%	9,5%
Pseudomonas	319	422	312	4,2%	3,8%	3,5%	2,9%
Shigella	167	416	236		2,0%	3,4%	2,2%
Kosakonia	17	13	17		0,2%	0,1%	0,2%
Enterobacter	14	31	14		0,2%	0,3%	0,1%
Isobaculum	9	3	13		0,1%	0,0%	0,1%
Planococcus	2	0	12		0,0%	0,0%	0,1%
Anaerobacillus	0	0	23		0,0%	0,0%	0,2%
Streptococcus	0	0	39		0,0%	0,0%	0,4%
Grand Total	8476	12137	10621		100,0% (sorted)	100,0%	100,0%

(in bold, the closest genus hit(s) from this anaysis)

Genus (level5) barplot generated in the Qiime viewer after removing Unassigned counts in the data

Discussion

Results obtained with this public data show that the ONT epi2me analysis pipeline is relatively robust when comparing three PCR amplicons and returns quasi identical classification down to species level.

MetONTIIME, although slow for such large datasets, due to a number of steps using a single cpu, also classifies the community in a relatively similar way. No quantitative analysis was done here to correlate the two tools since the MetONTIIME run was only done on partial data.

The final composition of the Zymo community using both approaches does not fully match the expected relative abundance of the 8 species spiked into the commercial sample.

Escherichia coli is veru low to absent from the analysis results. The absent reads may have been classified in <1% classes that are not counted in the epi2me or transferred to other species with similar sequences.
Bacillus is represented by four separate species (subtilis, mojavensis, halotolerans, vallismortis) in the data while only expected as the single species 'subtilis' from the Zymo documentation.
Listeria monocytogenes is replaced mainly by Listeria welshimeri in both result sets.

Although we cannot exclude that the classification may be biased by high degree of sequence identity between species due to the database used in the pipeline, we cannot either rule out that the Zymo sample also has issues concerning the proportion of the different genus as suggested in the recent report published by Sze & Schloss ⁸.

This analysis suggests that the 16 pipeline is able to correctly classify the relatively simple Zymo community but suggests that it may be biased in some ways and could make wrong assessments when working with more complex communities.

We also observe similar bias in the results when using the MetONTIIME tool which qualifies as a replacement for the ONT method but could benefit from speedup to better use the available computing resources (we thank the author for his kind help furing tool install and usage).

Conclusion

This analysis was done to evaluate the use of Oxford nanopore genomic sequencing data for 16S metagenomic classification and compare different 16S amplicons. We believe that this is possible but we acknowledge that our analysis is quite superficial and does not include quantitative evaluation of the results and proper comparison of the tools. This 16S sequence extraction is only one example of what insilico PCR could allow and we think that other capture examples could help biologists focus on specific gene(s) or region(s) of interest starting from ONT full-genome data to answer specific questions.

References

1 ZymoBIOMICS™ Microbial Community Standard (Catalog No. D6300) link ↩

2 Ultra-deep, long-read nanopore sequencing of mock microbial community standards Link. ↩

3 16S ribosomal RNA Link. ↩

4 Analysis solutions for nanopore sequencing data link. ↩

5 Sensitivity and correlation of hypervariable regions in 16S rRNA genes in phylogenetic analysis. Yang B et al. Link. ↩

6 Maestri S. et al. A Rapid and Accurate MinION-Based Workflow for Tracking Species Biodiversity in the Field. link. ↩

7 Enabling high-accuracy long-read amplicon sequences using unique molecular identifiers and Nanopore sequencing. Karst, A et al. link. ↩

8 The impact of DNA polymerase and number of rounds of amplification in PCR on 16S rRNA gene sequence data. Marc A Sze & Patrick D Schloss link. ↩

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ZymoPromethion_even_Results.md

ZymoPromethion_even_Results.md

Summary

Method

epi2me results

27F-U1492R in-silico amplicon (~1.4kb)

337F-805R in-silico amplicon (~400bps)

515FB-U1492Rw in-silico amplicon (~850bps)

Comparing the results

ONT results vs Zymo community

MetONTIIME results vs Zymo

Discussion

Conclusion

References

Files

ZymoPromethion_even_Results.md

Latest commit

History

ZymoPromethion_even_Results.md

File metadata and controls

Summary

Method

epi2me results

27F-U1492R in-silico amplicon (~1.4kb)

337F-805R in-silico amplicon (~400bps)

515FB-U1492Rw in-silico amplicon (~850bps)

Comparing the results

ONT results vs Zymo community

MetONTIIME results vs Zymo

Discussion

Conclusion

References