InSilico PCR

This Analysis aims at extracting genomic sequences mimicking 16S amplicon reads from a larger context long read set (gDNA reads or larger amplicons).

The motivation behind this analysis was to simulate 16S long amplicon sequencing on ONT platform using data obtained from the https://github.com/LomanLab/mockcommunity site as a large Promethion fastq archive but any other ONT or Pacbio fastq data can be used with this code (please acknowledge them when you would use this data too).

Data Availability

(reproduced from https://github.com/LomanLab/mockcommunity/edit/master/README.md)

Name	Reads (M)	Yield (G)	FASTQ	Run Folder	Restarts	FAST5
Zymo-PromethION-LOG-BB-SN	35.1	148	fastq.gz	64h run	restarts	download.sh, restarts.tar
Zymo-PromethION-EVEN-BB-SN	36.5	146	fastq.gz	64h run	restarts	download.sh, restarts.tar
Zymo-GridION-LOG-BB-SN	3.7	16	fastq.gz	48h run	n/a	signal.tar
Zymo-GridION-EVEN-BB-SN	3.5	14	fastq.gz	48h run	n/a	signal.tar

Method

The method used to extract sequences between primers was developed by Brian Bushnell based on his BBMap tools and is explained here.

The workflow is as follows:

split the data in smaller fastq bins for speed-up using parallel
search forward primer in all bins using BBMap msa.sh
search reverse primer in all bins using BBMap msa.sh
extract 'matching' regions using BBMAp cutprimers.sh
merge all results and keep only amplicons in a chosen size range (as set in the script)

Usage

REM: Snakemake does not always install well with bioconda and can be removed from the file environment.yaml if you encounter issues. It is not used in the current version of this code anyway and was added here for forward compatibility.

Install the required software using conca or manually based on the list provided in environment.yaml.
Set variables, names, and numeric limits in the top of the InSilico_PCR.sh script (adjust the number of threads to the available cores in your own machine)
Run the script with
- demo data: ./InSilico_PCR.sh -d
- your own read file present in the current folder: ./InSilico_PCR.sh <file_name>.fq.gz

Results obtained with the demo data

the results of a 16S In-Silico PCR experiment are presented in ZymoPromethion_even_Results.
the results of using other databases for the MetONTIIME classification are reported here.

Future plans

This code should and will be changed to a snakemake pipeline in order to be more portable. The config.yaml file is the first step towards this transition.

Please send comments and feedback to [email protected]

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

InSilico PCR

Data Availability

Method

Usage

Results obtained with the demo data

Future plans

Please send comments and feedback to [email protected]

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 267 Commits
pictures		pictures
results		results
InSilico_PCR.sh		InSilico_PCR.sh
README.md		README.md
ZymoPromethion_even_Results.md		ZymoPromethion_even_Results.md
config.yaml		config.yaml
environment.yaml		environment.yaml
shotgun_16S.sh		shotgun_16S.sh

Nucleomics-VIB/InSilico_PCR

Folders and files

Latest commit

History

Repository files navigation

InSilico PCR

Data Availability

Method

Usage

Results obtained with the demo data

Future plans

Please send comments and feedback to [email protected]

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages