Short read pipeline of CIWARS for taxonomy classification of bacterial pathogens and computation of ARG abundance based on rpoB marker gene normalization
- Linux operating system
- conda
git clone https://github.com/muhit-emon/read-pipeline.git
cd read-pipeline
bash install.sh
conda env create -f environment.yml
After installation, a conda environment named read_pipeline will be created.
To activate the environment, run the following command:
conda activate read_pipeline
Go inside the read-pipeline directory. Download the Kraken2-compatible pathogen DB and non-prokaryote DB, then uncompress them:
wget https://zenodo.org/records/14537567/files/CIWARS_Pathogen_DB.tar.gz
tar -zxvf CIWARS_Pathogen_DB.tar.gz
rm CIWARS_Pathogen_DB.tar.gz

wget https://zenodo.org/records/14537567/files/non-prokaryote-DB.tar.gz
tar -zxvf non-prokaryote-DB.tar.gz
rm non-prokaryote-DB.tar.gz
Go inside the read-pipeline directory.
To run the short read pipeline on metagenomic paired-end short read data (*.fastq, *.fq, *.fastq.gz, or *.fq.gz), use the following command:
nextflow run short-read-pipeline.nf --R1 <absolute/path/to/forward/read/file> --R2 <absolute/path/to/reverse/read/file> --out_fname <prefix of output file name>
rm -r work
The command line options for this script (short-read-pipeline.nf) are:
--R1: The absolute path of the fastq file containing forward read sequences
--R2: The absolute path of the fastq file containing reverse read sequences
--out_fname: The prefix of the output file name
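
Since --R1 and --R2 expect absolute paths, it may help to resolve relative paths before launching the pipeline. The following is a minimal sketch, assuming GNU `realpath` is available (typical on Linux). The helper name `build_run_cmd` is hypothetical and not part of this repository. It only prints the command so that it can be inspected before being run.

```shell
# Hypothetical helper (not part of read-pipeline): print the nextflow command
# with both read paths converted to absolute paths.
# Usage: build_run_cmd <forward reads> <reverse reads> <output prefix>
build_run_cmd() {
    local r1 r2
    # realpath turns a relative path into an absolute one (and resolves symlinks)
    r1=$(realpath "$1") || return 1
    r2=$(realpath "$2") || return 1
    printf 'nextflow run short-read-pipeline.nf --R1 "%s" --R2 "%s" --out_fname %s\n' \
        "$r1" "$r2" "$3"
}
```

For example, `build_run_cmd reads/S1_R1.fastq.gz reads/S1_R2.fastq.gz S1` prints a single command line that can be copied into the terminal from inside the read-pipeline directory. The file names here are placeholders.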
With --out_fname S1, output files named S1.k2report, S1_rpoB_ARG_norm.tsv, and S1_drug_wise_rpoB_norm.tsv will be generated inside the read-pipeline directory.
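
After a run, one way to confirm that all three output files exist is sketched below. The helpers `expected_outputs` and `missing_outputs` are hypothetical names introduced for illustration. The file-name pattern is taken from the S1 example above, on the assumption that other prefixes follow the same pattern.

```shell
# Hypothetical helpers (not part of read-pipeline).

# Print the three output file names expected for a given prefix.
expected_outputs() {
    printf '%s\n' "${1}.k2report" "${1}_rpoB_ARG_norm.tsv" "${1}_drug_wise_rpoB_norm.tsv"
}

# Print the expected output files that are NOT present.
# $1 = output prefix, $2 = directory to check (defaults to the current directory)
missing_outputs() {
    expected_outputs "$1" | while IFS= read -r f; do
        [ -f "${2:-.}/$f" ] || printf '%s\n' "$f"
    done
}
```

Running `missing_outputs S1` from inside the read-pipeline directory prints nothing when all three files are present, and otherwise lists the missing ones.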