Nasty_Metagenomes

Antimicrobial Resistance Characterization in Metagenomes

Previous Hackathon Work

NastyBugs
Bugs_And_Drugs (Based on MagicBlast)

Goals:

Compare detection of AMR genes in FASTQ reads by 1) MagicBlast, 2) HMMER, and 3) MASH, using the AMR Finder database as the reference
Generate a "hits" file of reads aligned to each of the publicly available AMR databases
Determine species and plasmid context for AMR genes by creating a merged chromosome and plasmid database and BLASTing the "hits"

Dependencies & Tools

MagicBlast
HMMER
MASH
SamTools
Skesa
Nextflow
Docker
Miniconda
Biopython

AMR Database

AMR Finder, which includes:

Workflow Diagram

Workflow Steps

Input: SRA ID
Output files: AMR hits file, AMRs by species, AMRs on plasmids

Create Blast Databases
Use SamTools
Input SRA/FASTQ to MagicBlast, HMMER, or MASH
SKESA guided assembly
Species identification, plasmid identification & Alignment and Statistics

Step 1.

------------------

# Download [RefSeq Plasmid Database](https://www.ncbi.nlm.nih.gov/refseq/)
## Use [FTP](ftp://ftp.ncbi.nlm.nih.gov/refseq/release/plasmid/) to download the plasmid database and concatenate it into one file

# Download [AMR Finder Database](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA313047)
## Download the [AMR_CDS](ftp://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinder/data/2019-04-29.1/) database from the NCBI FTP site

# Download [bacterial chromosome Databases](https://www.ncbi.nlm.nih.gov/assembly)
## Search assemblies all[sb]
## Download Assembly: Bacteria, Latest RefSeq, Assembly from Type
## Download Assembly: Bacteria, Latest RefSeq, Reference

# Merge Plasmid, Assembly from Type, Reference Databases
## Create a non-redundant list of accession numbers
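The merge and de-duplication step is not spelled out above. The following is a minimal sketch, assuming the accession is the first whitespace-delimited token of each FASTA header and that inputs may be plain or gzipped (GNU gzip is assumed to pass plain files through with `-cdf`). The function name `merge_nr_fasta` and the `.accessions` side file are hypothetical; the first record seen for each accession wins.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: merge plasmid, "assembly from type" and "reference" FASTA
# files into one non-redundant FASTA, keeping the first record per accession.
# Assumption: accession = first whitespace-delimited token of the header.
merge_nr_fasta() {
  local out="$1"; shift
  # GNU gzip -cdf decompresses .gz inputs and copies plain inputs unchanged
  gzip -cdf "$@" | awk -v acc_list="${out}.accessions" '
    /^>/ {
      acc = substr($1, 2)            # drop the leading ">"
      keep = !(acc in seen)          # true only for the first occurrence
      if (keep) { seen[acc] = 1; print acc > acc_list }
    }
    keep                             # print header and sequence lines of kept records
  ' > "$out"
}
```

Usage, with illustrative file names: `merge_nr_fasta Bacteria_type_rep_plasmid_cat_nr.fa plasmid.fna Bacteria_typ.fna Bacteria_rep.fna`, after which the merged file can be passed to `makeblastdb` as in the command below.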

# Create Blast Databases
## AMR blast db
## Merged Plasmid, Assembly from Type, Reference
  
#**command line**
/opt/ncbi-blast-2.9.0+/bin/makeblastdb -in /data/DBs/Bacteria_type_rep_plasmid_cat_nr.fa -parse_seqids -input_type fasta -dbtype nucl -out /data/DBs/Bacteria_type_rep_plasmid_refseq_nr.blastdb -max_file_sz 4GB 
# Create Mash sketches:
## mash sketch -i AMR.fa
## mash sketch -i -p 12 Bacteria_rep.fna
## mash sketch -i -p 12 Bacteria_typ.fna

(Back to Workflow)

Step 2.

------------------

# Use SamTools

#**command line**

(Back to Workflow)

Step 3.

------------------

# MagicBlast
# Use AMR finder database as reference
# Use SamTools to sort and create depth summary file
# Run Cov_dep_cal.pl to calculate coverage depth and average sequence coverage
  
#**command line**
/opt/magicblast/ncbi-magicblast-1.4.0/bin/magicblast -sra ERR1600439 -db /data/AMR_CDS.blastdb -outfmt sam -out ERR1600439_v_AMR_CDS_magicblast_sam.out -num_threads 8 -paired -no_unaligned
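Cov_dep_cal.pl itself is not shown here, so the following is a hypothetical stand-in rather than that script. It assumes the MagicBlast SAM output has been sorted with SamTools and passed through `samtools depth -a`, whose output has three tab-separated columns (reference, position, depth), and summarizes each AMR gene as mean depth and percent of positions covered. The function name `coverage_summary` and its output columns are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the coverage step (not Cov_dep_cal.pl).
# Input on stdin: `samtools depth -a` output (reference, position, depth).
# Output: per-reference position count, mean depth, and % positions with depth > 0.
coverage_summary() {
  printf 'reference\tpositions\tmean_depth\tpct_covered\n'
  awk -F '\t' '
    { len[$1]++; sum[$1] += $3; if ($3 > 0) cov[$1]++ }
    END {
      for (r in len)
        printf "%s\t%d\t%.2f\t%.1f\n", r, len[r], sum[r] / len[r], 100 * cov[r] / len[r]
    }
  ' | sort -k1,1                     # awk array order is unspecified, so sort by reference
}

# Example pipeline (file names follow the MagicBlast command above):
# samtools sort -o ERR1600439_v_AMR_CDS.sorted.bam ERR1600439_v_AMR_CDS_magicblast_sam.out
# samtools depth -a ERR1600439_v_AMR_CDS.sorted.bam | coverage_summary
```

`samtools depth -a` reports zero-depth positions only for references that received reads; `-aa` also includes references with no reads at all, which may matter when reporting absent genes.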
  
# HMMER
# Use the AMR Finder database as the reference, with a user-specified bit score threshold
# to filter HMM hits.
# Translate each read into protein in all six reading frames
# Break the translated reads into ORFs; discard any ORF of 25 aa or less
# Use hmmsearch to run 562 HMM profiles against the translated reads
# For each HMM profile that scored a hit, extract a representative nucleotide sequence
# Output the representative nucleotide sequences for guided assembly
  
#**command line**
./hmm_pipeline.sh -q [fastq file path] -h [path to hmm_databases directory] -a [threshold bitscore]
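hmm_pipeline.sh is not included here, so the two preprocessing steps it describes (six-frame translation, then splitting into ORFs and discarding those of 25 aa or less) can be sketched in awk as below. This is a hypothetical illustration, not the pipeline's code: it assumes 4-line FASTQ input, the standard genetic code, stop-to-stop fragments as "ORFs", and `X` for codons containing N; the function name `six_frame_orfs` and the header naming scheme are invented.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: six-frame translation of FASTQ reads, split at stop codons,
# keeping fragments longer than min_len (default 25 aa). Output is protein FASTA.
six_frame_orfs() {
  local min_len="${1:-25}"
  awk -v min_len="$min_len" '
    BEGIN {
      # Standard code; codon index = 16*b1 + 4*b2 + b3 with T=0, C=1, A=2, G=3
      aa = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
      idx["T"] = 0; idx["C"] = 1; idx["A"] = 2; idx["G"] = 3
      comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
      min_len += 0
    }
    function revcomp(s,    i, b, r) {
      r = ""
      for (i = length(s); i >= 1; i--) {
        b = substr(s, i, 1)
        r = r ((b in comp) ? comp[b] : "N")
      }
      return r
    }
    function translate(s, frame,    i, c1, c2, c3, p) {
      p = ""
      for (i = frame; i + 2 <= length(s); i += 3) {
        c1 = substr(s, i, 1); c2 = substr(s, i + 1, 1); c3 = substr(s, i + 2, 1)
        if ((c1 in idx) && (c2 in idx) && (c3 in idx))
          p = p substr(aa, 16 * idx[c1] + 4 * idx[c2] + idx[c3] + 1, 1)
        else
          p = p "X"                  # ambiguous codon
      }
      return p
    }
    # Split one translated frame at stop codons; print fragments longer than min_len
    function emit(name, prot,    n, parts, k) {
      n = split(prot, parts, "[*]")
      for (k = 1; k <= n; k++)
        if (length(parts[k]) > min_len) { print ">" name "_" k; print parts[k] }
    }
    NR % 4 == 1 { id = substr($1, 2) }          # FASTQ header: drop the leading "@"
    NR % 4 == 2 {                               # FASTQ sequence line
      fwd = toupper($0); rev = revcomp(fwd)
      for (f = 1; f <= 3; f++) {
        emit(id "_f" f, translate(fwd, f))
        emit(id "_r" f, translate(rev, f))
      }
    }
  '
}
```

The resulting protein FASTA is the kind of input `hmmsearch` expects as its target sequence database; how hmm_pipeline.sh maps hits back to representative nucleotide sequences is not covered by this sketch.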

# MASH
# Given a FASTA file of AMR genes, we build a MASH sketch and screen the reads against it.
# This yields a k-mer based identity estimate between the read set and each AMR gene, which is then used
# to keep only the AMRs that are close to the read set.

#**command line**
# Screen the reads against the AMR sequences, keeping hits with identity above 0.85:
## mash screen -p 12 -w AMR.fa.msh ERR1600439*.fastq | awk '$1>0.85' > ERR1600439.amr.screen
# produce a list of candidate AMRs:
## cut -f 5 ERR1600439.amr.screen
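The `awk` filter and `cut` above can also be folded into one reusable step that ranks candidates by identity. This sketch assumes the tab-separated `mash screen` column order identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment; the function name `mash_candidates` and the ranking are illustrative additions, not part of the original workflow.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read `mash screen` output on stdin, keep rows whose
# identity (column 1) exceeds the threshold, and print query-IDs (column 5)
# ordered from highest to lowest identity.
mash_candidates() {
  local min_identity="${1:-0.85}"
  local tab
  tab=$(printf '\t')
  awk -F '\t' -v min="$min_identity" '$1 > min + 0 { print $1 "\t" $5 }' \
    | LC_ALL=C sort -t "$tab" -k1,1nr \
    | cut -f 2
}

# Example: mash_candidates 0.85 < ERR1600439.amr.screen
```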

(Back to Workflow)

Step 4.

------------------

# SKESA Guided Assembly
# Guided assembly builds contigs from known sequences used as baits. The assembler stacks k-mers, extends the ends of each guide sequence, and can optionally output a list of the assembled variants.
# In our analysis we use the AMR FASTA sequences selected in previous steps as assembly guides. The results are written as a FASTA file of contigs.

#**command line**
# to assemble contigs with AMR_CDS_by_ERR1600439_ref.fasta as guides:
# guidedassembler --cores 8 --sra_run ERR1600439 --targets /data/ERR1600439/magicblast_output/AMR_CDS_by_ERR1600439_ref.fasta --contigs_out ERR1600439.ga.fa --fraction 0.1
# to assemble contigs and print out all variants of contigs:
# guidedassembler_graph --targets ../../AMR_CDS_norm.fasta --consensus ERR1600439.amr.contigs.fa --all_variants ERR1600439.amr.all-contigs.fa --gfa /dev/null --sra_run ERR1600439
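The guide FASTA has to contain only the AMR sequences selected in Step 3. A minimal sketch, assuming a plain list of sequence IDs (for example from `cut -f 5 ERR1600439.amr.screen`) whose entries match the first token of the AMR FASTA headers; the function name `extract_guides` and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: copy the FASTA records whose ID (first header token,
# without ">") appears in an ID list. Usage: extract_guides ids.txt amr.fa > guides.fa
extract_guides() {
  awk 'FILENAME == ARGV[1] { want[$1] = 1; next }      # first file: IDs to keep
       /^>/ { id = substr($1, 2); keep = (id in want) } # header: decide on this record
       keep' "$1" "$2"                                  # print lines of kept records
}

# Example:
# cut -f 5 ERR1600439.amr.screen > ERR1600439.amr.ids
# extract_guides ERR1600439.amr.ids AMR.fa > ERR1600439.amr.guides.fa
```

Testing `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom keeps the sketch correct when the ID list is empty.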

(Back to Workflow)

Step 5A.

------------------

# Species and Plasmid Identification
# BLAST the assembled AMR contigs against the merged chromosome and plasmid database
# Parse the hits for species-level and plasmid identification

#**command line**
sudo /opt/ncbi-blast-2.9.0+/bin/blastn -query /data/ERR1600439/magicblast_output/ERR1600439.ga.fa -task blastn -db /data/DBs/Bacteria_type_rep_plasmid_refseq_nr.blastdb -outfmt 6 -evalue 1e-6 -out /data/ERR1600439/magicblast_output/ERR1600439.ga.fa_vs_Bacteria_RefSeq_nr_blastn.out -max_target_seqs 10 -num_threads 8
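The parsing step is not shown. One hedged approach is to keep the highest-bitscore hit per contig from the tabular output (`-outfmt 6` columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and flag subjects found in a list of RefSeq plasmid accessions. The function name `best_hits` and the plasmid accession file are hypothetical, and the sketch assumes `sseqid` values match that list exactly; depending on how the database was built they may carry a `ref|...|` wrapper that would need stripping.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: best hit per query from BLAST -outfmt 6 on stdin.
# Optional $1: file of plasmid accessions (one per line).
# Output: qseqid, sseqid, pident, bitscore, on_plasmid (yes/no).
best_hits() {
  local plasmids="${1:-/dev/null}"
  awk -F '\t' -v OFS='\t' '
    FILENAME == ARGV[1] { plasmid[$1] = 1; next }          # load plasmid accessions
    !($1 in best) || $12 + 0 > best[$1] { best[$1] = $12 + 0; line[$1] = $0 }
    END {
      for (q in line) {
        split(line[q], f, "\t")
        print q, f[2], f[3], f[12], ((f[2] in plasmid) ? "yes" : "no")
      }
    }
  ' "$plasmids" - | LC_ALL=C sort -k1,1
}

# Example: best_hits plasmid_accessions.txt < ERR1600439.ga.fa_vs_Bacteria_RefSeq_nr_blastn.out
```

Species names are not among the default `-outfmt 6` columns; adding `stitle` to a custom format string such as `-outfmt "6 std stitle"` would carry the subject description through for species-level parsing.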

(Back to Workflow)

Step 5B.

------------------

# Alignment and stats
## build blast db for contigs
## align reads onto the contigs

#**command line**
# makeblastdb -parse_seqids -in ERR1600439.ga.fa -input_type fasta -dbtype nucl -out ERR1600439.ga.blastdb
# magicblast -db ERR1600439.ga.blastdb -query ERR1600439_1.fastq -query_mate ERR1600439_2.fastq -infmt fastq | samtools view -Sb -F 4 - | samtools sort - > ERR1600439_amr_contigs.bam
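The "stats" half of this step is not spelled out. One hedged option is to index the BAM and summarize `samtools idxstats`, whose tab-separated columns are reference name, sequence length, mapped read segments, and unmapped read segments, with a final `*` row for unplaced reads. The function name `contig_read_density` and the reads-per-kb metric are illustrative choices, not part of the original pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: per-contig mapped reads and reads per kb from
# `samtools idxstats` output on stdin; skips the "*" row for unplaced reads.
contig_read_density() {
  awk -F '\t' '
    $1 != "*" && $2 > 0 { printf "%s\t%d\t%d\t%.2f\n", $1, $2, $3, $3 / ($2 / 1000) }
  '
}

# Example:
# samtools index ERR1600439_amr_contigs.bam
# samtools idxstats ERR1600439_amr_contigs.bam | contig_read_density
```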

(Back to Workflow)

Authors

Xin Huang
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20851
Inês Mendes
- Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisboa, Portugal; University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, The Netherlands
Jonathan Parkinson
- Qpex Biopharma, Inc., San Diego, CA 92121
Samantha Sevilla
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Institutes of Health, Leidos Biomedical, Inc., Gaithersburg, MD 20877
Vadim Zalunin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
