A multi-label deep learning classifier for predicting bacterial metal resistance genes

muhit-emon/DeepMRG

DeepMRG

DeepMRG is a multi-label deep learning classifier for predicting bacterial metal resistance genes (MRGs). It predicts MRGs from protein sequences provided in a FASTA file. It can also be applied to metagenomic or isolate assembled contigs (in FASTA format) to predict MRGs.

This GitHub repository hosts the local version of DeepMRG. A web server version of DeepMRG is also available.

Preprint: https://doi.org/10.1101/2023.11.14.566903

Requirements

  1. Linux operating system
  2. conda

Installation

git clone https://github.com/muhit-emon/DeepMRG.git
cd DeepMRG
bash install.sh
conda env create -f environment.yml

conda environment activation

Installing DeepMRG creates a conda environment named deepmrg.
To activate the environment, run the following command:

conda activate deepmrg

(1) Usage on protein sequences

Go inside the DeepMRG directory.

To run DeepMRG on protein sequences (which must be in FASTA format) to predict MRGs, use the following command:

nextflow run protein_pipeline.nf --prot <absolute/path/to/protein/fasta/file> --out_prefix <prefix of output file name>
rm -r work

The command line options for this script (protein_pipeline.nf) are:

--prot: The absolute path of the fasta file containing protein sequences to be classified
--out_prefix: The prefix of the output file name

With --out_prefix demo, an output TSV file named demo_DeepMRG_annotation.tsv (containing the MRG predictions made by DeepMRG) will be generated inside the DeepMRG directory.

Replace absolute/path/to/protein/fasta/file with the absolute path of your protein FASTA file, and replace the output prefix demo with your own output prefix.
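Because --prot expects an absolute path, a relative path can be resolved before the command is built. The following is a minimal sketch, assuming GNU coreutils realpath is installed (typical on Linux) and using proteins.faa as a hypothetical input file name. It only prints the command so that it can be inspected first.

```shell
# Hypothetical input file name; replace it with your own protein FASTA file.
prot_file="proteins.faa"

# Resolve the path to an absolute one, as --prot requires.
# realpath is part of GNU coreutils (an assumption about your system).
prot_abs="$(realpath "$prot_file")"

# Print the command for inspection; drop the leading "echo" to run it.
echo nextflow run protein_pipeline.nf --prot "$prot_abs" --out_prefix demo
```

The same approach applies to the --contig option of contig_pipeline.nf.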

(2) Usage on metagenomic or isolate assemblies (DNA sequences of assembled contigs)

Go inside the DeepMRG directory.

To run DeepMRG on metagenomic or isolate assembled contigs (which must be in FASTA format) to predict MRGs, use the following command:

nextflow run contig_pipeline.nf --contig <absolute/path/to/contig/fasta/file> --out_prefix <prefix of output file name>
rm -r work

The command line options for this script (contig_pipeline.nf) are:

--contig: The absolute path of the fasta file containing contigs
--out_prefix: The prefix of the output file name

With --out_prefix demo, the following files will be generated inside the DeepMRG directory:

  • demo_DeepMRG_annotation.tsv (contains MRG predictions by DeepMRG)
  • demo_predicted_proteins.faa (contains prodigal predicted proteins from contigs)

Replace absolute/path/to/contig/fasta/file with the absolute path of your contig FASTA file, and replace the output prefix demo with your own output prefix.

The pipeline for predicting bacterial MRGs from assembled contigs using DeepMRG is shown below:
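After a contig run, it can be useful to confirm that both expected files were written. The sketch below is illustrative: the function names expected_outputs and check_outputs are hypothetical helpers, not part of DeepMRG, and the file name patterns are the ones listed in this README. It should be run from inside the DeepMRG directory.

```shell
# Print the file names a contig run is expected to produce for a given
# --out_prefix value (patterns taken from this README).
expected_outputs() {
    prefix="$1"
    printf '%s\n' "${prefix}_DeepMRG_annotation.tsv" "${prefix}_predicted_proteins.faa"
}

# Report which expected files exist and are non-empty in the current directory.
check_outputs() {
    expected_outputs "$1" | while IFS= read -r f; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
        fi
    done
}

# Example call for a run started with --out_prefix demo.
check_outputs demo
```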

[Figure: DeepMRG pipeline for predicting bacterial MRGs from assembled contigs]

Output

<prefix of output file name>_DeepMRG_annotation.tsv is the main output file and contains the MRG predictions. It is a tab-separated file in which each line contains a protein sequence header and the corresponding MRG predictions. The sequences appear in the same order as in the input FASTA file.
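Since the annotation file holds one line per input protein, comparing record counts offers a quick sanity check after a run. The sketch below is illustrative only: the function names are hypothetical, and it assumes the TSV begins with a single header row, which this README suggests but does not state outright. For a contig run, compare against the predicted proteins file rather than the contig file.

```shell
# Count sequences in a FASTA file (lines that start with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count prediction rows in the annotation file.
# ASSUMPTION: the first line is a header row, so it is skipped.
count_prediction_rows() {
    awk 'NR > 1' "$1" | wc -l | tr -d ' '
}

# Compare the two counts and print a one-line verdict.
compare_counts() {
    n_fasta="$(count_fasta_records "$1")"
    n_rows="$(count_prediction_rows "$2")"
    if [ "$n_fasta" -eq "$n_rows" ]; then
        echo "match: $n_fasta"
    else
        echo "mismatch: $n_fasta FASTA records vs $n_rows prediction rows"
    fi
}

# Example (hypothetical file names):
# compare_counts proteins.faa demo_DeepMRG_annotation.tsv
```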

[Figure: example of a DeepMRG output file]

The output file contains 2 columns:

The 1st column (Protein_ID) contains the header of the protein sequences in the input fasta file.

The 2nd column (Prediction) contains the prediction results of DeepMRG. By default, proteins with a prediction score below 3.5 (out of 5) are considered non-MRGs.

For example, in the demo output shown above, protein 3 is predicted by DeepMRG to confer resistance to both Cu and Zn, whereas protein 5 is predicted as a non-MRG because its prediction score falls below 3.5.
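To keep only the proteins predicted as MRGs, the two-column file can be filtered with awk. This is a rough sketch under two assumptions that this README does not confirm: that the file starts with a header row, and that non-MRG proteins carry the literal label non-MRG in the 2nd column. The function name list_mrg_hits is hypothetical, and the label can be overridden with a second argument if your output file uses a different one.

```shell
# Print Protein_ID and Prediction for every row not labeled as non-MRG.
# ASSUMPTIONS: line 1 is a header row; the non-MRG label defaults to
# "non-MRG" (check your own output file and pass another label if needed).
list_mrg_hits() {
    awk -F '\t' -v label="${2:-non-MRG}" \
        'NR > 1 && $2 != label { print $1 "\t" $2 }' "$1"
}

# Example (hypothetical file name):
# list_mrg_hits demo_DeepMRG_annotation.tsv
```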
