bioinformatics-lab/mtbseq-nf

 
 

mtbseq-nf

The mtbseq-nf pipeline makes MTBseq simple and easy to use via the Nextflow workflow manager.

Benefits of the Nextflow wrapper

  • Fine-grained control over resource allocation (CPU/Memory/Storage)
  • Reliance on bioconda and biocontainers to install packages reproducibly
  • Ease of use on a range of infrastructure (cloud/on-prem clusters/local machine)
  • Resumability for failed processes
  • Centralized locations for specifying
    • MTBseq parameters (conf/global_parameters.config)
    • Hardware requirements (conf/standard.config)
    • Software requirements (conf/docker.config or conf/conda.config)
  • A dedicated user interface for all parameters, aimed at a wider audience (nextflow_schema.json). This works only on Nextflow Tower.
  • Easier customization of the pipeline through explicit parameters (conf/global_parameters.config).
  • Ability to analyze genomes in parallel as well as in batch, on local, cloud and cluster environments.

Usage

The simplest use case is to analyze a few genomes in a local environment. Almost every aspect is customizable, but for the sake of brevity, a bare-bones guide for beginners is shown below.

  • 1. Clone the project
git clone https://github.com/mtb-bioinformatics/mtbseq-nf
cd mtbseq-nf
  • 2. Download the gatk-3.8.0 tar from here

  • 3. Untar it and place it in the resources folder

tar -xvf GATK_TAR_FILE
  • 4. Move your genomes to the data/full_data folder

They should follow the pattern SAMPLE_R1.fastq.gz

  • 5. To run the pipeline, make sure you have conda installed. If you don't already have Nextflow installed, the following command creates a conda environment that includes it
conda create -n mtbseq-nf-env -c bioconda -c conda-forge nextflow 

You can confirm the setup by activating that environment and using the nextflow info command

conda activate mtbseq-nf-env

nextflow info 
  • 6. Then simply issue the following command on the command line
nextflow run main.nf -profile standard,conda
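
Steps 4 and 6 can be sketched as a small pre-flight check followed by the run command. This is a minimal sketch under stated assumptions, not part of the pipeline: the helper names `check_fastq_names` and `run_mtbseq_nf` are hypothetical, and accepting an `_R2` mate alongside `_R1` is an assumption based on the usual paired-end naming convention. `-resume` is Nextflow's standard flag for reusing cached results of tasks that already completed, which is what the resumability benefit listed above relies on.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list files whose names do not follow SAMPLE_R1.fastq.gz.
# Assumption: a matching SAMPLE_R2.fastq.gz mate is also valid.
check_fastq_names() {
    local dir="${1:-data/full_data}"
    local bad=0 f name
    for f in "$dir"/*; do
        [ -e "$f" ] || continue                 # unexpanded glob: directory is empty
        name="$(basename "$f")"
        case "$name" in
            ?*_R1.fastq.gz|?*_R2.fastq.gz) ;;   # expected pattern
            *) echo "unexpected name: $name"; bad=1 ;;
        esac
    done
    return "$bad"
}

# Hypothetical wrapper around the command from step 6; extra arguments are passed through.
# Set NEXTFLOW=echo to print the command instead of running it.
run_mtbseq_nf() {
    "${NEXTFLOW:-nextflow}" run main.nf -profile standard,conda "$@"
}
```

A typical session might call `check_fastq_names data/full_data && run_mtbseq_nf` for the first attempt and, after fixing whatever made a process fail, `run_mtbseq_nf -resume` to pick up from the cached results.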

Workflow example

This pipeline has two execution types, batch and parallel. An example DAG for each is shown below.

The execution type is determined by the analysis_mode parameter
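
Nextflow accepts pipeline parameters on the command line as `--name value`, so the mode could be selected per run as sketched here. The assumptions are that `analysis_mode` takes the literal values `batch` and `parallel` (inferred from the names of the two execution types) and that the helper `run_with_mode` is hypothetical. A value given on the command line takes precedence over one set in a config file.

```shell
#!/usr/bin/env bash

# Hypothetical helper: validate the mode, then pass it as a pipeline parameter.
# Assumption: analysis_mode accepts the literal values "batch" and "parallel".
# Set NEXTFLOW=echo to print the command instead of running it.
run_with_mode() {
    local mode="$1"
    case "$mode" in
        batch|parallel) ;;
        *)
            echo "analysis_mode must be 'batch' or 'parallel', got: '$mode'" >&2
            return 1
            ;;
    esac
    "${NEXTFLOW:-nextflow}" run main.nf -profile standard,conda --analysis_mode "$mode"
}
```

For example, `run_with_mode parallel` would launch the pipeline in parallel mode, while an unrecognized value is rejected before Nextflow starts.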

Batch

[Figure: batch-workflow DAG]

Parallel

[Figure: parallel-workflow DAG]

Contributions

Contributions are warmly accepted!

License

MTBseq, the inspiration for this project, has a GPL-3 license as of v1.0.3.

The components of the mtbseq-nf project itself (the Nextflow wrapper code) are licensed under the liberal MPL-2.0 license.

We would like to thank the developers of MTBseq for putting in the initial effort!
