Clinical pipeline for real-time analysis of ONT sequencing data.
This pipeline includes species identification (using Centrifuge) and resistance gene identification (using RGI-CARD).
This pipeline assumes the use of a multiplexing sequencing kit, as it gathers fasta files based on the barcodes provided.
The following inputs are expected:
- fastq : the folder in which sequencing data were gathered (absolute path; it should end with "/barcodeXX", e.g. XX = 01)
- sampleID : a unique name for your sample
- summary : a basecalling summary file from guppy (absolute path)
An optional input can be provided, together with --keep_unclassified, to try to use reads for which demultiplexing failed:
- unclassified : a single ONT fastq file containing reads for which demultiplexing failed
This information has to be gathered in a CSV-formatted file, with the following header (columns in any order):
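As a sketch of how such a file could be prepared, the Python snippet below checks the constraints listed above (absolute paths, a fastq folder ending in /barcodeXX, unique sample names) and then writes the CSV. The column names sampleID, fastq, summary and unclassified are assumed to match the input names above. The helper names check_rows and write_samples and the two-digit barcode pattern are illustrative and not part of the pipeline.

```python
import csv
import re

# Assumed header names, taken from the input list above.
COLUMNS = ["sampleID", "fastq", "summary"]
# Assumption: barcode folders are named barcode followed by two digits.
BARCODE_DIR = re.compile(r"/barcode\d{2}$")


def check_rows(rows):
    """Return a list of problems found in the rows; an empty list means OK."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows, start=1):
        for col in COLUMNS:
            if not row.get(col):
                problems.append(f"row {i}: missing {col}")
        sample_id = row.get("sampleID")
        if sample_id:
            if sample_id in seen_ids:
                problems.append(f"row {i}: duplicate sampleID {sample_id}")
            seen_ids.add(sample_id)
        for col in ("fastq", "summary"):
            value = row.get(col, "")
            if value and not value.startswith("/"):
                problems.append(f"row {i}: {col} must be an absolute path")
        fastq = row.get("fastq", "")
        if fastq and not BARCODE_DIR.search(fastq):
            problems.append(f"row {i}: fastq should end with /barcodeXX")
    return problems


def write_samples(rows, path):
    """Validate the rows, then write them as a CSV file with a header line."""
    problems = check_rows(rows)
    if problems:
        raise ValueError("; ".join(problems))
    fieldnames = list(COLUMNS)
    # Only add the optional column when at least one sample uses it.
    if any("unclassified" in row for row in rows):
        fieldnames.append("unclassified")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Calling write_samples(rows, "samples.csv") with one dictionary per sample produces a file that can then be passed to --samples, provided the assumed column names match what the pipeline expects.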
A complete run looks like the following:
nextflow run --samples <your_input_file.csv> --outdir <output_folder_name> --summary --tax
Currently, the whole pipeline needs to be run to create a MultiQC report (found in the multiqc folder). Tool-specific outputs are also available in the folders centrifuge, resistance, coverage and checkM.
A few parameters have to be modified to point to your databases; the simplest way is to update the nextflow.config file:
- human_genome (provide an assembly of the human genome)
- pepper_db (location of pepper's database, only needed for the evaluation pipeline)
- homopolish_db (location of homopolish's mash_sketches)
- cendb (location of the centrifuge database)
- card (location of the CARD database)
To reproduce the polishing evaluation results, you can use the pipeline.
The expected inputs differ slightly from those of the pipeline above:
- fastq : a single ONT fastq file (absolute path; concatenating all passed reads with `cat` is one way to create such a file)
- sampleID : a unique name for your sample
- summary : a basecalling summary file from guppy (absolute path)
- reference : a reference genome (absolute path)
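For the fastq input above, the concatenation step can also be scripted. The following is a minimal sketch, assuming the passed reads sit in one folder as *.fastq files; the function name concat_fastq and the default file pattern are illustrative and not part of the pipeline.

```python
import shutil
from pathlib import Path


def concat_fastq(pass_dir, out_path, pattern="*.fastq"):
    """Concatenate all files matching pattern in pass_dir into out_path.

    Returns the number of files that were concatenated.
    """
    out_resolved = Path(out_path).resolve()
    # Skip the output file itself in case it lives in the same folder.
    files = [
        f for f in sorted(Path(pass_dir).glob(pattern))
        if f.resolve() != out_resolved
    ]
    if not files:
        raise FileNotFoundError(f"no files matching {pattern} in {pass_dir}")
    with open(out_path, "wb") as out:
        for f in files:
            with open(f, "rb") as src:
                # Byte-for-byte copy, equivalent to `cat` on the same files.
                shutil.copyfileobj(src, out)
    return len(files)
```

Because the copy is done byte for byte, the same function can be used on gzip-compressed reads with pattern="*.fastq.gz", since concatenated gzip files remain a valid gzip stream.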
This information has to be gathered in a CSV-formatted file, with the following header (columns in any order):
A complete run looks like the following:
nextflow run --samples <your_input_file.csv> --outdir <output_folder_name> --summary --map --coverage --tax --reference --res --annotation
Currently, the whole pipeline needs to be run to create a MultiQC report (found in the multiqc folder). Tool-specific outputs are also available in the folders centrifuge, resistance, coverage and checkM.