nf-core/demultiplex: Output

This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

  • Reformatting the input sample sheet to collapse iCLIP samples into one lane and split single-cell 10X samples into their own sample sheets
  • Checking the sample sheet for samples that would cause downstream errors, such as:
    • a mix of short and long indexes on the same lane
    • a mix of single and dual indexes on the same lane
  • Processes that run only if the sample sheet check finds problems (CONDITIONAL):
    • Creating a new sample sheet with any samples that would cause an error removed, and creating a txt file listing the removed problem samples
    • Running bcl2fastq on the newly created sample sheet and outputting the Stats.json file
    • Parsing the Stats.json file for the indexes that were in the problem samples list
    • Rechecking the newly made sample sheet for any errors or problem samples that did not match any indexes in the Stats.json file. If there is still an issue, the pipeline exits at this stage.
  • bcl2fastq - converting BCL files to FASTQ, and demultiplexing (CONDITIONAL)
  • Processes that only run if there are 10X samples on the sample sheet input (CONDITIONAL):
    • CellRanger - demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files and is a wrapper around Illumina's bcl2fastq
    • CellRangerCount - performs alignment, filtering, barcode counting, and UMI counting
  • FastQC - read quality control
  • MultiQC - aggregate report, describing results of the whole pipeline
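
The sample sheet check described in the list above can be illustrated with a minimal sketch. This is not the pipeline's own code: the function name `find_problem_lanes` and the column names `Lane`, `index` and `index2` are assumptions chosen for illustration, and the real check may apply further rules.

```python
from collections import defaultdict


def find_problem_lanes(rows):
    """Return {lane: [reasons]} for lanes that mix index styles.

    `rows` is a list of dicts with the hypothetical keys "Lane", "index"
    and "index2" (empty or missing when a sample is single-indexed).
    """
    lengths = defaultdict(set)  # lane -> set of first-index lengths seen
    dual = defaultdict(set)     # lane -> {True, False} for dual/single indexing
    for row in rows:
        lane = row["Lane"]
        lengths[lane].add(len(row["index"]))
        dual[lane].add(bool(row.get("index2")))

    problems = {}
    for lane in lengths:
        reasons = []
        if len(lengths[lane]) > 1:
            reasons.append("mixed index lengths")
        if len(dual[lane]) > 1:
            reasons.append("mixed single and dual indexes")
        if reasons:
            problems[lane] = reasons
    return problems
```

A lane is reported only when it contains more than one index length or both single- and dual-indexed samples, which mirrors the two error cases listed above.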

bcl2fastq

bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis.

Output directory: params.outdir/projectName/FastQ

  • samplename/sample.fastq.gz
    • Untrimmed raw fastq files
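
As a rough sketch of how the layout above might be consumed downstream, the following Python helper collects the FASTQ files for one project. The function name `list_fastqs` is hypothetical, and the code assumes exactly one sample directory level beneath `FastQ`, as the listing suggests.

```python
from pathlib import Path


def list_fastqs(outdir, project_name):
    """Return the sorted .fastq.gz paths under <outdir>/<project_name>/FastQ.

    Assumes one directory per sample directly beneath FastQ.
    """
    fastq_dir = Path(outdir) / project_name / "FastQ"
    return sorted(fastq_dir.glob("*/*.fastq.gz"))
```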

CellRanger

CellRanger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis. cellranger mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files and is a wrapper around Illumina's bcl2fastq. cellranger count takes FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis.

cellranger mkfastq

Output directory: params.outdir/mkfastq/outs/fastq_path/project_name/sample_name/

  • samplename/sample.fastq.gz
    • Untrimmed raw fastq files

cellranger count

Output directory: params.outdir/count/projectName/sampleName/outs/

  • outs/analysis
    • Secondary analysis data including dimensionality reduction, cell clustering, and differential expression
  • outs/web_summary.html
    • Run summary metrics and charts in HTML format
  • outs/metrics_summary.csv
    • Run summary metrics in CSV format
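
To pull the run metrics into a script, something like the following sketch may be enough. It assumes that `metrics_summary.csv` holds a single header row followed by a single row of values; the function name `read_metrics_summary` is made up for this example, and values are returned as the raw strings found in the file.

```python
import csv


def read_metrics_summary(path):
    """Read the first data row of a metrics CSV into a {column: value} dict.

    Assumes one header row and one value row; values stay as strings.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        row = next(reader)
    return dict(row)
```

The `csv` module handles quoted fields, so values containing thousands separators are kept intact rather than being split on the comma.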

FastQC

FastQC gives general quality metrics about your reads. It provides information about the quality score distribution across your reads and the per-base sequence content (%T/A/G/C), as well as adapter contamination and other overrepresented sequences.

For further reading and documentation see the FastQC help.

Output directory: params.outdir/projectName/fastqc

  • sample_fastqc.html
    • FastQC report, containing quality metrics for your untrimmed raw fastq files
  • sampleID
    • Directory containing the FastQC report, tab-delimited data file and plot images
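
The tab-delimited data file can be summarised programmatically. The sketch below assumes the file is FastQC's `fastqc_data.txt`, in which each module starts with a line of the form `>>Module name<TAB>status` and ends with `>>END_MODULE`; the function name `module_statuses` is hypothetical.

```python
def module_statuses(lines):
    """Map each FastQC module name to its status (e.g. pass, warn, fail).

    `lines` is any iterable of text lines, such as an open file handle.
    """
    statuses = {}
    for line in lines:
        line = line.rstrip("\n")
        # Module headers start with ">>"; ">>END_MODULE" closes a module.
        if line.startswith(">>") and line != ">>END_MODULE":
            name, _, status = line[2:].partition("\t")
            statuses[name] = status
    return statuses
```

Passing an open file handle works directly, for example `module_statuses(open(path))`, where `path` points at the data file inside the sample directory.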

MultiQC

MultiQC is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report, and further statistics are available in the report data directory.

The pipeline includes steps that report the software versions used in the MultiQC output for future traceability.

Output directory: params.outdir/multiqc

  • Project_multiqc_report.html
    • MultiQC report - a standalone HTML file that can be viewed in your web browser
  • Project_multiqc_data/
    • Directory containing parsed statistics from the different tools used in the pipeline
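
The parsed statistics in the data directory are plain text and can be loaded without MultiQC itself. The sketch below assumes a tab-delimited table with a header row, such as the `multiqc_general_stats.txt` file that MultiQC typically writes; both that file name and the function name `read_stats_table` are assumptions rather than something this pipeline guarantees.

```python
import csv


def read_stats_table(path):
    """Load a tab-delimited statistics table as a list of row dicts.

    Assumes the first line is a header; all values are kept as strings.
    """
    with open(path, newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle, delimiter="\t")]
```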

For more information about how to use MultiQC reports, see http://multiqc.info