This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
The pipeline is built using Nextflow and processes data using the following steps:
- Reformatting the input sample sheet to collapse iCLIP samples into one lane and to move single-cell 10X samples into separate sample sheets
- Checking the sample sheet for samples that would cause downstream errors, such as:
  - a mix of short and long indexes on the same lane
  - a mix of single and dual indexes on the same lane
- Processes that run only if the sample sheet check process finds problem samples (CONDITIONAL):
  - Create a new sample sheet with the error-causing samples removed, and write a txt file listing the removed problem samples
  - Run bcl2fastq on the newly created sample sheet and output the Stats.json file
  - Parse the Stats.json file for the indexes of the samples in the problem samples list
  - Recheck the new sample sheet for any remaining errors, and for problem samples whose indexes did not match any index in the Stats.json file. If an issue remains, the pipeline exits at this stage.
- bcl2fastq - converting BCL files to FASTQ and demultiplexing (CONDITIONAL)
- Processes that run only if the input sample sheet contains 10X samples (CONDITIONAL):
  - CellRanger - demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files; it is a wrapper around Illumina's bcl2fastq
  - CellRangerCount - performs alignment, filtering, barcode counting, and UMI counting
- FastQC - read quality control
- MultiQC - aggregate report, describing results of the whole pipeline
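The two sample sheet checks listed above (a mix of short and long indexes, or of single and dual indexes, on the same lane) can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline's own code: it assumes an Illumina-style sample sheet whose `[Data]` section is the last section and has `Lane`, `Sample_ID`, `index` and `index2` columns, and the helper names `read_data_section` and `find_problem_lanes` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_data_section(sheet_text):
    """Return the [Data] rows of an Illumina-style sample sheet as dicts.

    Assumes [Data] is the last section of the sheet, as is usual.
    """
    lines = sheet_text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.strip().startswith("[Data]"))
    return list(csv.DictReader(io.StringIO("\n".join(lines[start + 1:]))))


def find_problem_lanes(rows):
    """Return {lane: [reasons]} for lanes that mix index lengths or index types.

    Hypothetical helper; sheets without a Lane column are treated as lane "1".
    """
    by_lane = defaultdict(list)
    for row in rows:
        by_lane[row.get("Lane") or "1"].append(row)

    problems = {}
    for lane, lane_rows in by_lane.items():
        i7 = [(r.get("index") or "").strip() for r in lane_rows]
        i5 = [(r.get("index2") or "").strip() for r in lane_rows]
        reasons = []
        # Short vs long: more than one distinct i7 length, or i5 length where present.
        if len({len(s) for s in i7}) > 1 or len({len(s) for s in i5 if s}) > 1:
            reasons.append("mix of short and long indexes")
        # Single vs dual: some samples have an index2 and some do not.
        if len({bool(s) for s in i5}) > 1:
            reasons.append("mix of single and dual indexes")
        if reasons:
            problems[lane] = reasons
    return problems


# Illustrative sheet with made-up samples and indexes.
sheet = """[Header]
IEMFileVersion,4
[Data]
Lane,Sample_ID,index,index2
1,A,ACGTACGT,TTGGCCAA
1,B,ACGTAC,
2,C,GGCCTTAA,AACCGGTT
"""
problems = find_problem_lanes(read_data_section(sheet))
print(problems)
# {'1': ['mix of short and long indexes', 'mix of single and dual indexes']}
```

Grouping by lane first keeps each check a simple set-size test: a lane is consistent exactly when every set of observed lengths or index types has a single member.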
bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis.
Output directory: params.outdir/projectName/FastQ
samplename/sample.fastq.gz
- Untrimmed raw fastq files
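bcl2fastq also writes a Stats.json file (typically under `Stats/` in its output directory), which the conditional recheck step parses for the indexes of the removed problem samples. The sketch below assumes the bcl2fastq v2 layout, in which reads that matched no sample are counted per lane under `UnknownBarcodes` -> `Barcodes`, with dual indexes written as `I7+I5`. The function names are hypothetical, and exact string matching is an assumption; a real check might need to compare prefixes when index lengths differ.

```python
import json


def load_stats(path):
    """Load a bcl2fastq Stats.json file into a dict."""
    with open(path) as fh:
        return json.load(fh)


def match_problem_indexes(stats, problem_indexes):
    """Split problem samples by whether their index appears in Stats.json.

    `problem_indexes` maps sample ID -> index string (dual indexes as "I7+I5").
    Only the "UnknownBarcodes" section is searched (an assumption).
    """
    observed = set()
    for lane in stats.get("UnknownBarcodes", []):
        # Adding a dict to a set adds its keys, i.e. the barcode sequences.
        observed.update(lane.get("Barcodes", {}))

    matched = {s: i for s, i in problem_indexes.items() if i in observed}
    unmatched = {s: i for s, i in problem_indexes.items() if i not in observed}
    return matched, unmatched


# Illustrative Stats.json fragment with made-up counts.
stats = json.loads(
    '{"UnknownBarcodes": [{"Lane": 1, "Barcodes": '
    '{"ACGTAC+GGTTAA": 5120, "NNNNNN+NNNNNN": 87}}]}'
)
matched, unmatched = match_problem_indexes(stats, {"B": "ACGTAC+GGTTAA", "D": "TTTTTT"})
print(matched)    # {'B': 'ACGTAC+GGTTAA'}
print(unmatched)  # {'D': 'TTTTTT'}
```

In the pipeline's terms, any sample left in `unmatched` corresponds to the case where the pipeline exits at the recheck stage.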
CellRanger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis.
cellranger mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files and is a wrapper around Illumina's bcl2fastq.
cellranger count takes FASTQ files from cellranger mkfastq and performs alignment, filtering, barcode counting, and UMI counting. It uses the Chromium cellular barcodes to generate feature-barcode matrices, determine clusters, and perform gene expression analysis.
cellranger mkfastq
Output directory: params.outdir/mkfastq/outs/fastq_path/project_name/sample_name/
samplename/sample.fastq.gz
- Untrimmed raw fastq files
cellranger count
Output directory: params.outdir/count/projectName/sampleName/outs/
outs/analysis
- Secondary analysis data including dimensionality reduction, cell clustering, and differential expression
outs/web_summary.html
- Run summary metrics and charts in HTML format
outs/metrics_summary.csv
- Run summary metrics in CSV format
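For scripted checks across many samples, metrics_summary.csv can be read with Python's standard `csv` module. This minimal sketch assumes the file holds one header row followed by a single row of values, with numbers formatted like `"4,512"` and percentages like `"97.8%"`; the helper names are hypothetical and the example values are made up.

```python
import csv
import io


def read_metrics_summary(text):
    """Parse the text of cellranger count's metrics_summary.csv into a dict.

    Assumes one header row followed by a single row of values.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows[0] if rows else {}


def to_number(value):
    """Turn strings such as "4,512" or "97.8%" into floats (4512.0, 97.8)."""
    return float(value.replace(",", "").rstrip("%"))


# Illustrative content with made-up values; for a real run, read
# params.outdir/count/projectName/sampleName/outs/metrics_summary.csv instead.
example = (
    '"Estimated Number of Cells","Mean Reads per Cell","Valid Barcodes"\n'
    '"4,512","38,207","97.8%"\n'
)
metrics = read_metrics_summary(example)
print(to_number(metrics["Estimated Number of Cells"]))  # 4512.0
print(to_number(metrics["Valid Barcodes"]))             # 97.8
```

Values are kept as strings by the reader and converted only on demand, since the file mixes counts, percentages and text.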
FastQC gives general quality metrics about your reads. It reports the quality score distribution across your reads and the per-base sequence content (%T/A/G/C), as well as adapter contamination and other overrepresented sequences.
For further reading and documentation, see the FastQC help.
Output directory: params.outdir/projectName/fastqc
sample_fastqc.html
- FastQC report, containing quality metrics for your untrimmed raw fastq files
sampleID
- Directory containing the FastQC report, tab-delimited data file and plot images
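To find failing modules without opening every HTML report, the per-sample FastQC directory can be scanned for its summary file. This sketch assumes the directory includes FastQC's standard summary.txt, with one tab-separated `STATUS`, module name and filename line per module; the function name is hypothetical and the example content is made up.

```python
def modules_with_status(summary_text, statuses=("FAIL",)):
    """Return (filename, module) pairs whose status is in `statuses`.

    Assumes FastQC's summary.txt format:
    STATUS<TAB>Module name<TAB>Filename, one line per module.
    """
    hits = []
    for line in summary_text.splitlines():
        if not line.strip():
            continue
        status, module, filename = line.split("\t", 2)
        if status in statuses:
            hits.append((filename, module))
    return hits


# Illustrative summary.txt content (made-up results).
summary = (
    "PASS\tBasic Statistics\tsample.fastq.gz\n"
    "WARN\tPer base sequence content\tsample.fastq.gz\n"
    "FAIL\tAdapter Content\tsample.fastq.gz\n"
)
print(modules_with_status(summary))  # [('sample.fastq.gz', 'Adapter Content')]
```

Passing `statuses=("WARN", "FAIL")` widens the filter to every module that did not pass.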
MultiQC is a visualisation tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report, and further statistics are available within the report data directory.
The pipeline includes steps that report the software versions used in the MultiQC output, for future traceability.
Output directory: params.outdir/multiqc
Project_multiqc_report.html
- MultiQC report - a standalone HTML file that can be viewed in your web browser
Project_multiqc_data/
- Directory containing parsed statistics from the different tools used in the pipeline
For more information about how to use MultiQC reports, see http://multiqc.info
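The parsed statistics in Project_multiqc_data/ can also be loaded programmatically. This sketch assumes the directory contains a tab-separated general statistics table (named multiqc_general_stats.txt in standard MultiQC output) whose first column is headed `Sample`; column names vary between MultiQC versions, so the `percent_gc` column in the example is hypothetical, as is the function name.

```python
import csv
import io


def read_general_stats(text):
    """Index a MultiQC general-stats table by sample name.

    Assumes tab-separated text whose first column is headed "Sample".
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Sample"]: row for row in reader}


# Illustrative content; the column name and values are hypothetical.
example = "Sample\tpercent_gc\nsample1\t48.0\nsample2\t51.0\n"
stats = read_general_stats(example)
print(stats["sample2"]["percent_gc"])  # 51.0
```

Keying rows by sample name makes it easy to join these statistics with other per-sample tables, such as the CellRanger metrics.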