
loipf/RNAseq-pipeline


RNAseq pipeline

An RNAseq quantification pipeline that goes from .fastq files to a gene counts matrix, using kallisto for coding and non-coding RNA.


set up pipeline

Before running, you have to build the provided Docker image (this takes about 30 min):

docker build -t rnaseq-pipeline https://raw.githubusercontent.com/loipf/RNAseq-pipeline/master/docker/Dockerfile

Then either replace the Docker container hash in nextflow.config with the one printed in the last output line of the build command, or run nextflow with the -with-docker rnaseq-pipeline argument.


run quantification pipeline

The pipeline performs no pre-processing or quality improvement; this must be done by the user! To assess quality, check the file kallisto_aligned_reads_qc.csv: p_pseudoaligned should be above 70% and DESeq2_size_factor should lie roughly between 0.6 and 1.4. Transcript counts are summed up to gene symbols, excluding haplotype and scaffold genes.
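The transcript-to-gene summation can be illustrated with a minimal awk sketch. This is not the pipeline's own code: it assumes a hypothetical tab-separated mapping file with two columns (transcript ID, gene symbol) from which haplotype and scaffold genes have already been removed, and it reads a kallisto abundance.tsv, whose columns are target_id, length, eff_length, est_counts and tpm. Transcripts missing from the mapping are simply dropped. The function name sum_counts_per_gene is also hypothetical.

```shell
# Hypothetical helper, not part of the pipeline: sum kallisto est_counts per gene symbol.
# $1: tx2gene TSV (transcript_id<TAB>gene_symbol), assumed to exclude haplotype/scaffold genes
# $2: kallisto abundance.tsv (target_id, length, eff_length, est_counts, tpm)
sum_counts_per_gene() {
  awk -F'\t' '
    NR == FNR { gene[$1] = $2; next }        # first file: load the transcript -> gene mapping
    FNR == 1 { next }                        # second file: skip the header line
    ($1 in gene) { counts[gene[$1]] += $4 }  # add est_counts (column 4) to the mapped gene
    END { for (g in counts) printf "%s\t%.10g\n", g, counts[g] }
  ' "$1" "$2" | sort
}
```

Called as `sum_counts_per_gene tx2gene.tsv abundance.tsv`, it prints one gene symbol and its summed count per line, sorted by gene symbol. The `%.10g` format keeps large fractional counts from being printed in exponent notation.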

It can be run locally from a downloaded copy of the GitHub repository, after editing the nextflow.config file, with:

nextflow run main.nf

or

nextflow run loipf/RNAseq-pipeline -r main --project_dir /path/to/folder --reads_dir /path/to/samples --ensembl_release 101 --num_threads 10 -with-docker rnaseq-pipeline

For this execution to work properly, it must be started from within the project directory.

The nextflow command can optionally be extended with:

-resume
-with-report report_RNAseq-pipeline
-with-timeline timeline_RNAseq-pipeline
-w work_dir

The pipeline can optionally be configured with:

--num_threads 5
--ensembl_release 101
--include_ncrna true   # or false
--nextflow_stageInMode symlink  # or copy

By default, all output is saved into the data folder of the current directory. It is best to run in a new, empty folder structure, since not all new results overwrite old ones.

Check the quality reports in data/quality_reports to identify and exclude problematic samples.
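The thresholds mentioned for kallisto_aligned_reads_qc.csv (p_pseudoaligned above 70%, DESeq2_size_factor roughly 0.6 to 1.4) can be checked with a small awk sketch. It assumes a comma-separated file with a header row containing the columns p_pseudoaligned (as a percentage, the way kallisto reports it) and DESeq2_size_factor, and with the sample name in the first column; the actual layout of the file should be confirmed first. The function name flag_low_quality_samples is hypothetical.

```shell
# Hypothetical helper: print samples outside the suggested QC ranges
# (p_pseudoaligned <= 70, or DESeq2_size_factor outside 0.6-1.4).
# Assumes a CSV with a header row and the sample name in the first column.
flag_low_quality_samples() {
  awk -F',' '
    { gsub(/["\r]/, "") }   # drop quotes (e.g. from R write.csv) and Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i   # map column names to positions
      if (!("p_pseudoaligned" in col) || !("DESeq2_size_factor" in col)) {
        print "missing p_pseudoaligned or DESeq2_size_factor column" > "/dev/stderr"
        exit 1
      }
      next
    }
    {
      p = $(col["p_pseudoaligned"]) + 0
      s = $(col["DESeq2_size_factor"]) + 0
      if (p <= 70 || s < 0.6 || s > 1.4) print $1
    }
  ' "$1"
}
```

Called with the path to kallisto_aligned_reads_qc.csv, it prints one flagged sample name per line and exits with status 1 if the expected columns are not found. Looking columns up by name rather than position keeps it working if the column order differs.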

Additionally, a 3' and a 5' adapter sequence (or file) must be specified, either with the nextflow arguments --adapter_3_seq_file [sequence|file.fasta] and --adapter_5_seq_file [sequence|file.fasta] or in the main.nf file. Otherwise, two empty files named NO_FILE and NO_FILE2 must be created to make this work (this needs to be fixed someday). If a file is provided, it must be structured like the following example:

> adapter_3_batch_01
AANTGG
> adapter_3_batch_02
GATCGG
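Both variants can be prepared from the shell. This is a minimal sketch that assumes the placeholder files are expected in the directory the pipeline is launched from (check main.nf to confirm) and uses adapters_3.fasta as a hypothetical file name.

```shell
# Variant 1: write a 3' adapter file in the format shown above
# (adapters_3.fasta is a hypothetical name; pass it via --adapter_3_seq_file)
printf '> adapter_3_batch_01\nAANTGG\n> adapter_3_batch_02\nGATCGG\n' > adapters_3.fasta

# Variant 2: no adapter files; create the two empty placeholder files instead
touch NO_FILE NO_FILE2
```

The written file would then be passed as, for example, `--adapter_3_seq_file adapters_3.fasta`, and a 5' file can be created the same way for --adapter_5_seq_file.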