snakefiles

This repository has Snakefiles for common RNA-seq data analysis workflows. Please feel free to copy them and modify them to suit your needs.

Getting started

If you are new to Snakemake, you might like to start by walking through my tutorial for beginners. Next, have a look at Johannes Köster's introductory slides, tutorial, documentation, and FAQ.

Data

This repository includes 6 FASTQ files in data/fastq/ to illustrate the usage of each of the RNA-seq workflows.

Sample1
- Sample1.R1.fastq.gz has the first mates of sequenced fragments.
- Sample1.R2.fastq.gz has the second mates of sequenced fragments.

Sample2
- Sample2.L1.R1.fastq.gz
- Sample2.L2.R1.fastq.gz
  - The first-mate reads are split across two files. Some software, such as STAR, requires these reads to be merged into one file.
- Sample2.L1.R2.fastq.gz
- Sample2.L2.R2.fastq.gz
  - Likewise, the second-mate reads are split across two files. To complicate matters, Sample2.L2.R2.fastq.gz has only 2,000 reads, whereas Sample2.L2.R1.fastq.gz has 2,500 reads. The Snakefiles in this repository handle this mismatch without any problems.
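Merging lane files like the Sample2 ones can be sketched in a few lines of Python. This is only an illustration, not necessarily how the Snakefiles in this repository do it; the function names `merge_fastq_gz` and `count_reads` are made up for the example. It relies on the fact that a gzip file may hold several compressed members, so the raw bytes can be appended without recompressing.

```python
import gzip
import shutil


def merge_fastq_gz(parts, merged):
    """Concatenate gzip-compressed FASTQ files into one .gz file.

    Appending the raw bytes of each input produces a valid multi-member
    gzip file, so nothing has to be decompressed or recompressed.
    """
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(fastq_gz):
    """Count FASTQ records, assuming exactly four lines per record."""
    with gzip.open(fastq_gz, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

For example, `merge_fastq_gz(["data/fastq/Sample2.L1.R1.fastq.gz", "data/fastq/Sample2.L2.R1.fastq.gz"], "Sample2.R1.fastq.gz")` would write one first-mate file (the output name is hypothetical). The sketch does nothing about the differing read counts between the L2 files; `count_reads` is only there to check what went in and what came out.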

Scripts

- make_samples.py creates the samples.json file.
- bsub.py receives job scripts from Snakemake and automatically submits them to an appropriate LSF queue based on job requirements.
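To give a rough idea of what building a samples file involves, here is a minimal sketch that groups FASTQ paths by sample and mate and serializes the result as JSON. The file-name pattern, the `group_fastq` function, and the JSON layout are all assumptions made for illustration; the actual make_samples.py and samples.json in this repository may look different.

```python
import json
import re
from collections import defaultdict

# Assumed naming scheme: <sample>[.<lane>].<R1|R2>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>[^.]+)(?:\.(?P<lane>L\d+))?\.(?P<mate>R[12])\.fastq\.gz$"
)


def group_fastq(paths):
    """Group FASTQ paths as {sample: {"R1": [...], "R2": [...]}}."""
    samples = defaultdict(lambda: {"R1": [], "R2": []})
    for path in sorted(paths):
        name = path.rsplit("/", 1)[-1]
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        samples[match["sample"]][match["mate"]].append(path)
    return dict(samples)


def to_json(paths):
    """Return the grouped samples as an indented JSON string."""
    return json.dumps(group_fastq(paths), indent=2, sort_keys=True)
```

Sorting the paths keeps lane files in a stable order (L1 before L2), which matters if the lanes are later concatenated mate by mate.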

RNA-seq workflows

kallisto/

Quantify gene isoform expression in transcripts per million (TPM) with kallisto and collate outputs from multiple samples into one file.
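The collation step can be sketched as follows. kallisto writes one abundance.tsv per sample with the columns target_id, length, eff_length, est_counts, and tpm; the sketch pulls the tpm column from each file and writes one wide table with a column per sample. The function names, the wide layout, and filling absent transcripts with 0.0 are assumptions for illustration, not necessarily what the Snakefile in kallisto/ produces.

```python
import csv


def read_tpm(abundance_tsv):
    """Read {target_id: tpm} from one kallisto abundance.tsv file."""
    with open(abundance_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["target_id"]: float(row["tpm"]) for row in reader}


def collate_tpm(abundance_by_sample, out_tsv):
    """Write one table: a row per transcript, a TPM column per sample.

    abundance_by_sample maps a sample name to its abundance.tsv path.
    """
    tables = {s: read_tpm(p) for s, p in abundance_by_sample.items()}
    samples = sorted(tables)
    targets = sorted(set().union(*tables.values()))
    with open(out_tsv, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["target_id"] + samples)
        for target in targets:
            # Assumption: a transcript absent from a sample is reported as 0.0.
            writer.writerow([target] + [tables[s].get(target, 0.0) for s in samples])
```

When every sample is quantified against the same index, each abundance.tsv lists the same transcripts, so the 0.0 fallback should rarely come into play.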

star_express/

Execute a multi-sample 2-pass STAR alignment, sharing the splice junctions across samples. Count fragments per gene and fragments per splice site. Also produce a BAM file with coordinates relative to transcripts. Quantify transcripts in TPM with eXpress. Collate outputs from multiple samples.
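Sharing splice junctions across samples amounts to pooling the junctions found in each sample's first pass before running the second pass. The sketch below reads several STAR SJ.out.tab files (column 1 chromosome, 2 intron start, 3 intron end, 4 strand coded 0/1/2, 7 uniquely mapped read count) and returns their union as chromosome, start, end, strand records, the four-column form that STAR's --sjdbFileChrStartEnd option reads. The function name `merge_junctions` and the `min_unique_reads` filter are hypothetical, and the Snakefile in star_express/ may pool junctions differently.

```python
# STAR encodes strand in SJ.out.tab as 0 (undefined), 1 (+), 2 (-).
STRAND = {"0": ".", "1": "+", "2": "-"}


def merge_junctions(sj_files, min_unique_reads=1):
    """Return the sorted union of junctions from several SJ.out.tab files.

    min_unique_reads is an illustrative filter on column 7 (uniquely
    mapped reads crossing the junction); it is not a STAR parameter.
    """
    junctions = set()
    for path in sj_files:
        with open(path) as handle:
            for line in handle:
                fields = line.rstrip("\n").split("\t")
                if len(fields) < 7:
                    continue  # skip blank or malformed lines
                chrom, start, end = fields[0], int(fields[1]), int(fields[2])
                if int(fields[6]) >= min_unique_reads:
                    junctions.add((chrom, start, end, STRAND[fields[3]]))
    return sorted(junctions)


def write_junctions(junctions, out_path):
    """Write junctions as tab-separated chrom, start, end, strand."""
    with open(out_path, "w") as handle:
        for chrom, start, end, strand in junctions:
            handle.write(f"{chrom}\t{start}\t{end}\t{strand}\n")
```

Using a set means a junction seen in several samples appears once in the pooled list, regardless of how many reads supported it in each sample.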

Contributing

Please submit an issue to report bugs or ask questions.

Please contribute bug fixes or new features with a pull request to this repository. Note that your contribution to this repository will be dedicated to the public domain.
