A quick start guide to doing RNA-sequencing analysis in Galaxy. Covers importing data through gene expression analysis.
- Walkthrough tutorial style
- Brief: <3 hrs
- FH Galaxy server login information
- Importing data to Galaxy
- Combining datasets in Galaxy
- Using UCSC to get a gene annotation
- Read mapping with TopHat
- Counting reads with htseq-count
- Differential gene expression analysis with DESeq2
- Galaxy
- TopHat
- htseq-count
- DESeq2
Lecture materials from the UW Tools For Computational Biology course. Covers Bioconductor packages for working with genomic data, inspecting and querying genomic data, and identifying and annotating genomic variants.
- R Markdown course materials
- Bioconductor tools to extract meaning from previously mapped files
- Genomic data analysis
- Using GenomicRanges to store and query genomic data
- Finding the overlap between two genomic sequences
- Sequence data analysis
- Loading and querying BAM files using Rsamtools
- Computing pile up statistics
- Read Variant Call Format (VCF) Files
- Read and extract contents of VCF
- Reading variants from VCF
- R (Bioconductor)
- Rsamtools
- VariantAnnotation
- GenomicRanges
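The overlap query listed among the topics above can be illustrated without Bioconductor. The following is a minimal Python sketch of the idea, not the GenomicRanges implementation: the function name `find_overlaps`, the tuple layout, and the example coordinates are all assumptions made for illustration, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import List, Tuple

# Hypothetical interval layout: (chromosome, start, end), 1-based inclusive.
Interval = Tuple[str, int, int]


def find_overlaps(query: List[Interval], subject: List[Interval]) -> List[Tuple[int, int]]:
    """Return (query_index, subject_index) pairs for every overlapping pair.

    Two intervals overlap when they are on the same chromosome and each one
    starts no later than the other one ends.
    """
    hits = []
    for qi, (q_chrom, q_start, q_end) in enumerate(query):
        for si, (s_chrom, s_start, s_end) in enumerate(subject):
            if q_chrom == s_chrom and q_start <= s_end and s_start <= q_end:
                hits.append((qi, si))
    return hits


# Made-up example data: three reads and two gene models.
reads = [("chr1", 100, 150), ("chr1", 300, 350), ("chr2", 100, 150)]
genes = [("chr1", 120, 320), ("chr2", 500, 900)]

overlap_hits = find_overlaps(reads, genes)
print(overlap_hits)  # prints [(0, 0), (1, 0)]
```

This brute-force comparison is quadratic in the number of intervals; it is only meant to make the overlap condition concrete, since real tools rely on indexed data structures for genome-scale data.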
A series of shell and R scripts used to process RNA sequencing data
- GitHub repo with scripts and README
- Minimal guidance
- Downloading raw fastq files from the NCBI sequence read archive (http://www.ncbi.nlm.nih.gov/sra) or generating your own sequencing files.
- Alignment to a reference genome. Unaligned reads may then be aligned to alternative genomes, such as a pathogen genome.
- Merging (for multilane samples) and processing
- The resulting bam files can be run through a series of additional analyses, such as GATK variant detection and STAR fusion gene detection.
- Quality control analyses may also be performed on fastq files using FastQC and bam files using RNAseQC.
Amy P’s repository with code and documentation for Pathways/SHIP, for materials translatable to high school students
- workflow used with undergrad interns to analyze RNAseq data for variety of labs
- STAR two pass alignment
- RNASeQC
- post-processing in R for DGE
- import metadata and data
- assess gene data
- annotate gene names
- create counts matrix, phenotype matrix, SummarizedExperiment object
- DGE
- STAR, RNASeQC
- Tidyverse and DESeq2
A single module in a series from The Alex's Lemonade Stand Foundation Childhood Cancer Data Lab
- According to the schedule modules take two days
- Lots of good information in this organization's repos related to R and RNA seq, but where things live is not well documented, and broken links make the repos difficult to navigate.
- Installing and setting up a Docker container
- Accessing data on flash drives
- Intro to R and intermediate R (Tidyverse)
- QC, trim, and quantification using Salmon
- Gene level summary using tximport
- RNA-seq EDA
- Differential gene expression analysis
- Normalizing count matrix
- Single cell - processing 10x raw data
- Single cell - dimensionality reduction
- Machine learning - data prep, clustering, PLIER
- Bulk RNA Seq
- FastQC
- fastp
- Salmon
- tximport
- DESeq2
RNA Seq analysis workshop course materials.
- According to website the workshop took 4 days
- Includes slides, sample alignment files/read counts/outputs, extensive course notes
- Set up on the command line - create directory structure, download fastq
- QC raw reads w FastQC
- Alignment with STAR
- Interacting with BAM/SAM files using samtools
- Visual inspection with IGV
- Read in feature counts to R
- Use DESeq2 to normalize read counts for differences in seq depth and transform reads to the log2 scale.
- Differential gene analysis with DESeq2
- GO term enrichment
- DESeq2
- ClusterProfiler
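The sequencing-depth normalization and log2 transformation mentioned in the topics above can be sketched in plain Python. This is a simplified illustration of the median-of-ratios idea that DESeq2 is known for, not DESeq2's own code: the function names, the pseudocount of 1, and the toy count matrix are assumptions, and the plain log2(x + 1) shown here is not the same as DESeq2's `rlog` or `vst` transformations.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero. Assumes at least one gene has no zeros.
    """
    n_samples = len(counts[0])
    # Log of the geometric mean across samples, per usable gene.
    usable = []
    for row in counts:
        if all(c > 0 for c in row):
            usable.append((row, sum(math.log(c) for c in row) / len(row)))
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - log_gm for row, log_gm in usable]
        factors.append(math.exp(median(log_ratios)))
    return factors


def log2_normalized(counts, pseudocount=1.0):
    """Divide each sample by its size factor, then take log2(x + pseudocount)."""
    factors = size_factors(counts)
    return [[math.log2(c / f + pseudocount) for c, f in zip(row, factors)]
            for row in counts]


# Toy matrix (hypothetical): sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
factors = size_factors(counts)
normalized = log2_normalized(counts)
print(factors)
```

In this toy example the second size factor is twice the first, so the first three genes end up with identical normalized values in both samples, which is the intended effect of depth normalization.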
An in depth course covering all aspects of RNA-seq analysis.
- A course made up of multiple modules
- According to the schedule takes 5 days
- Course set up (aws, unix, tool installation)
- Intro to RNA seq theory
- General goals/themes in RNA seq analysis workflow
- Intro to BAM/SAM formats
- Visualization of alignment in IGV
- BAM read counting
- Expression estimation for known genes and transcripts
- Differential Expression analysis
- Downstream interpretation of expression
- Alignment free estimation of expression with Kallisto/Sleuth
- Isoform discovery w StringTie
- Differential splicing analysis with Ballgown
- Examine and visualize junction counts
- DeNovo assembly with Trinity
- Transcript annotation with Trinotate
- ScRNAseq applications/advantages/challenges
- 10x/CellRanger overview
- Custom scRNAseq analysis in R
- EC2, unix commands for cloud computing (more info here)
- SAMtools, bam-readcount, HISAT2, StringTie, gffcompare, htseq-count, TopHat, kallisto, FastQC, MultiQC, Picard, Flexbar, Regtools, RSeQC, bedops, gtfToGenePred, genePredToBed, how_are_we_stranded_here, R, tidyverse, Bioconductor, Sleuth (more info here)
They have a series of RNAseq classes offered, using various approaches and infrastructure. The synopsis here includes:
- https://github.com/hbctraining/rnaseq_overview
- https://hbctraining.github.io/Intro-to-rnaseq-hpc-salmon-flipped/ (most recent)
- overview is conceptual, ~5 hours
- HPC is skills-based, 7.5 hours of instructor-led with substantial prep for participants, including homework to submit
Overview:
- library prep
- sequencing steps and sequences
- experimental planning considerations
- strategies for bulk-RNAseq analysis
- data management
- raw data QC
- mapping/quantification
- sample-level assessment
- count modeling and hypothesis testing
- visualization of results
- functional analysis
HPC:
- working in HPC
- Project organization and data management
- quality control of data
- sequence alignment
- alignment-free methods
- troubleshooting RNAseq data analysis
- automating the RNAseq workflow
Other materials:
- Intro to R: https://hbctraining.github.io/Intro-to-R-flipped/#lessons
- Intro to DGE: https://hbctraining.github.io/DGE_workshop_salmon_online/#lessons
Overview: none
HPC:
- FileZilla, text editor, gitbash
- uses on-premise compute
- fastqc, slurm, salmon, MultiQC, bash, R, DGE
Nextflow pipeline
This was copied and pasted from the outline:
- Download FastQ files via SRA, ENA or GEO ids and auto-create input samplesheet (ENA FTP; if required)
- Merge re-sequenced FastQ files (cat)
- Read QC (FastQC)
- UMI extraction (UMI-tools)
- Adapter and quality trimming (Trim Galore!)
- Removal of ribosomal RNA (SortMeRNA)
- Choice of multiple alignment and quantification routes:
  - STAR -> Salmon
  - STAR -> RSEM
  - HiSAT2 -> NO QUANTIFICATION
- Sort and index alignments (SAMtools)
- UMI-based deduplication (UMI-tools)
- Duplicate read marking (picard MarkDuplicates)
- Transcript assembly and quantification (StringTie)
- Create bigWig coverage files (BEDTools, bedGraphToBigWig)
- Extensive quality control:
  - RSeQC
  - Qualimap
  - dupRadar
  - Preseq
  - DESeq2
- Pseudo-alignment and quantification (Salmon; optional)
- Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)
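The merge step in the outline above is a plain concatenation, which works for gzipped FastQ files because concatenated gzip members still form a valid gzip stream. A minimal Python sketch of that idea follows; it is not taken from the pipeline itself, and the function name `merge_fastq`, the file names, and the two tiny reads are made up for the demonstration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_fastq(parts, merged):
    """Concatenate FastQ files byte-for-byte, in the order given.

    Equivalent in spirit to `cat run1.fastq.gz run2.fastq.gz > merged.fastq.gz`.
    """
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(merged)


# Demo with two tiny, made-up gzipped FastQ files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    run1 = Path(tmp) / "sample_run1.fastq.gz"
    run2 = Path(tmp) / "sample_run2.fastq.gz"
    with gzip.open(run1, "wt") as fh:
        fh.write("@read1\nACGT\n+\nIIII\n")
    with gzip.open(run2, "wt") as fh:
        fh.write("@read2\nTTGA\n+\nIIII\n")

    merged = merge_fastq([run1, run2], Path(tmp) / "sample_merged.fastq.gz")
    # Reading the merged file back decompresses both gzip members in sequence.
    with gzip.open(merged, "rt") as fh:
        merged_text = fh.read()

print(merged_text.count("@read"))  # prints 2
```

For paired-end data, the same file order would need to be used for the R1 and R2 files so that read pairs stay in sync.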
DGE with Bioconductor
- 4 Data packaging
  - 4.1 Reading in count-data
  - 4.2 Organising sample information
  - 4.3 Organising gene annotations
- 5 Data pre-processing
  - 5.1 Transformations from the raw-scale
  - 5.2 Removing genes that are lowly expressed
  - 5.3 Normalising gene expression distributions
  - 5.4 Unsupervised clustering of samples
- 6 Differential expression analysis
  - 6.1 Creating a design matrix and contrasts
  - 6.2 Removing heteroscedasticity from count data
  - 6.3 Fitting linear models for comparisons of interest
  - 6.4 Examining the number of DE genes
  - 6.5 Examining individual DE genes from top to bottom
  - 6.6 Useful graphical representations of differential expression results
- 7 Gene set testing with camera
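The pre-processing steps in the outline above (transforming raw counts and removing lowly expressed genes) can be made concrete with a small Python sketch. It uses a simplified threshold rule, counts per million (CPM) above a cutoff in a minimum number of samples, and is not a reimplementation of any Bioconductor function; the function names, the cutoffs, and the toy counts are assumptions chosen for illustration.

```python
def cpm(counts):
    """Convert raw counts to counts per million, per sample.

    counts: list of genes, each a list of raw counts (one per sample).
    Assumes every sample has a library size greater than zero.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[c * 1e6 / lib for c, lib in zip(row, lib_sizes)] for row in counts]


def keep_expressed(counts, min_cpm=1.0, min_samples=2):
    """Flag genes whose CPM exceeds min_cpm in at least min_samples samples."""
    return [sum(v > min_cpm for v in row) >= min_samples for row in cpm(counts)]


# Toy matrix (hypothetical): three genes, three samples of 2,000,000 reads each.
counts = [
    [1200000, 900000, 1000000],
    [800000, 1099999, 1000000],
    [0, 1, 0],
]
keep = keep_expressed(counts)
filtered = [row for row, flag in zip(counts, keep) if flag]
print(keep)  # prints [True, True, False]
```

The third gene reaches only 0.5 CPM in one sample, so it is dropped. The cutoffs are a judgment call that depends on library size and group sizes in the experiment.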