Author: Anna Saukkonen
See our paper *Highly accurate quantification of allelic gene expression for population and disease genetics* for additional information.
Allele-specific expression (ASE) is the imbalanced expression of the two alleles of a gene. While many genes are expressed equally from both alleles, gene regulatory differences driven by genetic changes (i.e. regulatory variants) frequently cause the two alleles to be expressed at different levels, resulting in allele-specific expression patterns. Detecting ASE events relies on accurate alignment of RNA-sequencing reads, which remains challenging. This pipeline was created to adjust for the computational biases associated with allelic counts. It comprises the following steps:
- Local phasing of genetic data using PHASER
- Creation of parental genomes to align sequencing data to
- Re-allocation of multimapping reads using RSEM
- Selection of the best mapping for each read across the two parental genomes
- Output of haplotype-level and site-level allelic counts
1. Install Nextflow:
curl -fsSL get.nextflow.io | bash
Make sure you have Java v8+:
java -version
2. Install either Docker or Singularity if your cluster does not already have one of them
- Either run the pipeline directly from GitHub:
path_to/nextflow run https://github.com/anna-saukkonen/PAC -r main --genome_version GRCh37/38 --reads "path_to_reads_{1,2}.fq.gz" --variants "path_to_variants" --id ID -profile docker/singularity
The -r option specifies the branch.
- Or download the repository and run:
path_to/nextflow run PAC/main.nf --genome_version GRCh37/38 --reads "path_to_reads_{1,2}.fq.gz" --variants "path_to_variants" --id ID -profile docker/singularity
--reads: both read files must be saved in the same directory and named path_to_reads_1.fq.gz and path_to_reads_2.fq.gz
--variants: the VCF file must be phased
--id: the sample ID must be the same as in the VCF file
-N: name@email_address.com (To receive email when the pipeline is finished)
(default: "/pac_results")
(default: 10; we recommend at least 10 for speed)
Depending on the size of the input files, you might need up to 128000 MB of memory; the minimum is 64000 MB.
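The two input requirements above (a phased VCF, and a sample ID that appears in it) can be checked before launching a long run. The following is a minimal sketch, not part of PAC: the function name `check_vcf` and the demo records are hypothetical, and it assumes the standard VCF layout in which sample names start at the 10th column of the `#CHROM` line and `GT` is the first FORMAT field.

```python
def check_vcf(lines, sample_id, max_records=1000):
    """Return (sample_found, n_unphased_het) for an iterable of VCF lines.

    Hypothetical helper for illustration only; it inspects at most
    `max_records` variant records.
    """
    sample_index = None
    n_unphased_het = 0
    n_seen = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            if sample_id not in samples:
                return False, 0
            sample_index = 9 + samples.index(sample_id)
            continue
        if sample_index is None:
            continue  # no header line seen yet
        genotype = fields[sample_index].split(":")[0]
        alleles = genotype.replace("|", "/").split("/")
        # Phased genotypes use '|' (e.g. 0|1); unphased ones use '/'.
        # Only heterozygous calls matter for phasing.
        if "/" in genotype and len(set(alleles)) > 1:
            n_unphased_het += 1
        n_seen += 1
        if n_seen >= max_records:
            break
    return sample_index is not None, n_unphased_het


# Made-up records, for demonstration only.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12877",
    "1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",
    "1\t2000\t.\tC\tT\t.\tPASS\t.\tGT\t0/1",
]
found, unphased = check_vcf(demo, "NA12877")
print(found, unphased)
```

For a real file, the same function can be given an open handle, for example one returned by `gzip.open(path, "rt")` for a `.vcf.gz` file. A non-zero count of unphased heterozygous genotypes suggests the VCF should be phased before it is passed to the pipeline.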
PAC generates the following output files:
- ID_gene_level_ae.txt
Haplotype level ASE results columns | Description |
---|---|
contig | chromosome |
start | gene start position |
stop | gene end position |
name | gene name |
aCount | haplotype a coverage |
bCount | haplotype b coverage |
totalCount | total coverage |
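As an illustration of how these columns can be used downstream, here is a minimal sketch that computes the fraction of reads assigned to haplotype a for each gene. It is not part of PAC. It assumes the file is tab-separated with a header row matching the column names above and that counts are integers; the function name `haplotype_ratios`, the `min_total` threshold, and the demo rows are hypothetical.

```python
import csv
import io


def haplotype_ratios(handle, min_total=10):
    """Map gene name -> aCount / totalCount, skipping low-coverage genes."""
    reader = csv.DictReader(handle, delimiter="\t")
    ratios = {}
    for row in reader:
        a_count = int(row["aCount"])
        total = int(row["totalCount"])
        if total < min_total:
            continue  # too few reads for a meaningful ratio
        ratios[row["name"]] = a_count / total
    return ratios


# Made-up rows in the assumed layout, for demonstration only.
demo = io.StringIO(
    "contig\tstart\tstop\tname\taCount\tbCount\ttotalCount\n"
    "1\t100\t900\tGENE_A\t30\t10\t40\n"
    "1\t2000\t2900\tGENE_B\t2\t3\t5\n"
)
ratios = haplotype_ratios(demo)
print(ratios)  # {'GENE_A': 0.75}
```

A ratio near 0.5 indicates balanced expression of the two haplotypes; values far from 0.5 point to possible allele-specific expression, subject to a proper statistical test.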
- results_2genomes_ID.RSEM.STAR.SOFT.NOTRIM_baq.txt
- results_2genomes_ID.RSEM.STAR.SOFT.NOTRIM.txt
- results_1genome_ID.SOFT.NOTRIM_baq.txt
- results_1genome_ID.SOFT.NOTRIM.txt
Single nucleotide level ASE results columns | Description |
---|---|
Chr | chromosome |
Pos | position along chromosome |
RefAl | reference allele |
AltAl | alternative allele |
MapRef | reference allele coverage |
MapAlt | alternative allele coverage |
MapRatio | reference allele ratio |
Mapcov | total coverage at the site |
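One common downstream use of site-level counts is to test whether the reference and alternative alleles deviate from a 1:1 ratio. The sketch below is an illustration only and is not a step that PAC performs: it assumes the reference allele ratio is MapRef / (MapRef + MapAlt), uses an exact two-sided binomial test against an expected ratio of 0.5, and the function names and the example site are hypothetical.

```python
from math import comb


def reference_ratio(map_ref, map_alt):
    """Fraction of allelic reads carrying the reference allele."""
    total = map_ref + map_alt
    return map_ref / total if total else float("nan")


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


# Made-up site, for demonstration only.
site = {"Chr": "1", "Pos": 1000, "RefAl": "A", "AltAl": "G",
        "MapRef": 8, "MapAlt": 2}
ratio = reference_ratio(site["MapRef"], site["MapAlt"])
p_value = binomial_two_sided_p(site["MapRef"], site["MapRef"] + site["MapAlt"])
print(ratio, p_value)  # 0.8 0.109375
```

With real data, many sites are tested at once, so p-values would normally need multiple-testing correction, and overdispersion in read counts may call for a more conservative model than the plain binomial.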
To test PAC on a smaller dataset:
load java
load singularity
git clone https://github.com/anna-saukkonen/PAC.git
path_to_nextflow/nextflow run PAC/main.nf --genome_version GRCh37 --reads "PAC/test/NA12890_merged_sample_0.005_{1,2}.fq.gz" --variants "PAC/test/NA12877_output.phased.downsampled.vcf.gz" --id NA12877 -profile singularity
See this folder for the output files you should get.
Just use
__ ___ __
||__) /___\\ / `
|| / \\ \\__, ,
man ;)