This repository makes available the source code of our SURF paper(s).

keleslab/surf-paper

 
 


Code for SURF Analysis of ENCODE Data


This repository makes available the source code for our SURF paper(s). Last updated in April 2020.

The paper presents the Statistical Utility for RBP Functions (SURF), a tool for integrative analysis of RNA-seq and CLIP-seq data. SURF aims to identify alternative splicing (AS), alternative transcription initiation (ATI), and alternative polyadenylation (APA) events regulated by individual RNA-binding proteins (RBPs), and to elucidate the protein-RNA interactions governing these events. We apply the SURF pipeline to 104 RBP data sets from ENCODE. The results can be browsed in this Shiny app.

The current repository includes:

  • application/
    • xena.R: process TCGA and GTEx transcriptome data.
    • encode_surf_one.R: perform SURF analysis for one RBP. This is used for all 104 RBPs.
    • encode_surf_summary.R: summarize the SURF results, including all the statistics and plots reported in the paper.
  • simulation/
    • other_simulation.sh: prepare DEXSeq and run rMATS and MAJIQ.
    • drseq_simulation.R: run DrSeq and DEXSeq, then analyze the simulation results, including all the statistics and plots reported in the paper.
    • majiq/: contains two files needed for running MAJIQ.
    • dexseq/: contains two files needed for DEXSeq preparation.

To reproduce the ENCODE data analysis (results are available at DOI 10.5281/zenodo.3779037):

  1. Download the processed BAM files (shRNA-seq and eCLIP-seq) from the ENCODE portal.
  2. Download the transcriptome quantifications for the TCGA and GTEx projects from Xena.
  3. Run xena.R, encode_surf_one.R (for each RBP), and encode_surf_summary.R in order.
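
The run order in step 3 can be sketched as a small shell wrapper. This is only an illustration under stated assumptions, not the authors' invocation: it assumes the scripts are launched with Rscript from the repository root, and that encode_surf_one.R accepts the RBP name as its first argument (hypothetical; check the script for how the RBP is actually selected). The names `run` and `encode_pipeline` are made up for this sketch. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the three-stage ENCODE run order.
# Assumptions (not confirmed by the repository):
#   - scripts are launched with Rscript from the repository root;
#   - encode_surf_one.R takes the RBP name as its first argument (hypothetical).
set -eu

# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run xena.R once, encode_surf_one.R once per RBP, then the summary script.
encode_pipeline() {
  run Rscript application/xena.R
  for rbp in "$@"; do
    run Rscript application/encode_surf_one.R "$rbp"
  done
  run Rscript application/encode_surf_summary.R
}
```

Calling `encode_pipeline RBP1 RBP2` (placeholder names) lists four commands in order; setting `DRY_RUN=0` would execute them instead, stopping at the first failure because of `set -e`.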

To reproduce the simulation results:

  1. Download the processed BAM files (Homo sapiens) from ArrayExpress dataset E-MTAB-3766.
  2. Run other_simulation.sh and drseq_simulation.R in order.
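
Before starting the simulation scripts, it may help to confirm that the BAM files from step 1 are in place. The sketch below is a hedged illustration: the function name `simulation_steps`, the idea of passing the download directory as an argument, and the assumption that both scripts run without extra arguments from the repository root are not taken from the repository. It prints the two commands in the documented order rather than running them.

```shell
#!/bin/sh
# Sketch: check for downloaded BAM files, then list the simulation steps in order.
# Assumptions (not confirmed by the repository):
#   - the BAM files sit directly inside the directory passed as $1;
#   - both scripts can be started without extra arguments from the repository root.
set -eu

simulation_steps() {
  bam_dir=$1
  n_bam=0
  # Count *.bam files directly inside bam_dir; an unmatched glob is skipped.
  for f in "$bam_dir"/*.bam; do
    if [ -e "$f" ]; then
      n_bam=$((n_bam + 1))
    fi
  done
  if [ "$n_bam" -eq 0 ]; then
    echo "no BAM files found in $bam_dir; download E-MTAB-3766 first" >&2
    return 1
  fi
  # Print the commands in the documented order.
  echo "sh simulation/other_simulation.sh"
  echo "Rscript simulation/drseq_simulation.R"
}
```

The printed lines can be reviewed and then run by hand. Since other_simulation.sh drives several external tools (DEXSeq preparation, rMATS, MAJIQ), running it step by step may be preferable to running it in one go.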

Contact

Fan Chen ([email protected]) or Sunduz Keles ([email protected])

Reference

Chen F and Keles S. “SURF: integrative analysis of a compendium of RNA-seq and CLIP-seq datasets highlights complex governing of alternative transcriptional regulation by RNA-binding proteins.”
