
albsantosdel/TISSUES-database_analyses


Comprehensive comparison of large-scale tissue expression datasets

Code to reproduce the fold enrichment analyses and figures from the article.

Project structure

  • README.md --> this file
  • makefile --> main script; running make executes the whole pipeline
  • generate_files.pl --> generates all the necessary files from the original dataset files (without filtering out low-confidence gene-tissue associations)
  • analyses.R --> orchestrates the generation of the figures
  • R/ --> all the scripts needed to reproduce the figures
    • summary_figure_drawing.R --> draws the initial figure showing the tissues and the number of genes/proteins provided by each dataset
    • datasets_expression_breadth_analyses.R --> expression breadth distribution analysis
    • fold_enrichment_analysis_per_dataset.R --> generates the fold enrichment plots for each dataset
    • fold_enrichment_score_calibration_analysis.R --> score calibration figure
    • venn_diagram_analyses.R --> creates all the Venn diagrams and calculates the p-values of the overlaps when comparing common proteins and tissues
    • external_function.R --> helper function to switch coordinates in a facet grid (taken from Stack Overflow)
  • data/
    • datasets/ --> the dataset files need to be stored here
    • dictionary/ --> files needed to perform the tissue backtracking based on the BRENDA Tissue Ontology (BTO)
      • labels.tsv --> BTO terms corresponding to the 21 tissues of interest (columns: tissues_code, tissue_name, BTO)
      • bto_entities.tsv --> mapping of BTO terms to internal identifiers (columns: internal_code, tissues_code, BTO)
      • bto_groups.tsv --> parent-child relationships used for the backtracking (columns: internal_code, parent_internal_code)
  • figures/ --> folder where all the generated figures are stored
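The tissue backtracking step can be pictured as follows. This is a minimal sketch under stated assumptions, not the logic of generate_files.pl. It assumes that each gene-tissue association is propagated from its BTO term to every ancestor term through the parent-child pairs in bto_groups.tsv, and that only associations landing on one of the 21 tissues of interest are kept. The helper names ancestors and backtrack, and the gene column, are hypothetical. The internal_code and parent_internal_code columns follow the layout listed above.

```r
# Hypothetical sketch of BTO backtracking; not taken from generate_files.pl.
# groups: data.frame with columns internal_code (child) and parent_internal_code.
ancestors <- function(code, groups) {
  child  <- as.character(groups$internal_code)
  parent <- as.character(groups$parent_internal_code)
  seen <- character(0)
  frontier <- code
  # Walk up the ontology one level at a time until no new parents appear.
  while (length(frontier) > 0) {
    parents <- unique(parent[child %in% frontier])
    parents <- setdiff(parents, c(seen, code))  # guards against cycles
    seen <- c(seen, parents)
    frontier <- parents
  }
  seen
}

# assoc: data.frame with columns gene and internal_code (assumed layout).
# targets: internal codes of the tissues of interest.
backtrack <- function(assoc, groups, targets) {
  rows <- lapply(seq_len(nrow(assoc)), function(i) {
    code <- as.character(assoc$internal_code[i])
    # Keep the original term and its ancestors that are tissues of interest.
    hits <- intersect(c(code, ancestors(code, groups)), targets)
    if (length(hits) == 0) return(NULL)
    data.frame(gene = as.character(assoc$gene[i]), internal_code = hits,
               stringsAsFactors = FALSE)
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) {
    return(data.frame(gene = character(0), internal_code = character(0)))
  }
  unique(do.call(rbind, rows))
}
```

For example, suppose term c has parent b and b has parent a. An association of a gene with c is then also counted for a, provided a is one of the target tissues.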

Run the analyses

  1. Download the project

  2. Make sure you have a default CRAN repository set, for example by putting the following into your ~/.Rprofile:

    local({r <- getOption("repos")
           r["CRAN"] <- "http://cran.us.r-project.org"
           options(repos=r)})
  3. From the project's root directory, run make from the command line:

    make

  4. All the generated data files will be written to the data/ folder.

  5. All the figures from the analyses will be created in the figures/ folder.
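Before running make, you can confirm from an R session that step 2 took effect. This is a minimal sketch, and has_cran_mirror is a hypothetical helper that is not part of the repository. It treats a missing CRAN entry as unset. It also treats the placeholder value @CRAN@ as unset, because R uses that value when no mirror has been chosen.

```r
# Hypothetical helper (not part of this repository): TRUE if a usable CRAN
# mirror is configured in the "repos" option.
has_cran_mirror <- function(repos = getOption("repos")) {
  !is.null(repos) &&
    "CRAN" %in% names(repos) &&
    !is.na(repos[["CRAN"]]) &&
    repos[["CRAN"]] != "@CRAN@"  # placeholder meaning "no mirror chosen"
}

if (!has_cran_mirror()) {
  message("No default CRAN repository is set; see step 2.")
}
```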

Generated files

./data/

  • Fold enrichment analysis results: DATASET_GOLDSTANDARD_fold_enrichment_analysis.tsv (gold standards: UniProtKB and the mRNA reference set)
  • Expression breadth analysis results: CUTOFF_cutoff_expression_breadth.tsv (cutoffs: low, medium, high)
  • Consistency analysis results: CUTOFF_consistency_analysis.tsv (cutoffs: low, medium, high)
  • P-values for the Venn diagram overlaps computed on common proteins and tissues: pairwise_pvalue_results.txt
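The fold enrichment files compare each dataset against a gold standard. The README does not define the score, so the sketch below assumes a common definition. The numerator is the fraction of a dataset's gene-tissue pairs that the gold standard confirms. The denominator is the fraction expected by chance, taken over the genes and tissues both sources share. The function name and the gene/tissue column layout are hypothetical. The repository's own scripts may bin by score or define the universe differently.

```r
# Hypothetical sketch: fold enrichment of a dataset against a gold standard.
# pairs, gold: data.frames with character columns gene and tissue (assumed layout).
fold_enrichment <- function(pairs, gold) {
  # Restrict both sets to the genes and tissues they have in common.
  genes   <- intersect(unique(pairs$gene), unique(gold$gene))
  tissues <- intersect(unique(pairs$tissue), unique(gold$tissue))
  restrict <- function(df) {
    unique(df[df$gene %in% genes & df$tissue %in% tissues, c("gene", "tissue")])
  }
  p <- restrict(pairs)
  g <- restrict(gold)
  if (nrow(p) == 0 || nrow(g) == 0) return(NA_real_)
  key <- function(df) paste(df$gene, df$tissue, sep = "\t")
  # Observed: fraction of the dataset's pairs confirmed by the gold standard.
  observed <- mean(key(p) %in% key(g))
  # Expected: fraction of all possible gene-tissue pairs in the gold standard.
  expected <- nrow(g) / (length(genes) * length(tissues))
  observed / expected
}
```

A value above 1 means the dataset agrees with the gold standard more often than random gene-tissue assignment would.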

./figures/

  • Fold enrichment plots: DATASET_fold_enrichment.png
  • Venn diagrams:
    • COMPARISON_venn_diagram.png
    • all_datasets_comparison_venn_diagram_COMMON.png and all_datasets_comparison_venn_diagram_nonCOMMON.png
  • Expression breadth plots: DATASET_datasets_prots_num_tissues.png
  • Score calibration plot: datasets_score_calibration.png
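The p-values behind the Venn diagrams measure whether two datasets share more items than expected by chance. One standard choice is a one-sided hypergeometric test, sketched below. The README does not say which test venn_diagram_analyses.R uses, so treat this as an assumption. overlap_pvalue is a hypothetical helper.

```r
# One-sided hypergeometric test for the overlap of two sets drawn from a
# common universe (an assumed choice of test, not taken from venn_diagram_analyses.R).
overlap_pvalue <- function(set_a, set_b, universe) {
  universe <- unique(universe)
  a <- intersect(set_a, universe)
  b <- intersect(set_b, universe)
  k <- length(intersect(a, b))
  # P(X >= k) where X ~ Hypergeometric(m = |a|, n = |universe| - |a|, draws = |b|).
  phyper(k - 1, length(a), length(universe) - length(a), length(b),
         lower.tail = FALSE)
}
```

For instance, set_a and set_b could be the proteins two datasets assign to a given tissue, and universe the proteins covered by both datasets.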

Requirements

  • Perl
  • R
  • curl
