Comprehensive comparison of large-scale tissue expression datasets

Code to reproduce the fold enrichment analyses and figures from the article

Project structure

README.md --> This Markdown file
makefile --> Main script (entry point for the whole analysis)
generate_files.pl --> Script that generates all the necessary files. It uses the original dataset files (without filtering out low-confidence gene-tissue associations)
analyses.R --> Script that orchestrates the generation of the figures
R/ --> All the scripts needed to reproduce the figures
- summary_figure_drawing.R --> Initial figure showing the tissues and the number of genes/proteins provided by each dataset
- datasets_expression_breadth_analyses.R --> Expression breadth distribution analysis
- fold_enrichment_analysis_per_dataset.R --> Generates the fold enrichment plots for each dataset
- fold_enrichment_score_calibration_analysis.R --> Score calibration figure
- venn_diagram_analyses.R --> Creates all the Venn diagrams and calculates the p-values of the overlaps when comparing common proteins and tissues
- external_function.R --> Helper function to switch coordinates in a facet grid (taken from Stack Overflow)
data/
- datasests/ --> The dataset files need to be stored here
- dictionary/ --> Files needed to perform the tissue backtracking based on the BRENDA Tissue Ontology (BTO)
  - labels.tsv --> BTO terms corresponding to the 21 tissues of interest (columns: tissues_code, tissue_name, BTO)
  - bto_entities.tsv --> Mapping of BTO terms to internal identifiers (columns: internal_code, tissues_code, BTO)
  - bto_groups.tsv --> Parent-child relationships used for the backtracking (columns: internal_code, parent_internal_code)
figures/ --> Folder where all the generated figures are stored
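The tissue backtracking that the dictionary files support can be pictured as a walk up the parent-child edges in bto_groups.tsv: an association to a specific BTO term is also counted for each of its ancestor terms, and only ancestors that are among the tissues of interest are kept. Below is a minimal R sketch under stated assumptions: it uses a hypothetical toy edge table with numeric internal codes, skips the labels.tsv and bto_entities.tsv lookups, and the helper names ancestors() and backtrack() are hypothetical, not taken from generate_files.pl (which is written in Perl).

```r
# Hypothetical toy version of bto_groups.tsv: each row is a child -> parent edge
# (internal_code, parent_internal_code). Real codes would come from bto_entities.tsv.
bto_groups <- data.frame(
  internal_code        = c(3, 2, 4),
  parent_internal_code = c(2, 1, 1)
)

# Return a term together with all of its ancestors by repeatedly following
# child -> parent edges. Terms already found are skipped, so the walk terminates.
ancestors <- function(term, groups) {
  found    <- term
  frontier <- term
  while (length(frontier) > 0) {
    parents  <- groups$parent_internal_code[groups$internal_code %in% frontier]
    frontier <- setdiff(parents, found)
    found    <- c(found, frontier)
  }
  found
}

# Hypothetical internal codes of the tissues of interest (stand-in for labels.tsv).
tissues_of_interest <- c(1, 2)

# Map one gene-term association onto the tissues of interest it rolls up to.
backtrack <- function(term, groups, targets) {
  intersect(ancestors(term, groups), targets)
}

backtrack(3, bto_groups, tissues_of_interest)  # term 3 rolls up to 2 and 1
backtrack(4, bto_groups, tissues_of_interest)  # term 4 rolls up to 1 only
```

Keeping the walk iterative over a frontier of terms, rather than recursive, means a term reachable through several parents is only expanded once.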

Run the analyses

Download the project

Make sure you have a default CRAN repository set, for example by putting the following into your ~/.Rprofile:

local({r <- getOption("repos")
       r["CRAN"] <- "http://cran.us.r-project.org"
       options(repos=r)})

Run the makefile from the command line, in the project folder:

> make

All the result files are generated in the data/ folder, and all the figures from the analyses are created in the figures/ folder.

Generated files

./data/

- Fold enrichment analysis results: DATASET_GOLDSTANDARD_fold_enrichment_analysis.tsv (gold standards: UniProtKB and the mRNA reference set)
- Expression breadth analysis results: CUTOFF_cutoff_expression_breadth.tsv (cutoffs: low, medium, high)
- Consistency analysis results: CUTOFF_consistency_analysis.tsv (cutoffs: low, medium, high)
- P-values for the Venn diagram overlaps computed on common proteins and tissues: pairwise_pvalue_results.txt
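Fold enrichment compares how often a dataset's gene-tissue associations agree with a gold standard against how often they would agree by chance. The R sketch below shows one simple formulation and is an assumption, not the method in fold_enrichment_analysis_per_dataset.R: both inputs are restricted to shared genes and tissues, and the expected overlap assumes the dataset's pairs are spread uniformly over all gene x tissue combinations. It ignores confidence scores, and the column names and helpers pair_keys() and fold_enrichment() are hypothetical.

```r
# Hypothetical toy inputs: gene-tissue pairs reported by one dataset and by a
# gold standard (column names are assumptions for this sketch).
dataset_pairs <- data.frame(
  gene   = c("A", "A", "B", "C"),
  tissue = c("liver", "heart", "liver", "brain"),
  stringsAsFactors = FALSE
)
gold_pairs <- data.frame(
  gene   = c("A", "B", "C"),
  tissue = c("liver", "liver", "heart"),
  stringsAsFactors = FALSE
)

# Unique "gene|tissue" keys, restricted to the shared genes and tissues.
pair_keys <- function(df, genes, tissues) {
  df <- df[df$gene %in% genes & df$tissue %in% tissues, ]
  unique(paste(df$gene, df$tissue, sep = "|"))
}

# Fold enrichment = observed overlap / expected overlap, where the expectation
# assumes uniform spread over the shared gene x tissue universe (simplification).
fold_enrichment <- function(dataset, gold) {
  genes    <- intersect(dataset$gene, gold$gene)
  tissues  <- intersect(dataset$tissue, gold$tissue)
  d        <- pair_keys(dataset, genes, tissues)
  g        <- pair_keys(gold, genes, tissues)
  observed <- sum(d %in% g)
  expected <- length(d) * length(g) / (length(genes) * length(tissues))
  observed / expected
}

fold_enrichment(dataset_pairs, gold_pairs)  # 2 observed vs 1.5 expected, about 1.33
```

A value above 1 means the dataset agrees with the gold standard more often than the uniform null would predict; a null that preserves each gene's and tissue's frequency would be a stricter alternative.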

./figures/

- Fold enrichment plots: DATASET_fold_enrichment.png
- Venn diagrams:
  - COMPARISON_venn_diagram.png
  - all_datasets_comparison_venn_diagram_COMMON.png and all_datasets_comparison_venn_diagram_nonCOMMON.png
- Expression breadth plots: DATASET_datasets_prots_num_tissues.png
- Score calibration plot: datasets_score_calibration.png
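The Venn diagram overlaps are accompanied by p-values in pairwise_pvalue_results.txt. This README does not say which test venn_diagram_analyses.R applies; a hypergeometric test is one standard choice for the significance of an overlap between two sets drawn from a shared universe, and the R sketch below shows it with base R's phyper(). The set contents, the universe size and the function name overlap_pvalue() are hypothetical.

```r
# Upper-tail hypergeometric p-value: probability that two sets of sizes
# size_a and size_b, drawn at random from a universe of `universe` items,
# share at least `overlap` items.
overlap_pvalue <- function(overlap, size_a, size_b, universe) {
  # P(X >= overlap) is P(X > overlap - 1), hence the shift by one.
  phyper(overlap - 1, size_a, universe - size_a, size_b, lower.tail = FALSE)
}

# Hypothetical example: protein sets from two datasets within a shared universe.
set_a <- c("P1", "P2", "P3", "P4")
set_b <- c("P3", "P4", "P5")
universe_size <- 20  # assumed size of the shared universe (e.g. common proteins)

overlap_pvalue(length(intersect(set_a, set_b)), length(set_a), length(set_b), universe_size)
```

The choice of universe matters: restricting it to proteins and tissues common to the compared datasets, as the Venn analysis on common proteins and tissues does, avoids inflating significance with items that one dataset could never report.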

Requirements

Perl
R
curl

Repository: albsantosdel/TISSUES-database_analyses
