Project structure
- README.md --> This file (Markdown)
- makefile --> Main script; runs the whole pipeline
- generate_files.pl --> Script that generates all the necessary files. It uses the original dataset files, without filtering out low-confidence gene-tissue associations
- analyses.R --> Script orchestrating the generation of the figures
- R/ --> All the scripts necessary to reproduce the figures
  - summary_figure_drawing.R --> Initial figure with the tissues and the number of genes/proteins provided by each dataset
  - datasets_expression_breadth_analyses.R --> Expression breadth distribution analysis
  - fold_enrichment_analysis_per_dataset.R --> Generates the fold-enrichment plots for each dataset
  - fold_enrichment_score_calibration_analysis.R --> Score calibration figure
  - venn_diagram_analyses.R --> Creates all the Venn diagrams and calculates the p-values of the overlaps when comparing common proteins and tissues
  - external_function.R --> External function to switch coordinates in a facet grid (function taken from Stack Overflow)
- data/
  - datasets/ --> The dataset files need to be stored here
  - dictionary/ --> Contains the files necessary to perform the tissue backtracking based on the BRENDA Tissue Ontology (BTO)
    - labels.tsv --> The BTO terms corresponding to the 21 tissues of interest. Columns: tissues_code tissue_name BTO
    - bto_entities.tsv --> Mapping of BTO terms to internal identifiers. Columns: internal_code tissues_code BTO
    - bto_groups.tsv --> The parent-children relationships used to do the backtracking. Columns: internal_code parent_internal_code
- figures/ --> Folder where all the generated figures are stored
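The backtracking described for the dictionary files amounts to walking from a term up through its parents. The sketch below illustrates that idea only; it is not taken from generate_files.pl, which may work differently. It assumes the parent-children file is tab-separated, has two columns (internal_code, parent_internal_code) and no header row. The function name `bto_ancestors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every ancestor of an internal code by following
# child -> parent rows. Assumes a tab-separated file with two columns
# (internal_code, parent_internal_code) and no header row.
bto_ancestors() {
    groups_file=$1
    start_code=$2
    awk -F '\t' -v start="$start_code" '
        # Collect the parents of each child; a term may have several parents.
        { parents[$1] = parents[$1] " " $2 }
        END {
            # Breadth-first walk upwards, visiting each code only once.
            queue[1] = start; head = 1; tail = 1
            seen[start] = 1
            while (head <= tail) {
                code = queue[head++]
                n = split(parents[code], p, " ")
                for (i = 1; i <= n; i++) {
                    if (!(p[i] in seen)) {
                        seen[p[i]] = 1
                        print p[i]
                        queue[++tail] = p[i]
                    }
                }
            }
        }
    ' "$groups_file"
}
```

For example, `bto_ancestors data/dictionary/bto_groups.tsv 42` would list the ancestors of internal code 42, where 42 is a made-up code. Keeping a `seen` set means a term reachable through several parents is reported only once.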
Run the analyses
- Download the project
- Make sure you have a default CRAN repository set, for example by putting the following into your ~/.Rprofile:
  local({r <- getOption("repos"); r["CRAN"] <- "http://cran.us.r-project.org"; options(repos=r)})
- Execute the makefile script from the command line:
  > make
- All the generated files will be written to the data/ folder
- All the figures from the analyses will be created in the figures/ folder
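The CRAN step above can be scripted so that it is safe to repeat. The following is a minimal sketch, not part of the project: the function name `ensure_cran_repo` is hypothetical, and it assumes that any existing mention of "CRAN" in the profile means a repository is already configured.

```shell
#!/bin/sh
# Hypothetical helper: add a default CRAN repository to an R profile file
# unless the file already mentions CRAN. The path defaults to ~/.Rprofile.
ensure_cran_repo() {
    profile=${1:-"$HOME/.Rprofile"}
    if [ -f "$profile" ] && grep -q 'CRAN' "$profile"; then
        return 0   # already configured, leave the file untouched
    fi
    # Append the same setting as the README snippet, one statement per line.
    cat >> "$profile" <<'EOF'
local({
  r <- getOption("repos")
  r["CRAN"] <- "http://cran.us.r-project.org"
  options(repos = r)
})
EOF
}
```

From the project root, `ensure_cran_repo && make` would then configure the repository if needed and start the analyses.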
Generated files
./data/
- Fold enrichment analysis result files: DATASET_GOLDSTANDARD_fold_enrichment_analysis.tsv (gold standards: UniProt-KB and mRNA reference set)
- Expression breadth analysis result files: CUTOFF_cutoff_expression_breadth.tsv (cutoffs: low, medium, high)
- Consistency analysis result files: CUTOFF_consistency_analysis.tsv (cutoffs: low, medium, high)
- P-values for the Venn diagrams that take common proteins and tissues into account: pairwise_pvalue_results.txt
./figures/
- Fold enrichment plots: DATASET_fold_enrichment.png
- Venn diagrams:
  - COMPARISON_venn_diagram.png
  - all_datasets_comparison_venn_diagram_COMMON/nonCOMMON.png
- Expression breadth plots: DATASET_datasets_prots_num_tissues.png
- Score calibration plot: datasets_score_calibration.png
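After a run, the outputs with fixed names can be checked quickly. The sketch below is an illustration under assumptions: it takes CUTOFF to expand to the literal words low, medium and high, and it assumes the files sit directly under data/ and figures/. Outputs whose names depend on DATASET or COMPARISON are skipped because those values are not listed here. The function name `missing_outputs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the fixed-name outputs that are missing under a
# project root. Assumes CUTOFF expands to the literal words low/medium/high.
missing_outputs() {
    root=${1:-.}
    for cutoff in low medium high; do
        for f in "data/${cutoff}_cutoff_expression_breadth.tsv" \
                 "data/${cutoff}_consistency_analysis.tsv"; do
            [ -e "$root/$f" ] || echo "$f"
        done
    done
    for f in data/pairwise_pvalue_results.txt \
             figures/datasets_score_calibration.png; do
        [ -e "$root/$f" ] || echo "$f"
    done
    return 0
}
```

Running `missing_outputs` from the project root prints one line per missing file and prints nothing when all eight are present.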
Requirements
- Perl
- R
- curl
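A quick way to confirm the requirements before running anything is to check that each tool is on the PATH. This is a minimal sketch; the function name `missing_tools` is hypothetical, and the executable names passed to it (for instance `Rscript` for R, or `make` for the makefile) are assumptions about how the tools are installed.

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
    return 0
}
```

For example, `missing_tools perl Rscript curl make` prints the names of any tools that still need to be installed and prints nothing if all are available.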