Merge remote-tracking branch 'origin/main' into marco/documentation
# Conflicts:
#	README.md
stkmrc committed Sep 8, 2024
2 parents 5e0428f + be3713b commit bdeb7cb
Showing 67 changed files with 2,000 additions and 2,066 deletions.
368 changes: 368 additions & 0 deletions README.md
@@ -0,0 +1,368 @@
# A dynamic benchmark for gene regulatory network (GRN) inference


<!--
This file is automatically generated from the task's api/*.yaml files.
Do not edit this file directly.
-->

Benchmarking GRN inference methods. The full documentation is hosted on
[ReadTheDocs](https://openproblems-grn-task.readthedocs.io/en/latest/index.html).
[![Documentation
Status](https://readthedocs.org/projects/grn-inference-benchmarking/badge/?version=latest.png)](https://grn-inference-benchmarking.readthedocs.io/en/latest/?badge=latest)

Path to source:
[`src`](https://github.com/openproblems-bio/task_grn_inference/tree/main/src)

## README

## Installation

You need to have Docker, Java, and Viash installed. Follow [these
instructions](https://openproblems.bio/documentation/fundamentals/requirements)
to install the required dependencies.

## Download resources

``` bash
git clone git@github.com:openproblems-bio/task_grn_inference.git

cd task_grn_inference

# download resources
scripts/download_resources.sh
```

## Infer a GRN

``` bash
viash run src/methods/dummy/config.vsh.yaml -- --multiomics_rna resources/grn-benchmark/multiomics_rna.h5ad --multiomics_atac resources/grn-benchmark/multiomics_atac.h5ad --prediction output/dummy.csv
```

Similarly, run the command for other methods.

## Evaluate a GRN

``` bash
scripts/benchmark_grn.sh --grn resources/grn-benchmark/models/collectri.csv
```

Similarly, run the command for other GRN models.

## Add a method

To add a method to the repository, follow the instructions in the
`scripts/add_a_method.sh` script.

## Motivation

GRNs are essential for understanding cellular identity and behavior.
They are simplified models of gene expression regulated by complex
processes involving multiple layers of control, from transcription to
post-transcriptional modifications, incorporating various regulatory
elements and non-coding RNAs. Gene transcription is controlled by a
regulatory complex that includes transcription factors (TFs),
cis-regulatory elements (CREs) like promoters and enhancers, and
essential co-factors. High-throughput datasets, covering thousands of
genes, facilitate the use of machine learning approaches to decipher
GRNs. The advent of single-cell sequencing technologies, such as
scRNA-seq, has made it possible to infer GRNs from a single experiment
due to the abundance of samples. This allows researchers to infer
condition-specific GRNs, such as for different cell types or diseases,
and study potential regulatory factors associated with these conditions.
Combining chromatin accessibility data with gene expression measurements
has led to the development of enhancer-driven GRN (eGRN) inference
pipelines, which offer significantly improved accuracy over
single-modality methods.

## Description

Here, we present a dynamic benchmark platform for GRN inference. This
platform provides curated datasets for GRN inference and evaluation,
standardized evaluation protocols and metrics, computational
infrastructure, and a dynamically updated leaderboard to track
state-of-the-art methods. It runs novel GRNs in the cloud, offers
competition scores, and stores them for future comparisons, reflecting
new developments over time.

The platform supports the integration of new datasets and protocols.
When a new feature is added, previously evaluated GRNs are re-assessed,
and the leaderboard is updated accordingly. The aim is to evaluate both
the accuracy and completeness of inferred GRNs. It is designed for both
single-modality and multi-omics GRN inference. Ultimately, it is a
community-driven platform. So far, six eGRN inference methods have been
integrated: Scenic+, CellOracle, FigR, scGLUE, GRaNIE, and ANANSE.

Due to its flexible nature, the platform can incorporate various
benchmark datasets and evaluation methods, using either prior knowledge
or feature-based approaches. In the current version, due to the absence
of standardized prior knowledge, we use a feature-based approach to
benchmark GRNs. Our evaluation utilizes standardized datasets for GRN
inference and evaluation, employing multiple regression analysis
approaches to assess both accuracy and comprehensiveness.

## Authors & contributors

| name | roles |
|:------------------|:------------|
| Jalil Nourisa | author |
| Robrecht Cannoodt | author |
| Antoine Passimier | contributor |
| Christian Arnold | contributor |
| Marco Stock | contributor |

## API

``` mermaid
flowchart LR
file_perturbation_h5ad("perturbation")
comp_control_method[/"Control Method"/]
comp_metric[/"Label"/]
file_prediction("GRN")
file_score("Score")
file_multiomics_rna_h5ad("multiomics rna")
comp_method[/"Method"/]
file_multiomics_atac_h5ad("multiomics atac")
comp_method_r[/"Method r"/]
file_perturbation_h5ad---comp_control_method
file_perturbation_h5ad---comp_metric
comp_control_method-->file_prediction
comp_metric-->file_score
file_prediction---comp_metric
file_multiomics_rna_h5ad---comp_method
comp_method-->file_prediction
file_multiomics_atac_h5ad---comp_method
comp_method_r-->file_prediction
```

## File format: perturbation

Perturbation dataset for benchmarking.

Example file: `resources_test/grn-benchmark/perturbation_data.h5ad`

Format:

<div class="small">

AnnData object
obs: 'cell_type', 'sm_name', 'donor_id', 'plate_name', 'row', 'well', 'cell_count'
layers: 'n_counts', 'pearson', 'lognorm'

</div>

Slot description:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
| `obs["sm_name"]` | `string` | The primary name for the (parent) compound (in a standardized representation) as chosen by LINCS. This is provided to map the data in this experiment to the LINCS Connectivity Map data. |
| `obs["donor_id"]` | `string` | Donor id. |
| `obs["plate_name"]` | `string` | Plate name (6 levels). |
| `obs["row"]` | `string` | Row name on the plate. |
| `obs["well"]` | `string` | Well name on the plate. |
| `obs["cell_count"]` | `string` | Number of single cells pseudobulked. |
| `layers["n_counts"]` | `double` | Pseudobulked values using mean approach. |
| `layers["pearson"]` | `double` | (*Optional*) Normalized values using Pearson residuals. |
| `layers["lognorm"]` | `double` | (*Optional*) Normalized values using shifted logarithm. |

</div>

## Component type: Control Method

Path:
[`src/control_methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/control_methods)

A control method.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--perturbation_data` | `file` | Perturbation dataset for benchmarking. |
| `--layer` | `string` | (*Optional*) Which layer of the perturbation data to use to find TF-gene relationships. Default: `scgen_pearson`. |
| `--prediction` | `file` | (*Output*) GRN prediction. |
| `--tf_all` | `file` | (*Optional*) NA. |

</div>

## Component type: Label

Path:
[`src/metrics`](https://github.com/openproblems-bio/openproblems/tree/main/src/metrics)

A metric to evaluate the performance of the inferred GRN.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--perturbation_data` | `file` | (*Optional*) Perturbation dataset for benchmarking. |
| `--prediction` | `file` | GRN prediction. |
| `--score` | `file` | (*Optional, Output*) File indicating the score of a metric. |
| `--reg_type` | `string` | (*Optional*) Name of the regression type to use. Default: `ridge`. |
| `--subsample` | `integer` | (*Optional*) Number of samples randomly drawn from the perturbation data. Default: `-2`. |
| `--max_workers` | `integer` | (*Optional*) NA. Default: `4`. |
| `--method_id` | `string` | (*Optional*) NA. |
| `--tf_all` | `file` | (*Optional*) NA. |
| `--apply_tf` | `boolean` | (*Optional*) NA. Default: `TRUE`. |

</div>

## File format: GRN

GRN prediction

Example file: `resources_test/grn_models/collectri.csv`

Format:

<div class="small">

Tabular data
'source', 'target', 'weight'

</div>

Slot description:

<div class="small">

| Column | Type | Description |
|:---------|:---------|:----------------------|
| `source` | `string` | Source of regulation. |
| `target` | `string` | Target of regulation. |
| `weight` | `float` | Weight of regulation. |

</div>
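
As a minimal sketch of the format above, the following Python snippet writes and re-reads a GRN table with the three documented columns, using only the standard library. The gene names, the weights, and the helper names `write_grn` and `read_grn` are illustrative assumptions and are not part of the benchmark code.

```python
import csv
import io

GRN_COLUMNS = ["source", "target", "weight"]  # columns documented above


def write_grn(edges, handle):
    """Write (source, target, weight) tuples as a GRN CSV table."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(GRN_COLUMNS)
    for source, target, weight in edges:
        writer.writerow([source, target, float(weight)])


def read_grn(handle):
    """Read a GRN CSV table, checking the header and parsing weights."""
    reader = csv.DictReader(handle)
    missing = set(GRN_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing GRN columns: {sorted(missing)}")
    return [(row["source"], row["target"], float(row["weight"])) for row in reader]


# Hypothetical edges, for illustration only.
example_edges = [("TF_A", "GENE_1", 0.8), ("TF_A", "GENE_2", -0.3)]
buffer = io.StringIO()
write_grn(example_edges, buffer)
buffer.seek(0)
round_trip = read_grn(buffer)
```

To work with a file on disk, pass an opened file handle (for example `open("output/dummy.csv", newline="")`) instead of the in-memory buffer.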

## File format: Score

File indicating the score of a metric.

Example file: `resources_test/scores/score.h5ad`

Format:

<div class="small">

AnnData object
uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'

</div>

Slot description:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["method_id"]` | `string` | A unique identifier for the method. |
| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be the same length as `metric_ids`. |

</div>
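
The length constraint between `metric_ids` and `metric_values` can be checked before a score is stored. The sketch below assumes the `uns` slot has been loaded into a plain Python dictionary; the function name `check_score` and the example identifiers are hypothetical and do not come from the benchmark code.

```python
REQUIRED_UNS_FIELDS = ["dataset_id", "method_id", "metric_ids", "metric_values"]


def check_score(uns):
    """Validate the documented `uns` fields and pair each metric id with its value."""
    missing = [key for key in REQUIRED_UNS_FIELDS if key not in uns]
    if missing:
        raise ValueError(f"missing uns fields: {missing}")
    ids, values = uns["metric_ids"], uns["metric_values"]
    # A single metric may be given as a scalar; normalize both fields to lists.
    ids = [ids] if isinstance(ids, str) else list(ids)
    values = [values] if isinstance(values, (int, float)) else list(values)
    if len(ids) != len(values):
        raise ValueError("metric_values must have the same length as metric_ids")
    return {metric_id: float(value) for metric_id, value in zip(ids, values)}


# Hypothetical score content, for illustration only.
example_uns = {
    "dataset_id": "example_dataset",
    "method_id": "example_method",
    "metric_ids": ["metric_a", "metric_b"],
    "metric_values": [0.5, 0.25],
}
scores = check_score(example_uns)
```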

## File format: multiomics rna

RNA expression for multiomics data.

Example file: `resources_test/grn-benchmark/multiomics_rna.h5ad`

Format:

<div class="small">

AnnData object
obs: 'cell_type', 'donor_id'

</div>

Slot description:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
| `obs["donor_id"]` | `string` | Donor id. |

</div>

## Component type: Method

Path:
[`src/methods`](https://github.com/openproblems-bio/openproblems/tree/main/src/methods)

A GRN inference method.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--multiomics_rna` | `file` | (*Optional*) RNA expression for multiomics data. |
| `--multiomics_atac` | `file` | (*Optional*) Peak data for multiomics data. |
| `--prediction` | `file` | (*Optional, Output*) GRN prediction. |
| `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. |
| `--num_workers` | `integer` | (*Optional*) NA. Default: `4`. |
| `--tf_all` | `file` | (*Optional*) NA. |
| `--max_n_links` | `integer` | (*Optional*) NA. Default: `50000`. |

</div>
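
The table above does not describe `--max_n_links`. Assuming it caps the number of edges kept in the predicted GRN, one plausible reading is to rank edges by absolute weight and keep the strongest ones. The sketch below illustrates that assumption only; the function name `keep_top_links` and the example edges are hypothetical, and the actual components may apply the limit differently.

```python
def keep_top_links(edges, max_n_links=50000):
    """Keep at most `max_n_links` edges with the largest absolute weight.

    `edges` is a list of (source, target, weight) tuples. Ranking by absolute
    weight is an assumption, not behavior confirmed by the README.
    """
    if max_n_links < 0:
        raise ValueError("max_n_links must be non-negative")
    # sorted() is stable, so edges with equal |weight| keep their input order.
    ranked = sorted(edges, key=lambda edge: abs(edge[2]), reverse=True)
    return ranked[:max_n_links]


# Hypothetical edges, for illustration only.
example_edges = [("TF_A", "GENE_1", 0.1), ("TF_A", "GENE_2", -0.9), ("TF_B", "GENE_3", 0.5)]
top_two = keep_top_links(example_edges, max_n_links=2)
```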

## File format: multiomics atac

Peak data for multiomics data.

Example file: `resources_test/grn-benchmark/multiomics_atac.h5ad`

Format:

<div class="small">

AnnData object
obs: 'cell_type', 'donor_id'

</div>

Slot description:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `obs["cell_type"]` | `string` | The annotated cell type of each cell based on RNA expression. |
| `obs["donor_id"]` | `string` | Donor id. |

</div>

## Component type: Method r

Path:
[`src/methods_r`](https://github.com/openproblems-bio/openproblems/tree/main/src/methods_r)

A GRN inference method.

Arguments:

<div class="small">

| Name | Type | Description |
|:----------------------|:----------|:-------------------------------------------|
| `--multiomics_rna_r` | `file` | (*Optional*) NA. |
| `--multiomics_atac_r` | `file` | (*Optional*) NA. |
| `--prediction` | `file` | (*Optional, Output*) GRN prediction. |
| `--temp_dir` | `string` | (*Optional*) NA. Default: `output/temdir`. |
| `--num_workers` | `integer` | (*Optional*) NA. Default: `4`. |

</div>

3 changes: 3 additions & 0 deletions nextflow.config
@@ -0,0 +1,3 @@
process.container = 'nextflow/bash:latest'

process.errorStrategy = "ignore"
2 changes: 1 addition & 1 deletion notebooks/process_results.ipynb
@@ -51,7 +51,7 @@
"metadata": {},
"outputs": [],
"source": [
- "base_folder = '../../task_grn_benchmark/resources/results/subsample_200_gb/'"
+ "base_folder = '../../task_grn_inference/resources/results/subsample_200_gb/'"
]
},
{