update viash version #13

Merged
merged 29 commits into from
Sep 19, 2024
Changes from 19 commits
e0bfa5b
update submodule
KaiWaldrant Aug 16, 2024
9b40161
update viash version
KaiWaldrant Aug 16, 2024
d6a4655
relocate thumbnail
KaiWaldrant Aug 16, 2024
edb908d
update methods metadata
KaiWaldrant Aug 16, 2024
dc1aaf8
update control_methods
KaiWaldrant Aug 16, 2024
3c23b8d
add file_type
KaiWaldrant Aug 16, 2024
d5e398f
remove obs layer
KaiWaldrant Aug 16, 2024
1f40856
Update common resources
KaiWaldrant Sep 2, 2024
628f153
Update file* API
KaiWaldrant Sep 2, 2024
6a71bdc
update README
KaiWaldrant Sep 2, 2024
81ade3d
fix component test fp
KaiWaldrant Sep 2, 2024
fd3d218
update metrics references api
KaiWaldrant Sep 2, 2024
1be13e3
add openproblems package to DCA
KaiWaldrant Sep 2, 2024
ac5f8f4
Merge remote-tracking branch 'origin/main' into feature/no-ref/update…
KaiWaldrant Sep 18, 2024
ec6c32a
update DCA method
KaiWaldrant Sep 18, 2024
693f4bf
Merge remote-tracking branch 'origin/main' into feature/no-ref/update…
KaiWaldrant Sep 18, 2024
72dd68e
update submodule
KaiWaldrant Sep 18, 2024
cf43017
update dca
KaiWaldrant Sep 18, 2024
781295f
update links
KaiWaldrant Sep 18, 2024
c0cbc94
set numpy<2
KaiWaldrant Sep 19, 2024
0a01503
update fapi file name
KaiWaldrant Sep 19, 2024
ba3c7eb
update create_readme script
KaiWaldrant Sep 19, 2024
888d5b5
update readme
KaiWaldrant Sep 19, 2024
4ab3149
relocate process datasets
KaiWaldrant Sep 19, 2024
dbec1ef
Update scripts dir
KaiWaldrant Sep 19, 2024
eaff41e
update changelog
KaiWaldrant Sep 19, 2024
2713a3a
update readme
KaiWaldrant Sep 19, 2024
e06814e
update process_datasets merge path
KaiWaldrant Sep 19, 2024
cc9c324
fix processor config
KaiWaldrant Sep 19, 2024
169 changes: 45 additions & 124 deletions README.md
@@ -8,75 +8,8 @@ Do not edit this file directly.

Removing noise in sparse single-cell RNA-sequencing count data

Path to source:
[`src`](https://github.com/openproblems-bio/task_denoising/src)

## README

## Installation

You need to have Docker, Java, and Viash installed. Follow [these
instructions](https://openproblems.bio/documentation/fundamentals/requirements)
to install the required dependencies.

## Add a method

To add a method to the repository, follow the instructions in the
`scripts/add_a_method.sh` script.

## Frequently used commands

To get started, you can run the following commands:

``` bash
git clone git@github.com:openproblems-bio/task_denoising.git

cd task_denoising

# initialise submodule
scripts/init_submodule.sh

# download resources
scripts/download_resources.sh
```

To run the benchmark, you first need to build the components.
Afterwards, you can run the benchmark:

``` bash
viash ns build --parallel --setup cachedbuild

scripts/run_benchmark.sh
```

After adding a component, it is recommended to run the tests to ensure
that the component is working correctly:

``` bash
viash ns test --parallel
```

Optionally, you can provide the `--query` argument to test only a subset
of components:

``` bash
viash ns test --parallel --query 'component_name'
```

## Motivation

Single-cell RNA-Seq protocols only detect a fraction of the mRNA
molecules present in each cell. As a result, the measurements (UMI
counts) observed for each gene and each cell are associated with
generally high levels of technical noise ([Grün et al.,
2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes
the task of estimating the true expression level of each gene in each
cell. In the single-cell literature, this task is also referred to as
*imputation*, a term which is typically used for missing data problems
in statistics. Similar to the use of the terms “dropout”, “missing
data”, and “technical zeros”, this terminology can create confusion
about the underlying measurement process ([Sarkar and Stephens,
2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)).
Repository:
[openproblems-bio/task_denoising](https://github.com/openproblems-bio/task_denoising)

## Description

@@ -115,23 +48,23 @@ dataset.
flowchart LR
file_common_dataset("Common Dataset")
comp_process_dataset[/"Data processor"/]
file_train_h5ad("Training data")
file_test_h5ad("Test data")
file_train_h5ad("Training data")
comp_control_method[/"Control Method"/]
comp_method[/"Method"/]
comp_metric[/"Metric"/]
comp_method[/"Method"/]
file_prediction("Denoised data")
file_score("Score")
file_common_dataset---comp_process_dataset
comp_process_dataset-->file_train_h5ad
comp_process_dataset-->file_test_h5ad
file_train_h5ad---comp_control_method
file_train_h5ad---comp_method
comp_process_dataset-->file_train_h5ad
file_test_h5ad---comp_control_method
file_test_h5ad---comp_metric
file_train_h5ad---comp_control_method
file_train_h5ad---comp_method
comp_control_method-->file_prediction
comp_method-->file_prediction
comp_metric-->file_score
comp_method-->file_prediction
file_prediction---comp_metric
```

@@ -151,7 +84,7 @@ Format:

</div>

Slot description:
Data structure:

<div class="small">

@@ -170,9 +103,6 @@ Slot description:

## Component type: Data processor

Path:
[`src/process_dataset`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/process_dataset)

A denoising dataset processor.

Arguments:
@@ -187,72 +117,69 @@ Arguments:

</div>

## File format: Training data
## File format: Test data

The subset of molecules used for the training dataset
The subset of molecules used for the test dataset

Example file: `resources_test/denoising/pancreas/train.h5ad`
Example file: `resources_test/denoising/pancreas/test.h5ad`

Format:

<div class="small">

AnnData object
layers: 'counts'
uns: 'dataset_id'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum'

</div>

Slot description:
Data structure:

<div class="small">

| Slot | Type | Description |
|:--------------------|:----------|:-------------------------------------|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| Slot | Type | Description |
|:---|:---|:---|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. |

</div>

## File format: Test data
## File format: Training data

The subset of molecules used for the test dataset
The subset of molecules used for the training dataset

Example file: `resources_test/denoising/pancreas/test.h5ad`
Example file: `resources_test/denoising/pancreas/train.h5ad`

Format:

<div class="small">

AnnData object
layers: 'counts'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum'
uns: 'dataset_id'

</div>

Slot description:
Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. |
| Slot | Type | Description |
|:--------------------|:----------|:-------------------------------------|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |

</div>
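
Reviewer note: the two file-format tables above can be checked mechanically. The sketch below is an illustration, not part of this PR. It uses plain dictionaries as stand-ins for the `layers` and `uns` slots of an AnnData object, and the names `REQUIRED_SLOTS` and `missing_slots` as well as the `"pancreas"` identifier value are hypothetical. Slots marked (*Optional*) in the tables are left out on purpose.

```python
# Required slots per file type, copied from the tables above.
# Slots marked (*Optional*) in the tables are deliberately omitted.
REQUIRED_SLOTS = {
    "train": {"layers": ["counts"], "uns": ["dataset_id"]},
    "test": {
        "layers": ["counts"],
        "uns": [
            "dataset_id",
            "dataset_name",
            "dataset_summary",
            "dataset_description",
            "train_sum",
        ],
    },
}


def missing_slots(data, file_type):
    """Return sorted 'group["key"]' strings for required slots absent from data.

    `data` is a plain dict such as {"layers": {...}, "uns": {...}},
    standing in for an AnnData object (an assumption made for this sketch).
    """
    missing = []
    for group, keys in REQUIRED_SLOTS[file_type].items():
        present = data.get(group, {})
        for key in keys:
            if key not in present:
                missing.append(f'{group}["{key}"]')
    return sorted(missing)


# A minimal training-style object: only counts and a placeholder dataset_id.
example = {
    "layers": {"counts": [[0, 1], [2, 0]]},
    "uns": {"dataset_id": "pancreas"},
}
print(missing_slots(example, "train"))
print(missing_slots(example, "test"))
```

With a real `.h5ad` file, the same check would presumably read the keys from the loaded object instead of a dictionary. That part is omitted here to keep the sketch free of third-party dependencies.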

## Component type: Control Method

Path:
[`src/control_methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/control_methods)

A control method.

Arguments:
@@ -267,40 +194,34 @@ Arguments:

</div>

## Component type: Method

Path:
[`src/methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/methods)
## Component type: Metric

A method.
A metric.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input_train` | `file` | The subset of molecules used for the training dataset. |
| `--output` | `file` | (*Output*) A denoised dataset as output by a method. |
| `--input_test` | `file` | The subset of molecules used for the test dataset. |
| `--input_prediction` | `file` | A denoised dataset as output by a method. |
| `--output` | `file` | (*Output*) File indicating the score of a metric. |

</div>

## Component type: Metric

Path:
[`src/metrics`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/metrics)
## Component type: Method

A metric.
A method.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input_test` | `file` | The subset of molecules used for the test dataset. |
| `--input_prediction` | `file` | A denoised dataset as output by a method. |
| `--output` | `file` | (*Output*) File indicating the score of a metric. |
| `--input_train` | `file` | The subset of molecules used for the training dataset. |
| `--output` | `file` | (*Output*) A denoised dataset as output by a method. |

</div>

@@ -320,7 +241,7 @@ Format:

</div>

Slot description:
Data structure:

<div class="small">

@@ -347,7 +268,7 @@ Format:

</div>

Slot description:
Data structure:

<div class="small">

50 changes: 25 additions & 25 deletions _viash.yaml
@@ -1,32 +1,12 @@
name: task_denoising
version: dev

organization: openproblems-bio
description: |
Removing noise in sparse single-cell RNA-sequencing count data.
version: dev
license: MIT
keywords: [single-cell, openproblems, benchmark, denoising]
links:
issue_tracker: https://github.com/openproblems-bio/task_denoising/issues
repository: https://github.com/openproblems-bio/task_denoising
docker_registry: ghcr.io

info:
label: Denoising
summary: "Removing noise in sparse single-cell RNA-sequencing count data"
image: /src/api/thumbnail.svg
motivation: |
Single-cell RNA-Seq protocols only detect a fraction of the mRNA molecules present
in each cell. As a result, the measurements (UMI counts) observed for each gene and each
cell are associated with generally high levels of technical noise ([Grün et al.,
2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes the task of
estimating the true expression level of each gene in each cell. In the single-cell
literature, this task is also referred to as *imputation*, a term which is typically
used for missing data problems in statistics. Similar to the use of the terms "dropout",
"missing data", and "technical zeros", this terminology can create confusion about the
underlying measurement process ([Sarkar and Stephens,
2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)).
description: |
label: Denoising
keywords: [single-cell, openproblems, benchmark, denoising]
summary: "Removing noise in sparse single-cell RNA-sequencing count data"
description: |
A key challenge in evaluating denoising methods is the general lack of a ground truth. A
recent benchmark study ([Hou et al.,
2020](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02132-x))
@@ -43,13 +23,33 @@ info:
accuracy is measured by comparing the result to the test dataset. The authors show that
both in theory and in practice, the measured denoising accuracy is representative of the
accuracy that would be obtained on a ground truth dataset.
links:
issue_tracker: https://github.com/openproblems-bio/task_denoising/issues
repository: https://github.com/openproblems-bio/task_denoising
docker_registry: ghcr.io

info:
image: thumbnail.svg
motivation: |
Single-cell RNA-Seq protocols only detect a fraction of the mRNA molecules present
in each cell. As a result, the measurements (UMI counts) observed for each gene and each
cell are associated with generally high levels of technical noise ([Grün et al.,
2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes the task of
estimating the true expression level of each gene in each cell. In the single-cell
literature, this task is also referred to as *imputation*, a term which is typically
used for missing data problems in statistics. Similar to the use of the terms "dropout",
"missing data", and "technical zeros", this terminology can create confusion about the
underlying measurement process ([Sarkar and Stephens,
2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)).

test_resources:
- type: s3
path: s3://openproblems-data/resources_test/denoising/
dest: resources_test/denoising
- type: s3
path: s3://openproblems-data/resources_test/common/
dest: resources_test/common

authors:
- name: "Wesley Lewis"
roles: [ author, maintainer ]
2 changes: 1 addition & 1 deletion common
2 changes: 1 addition & 1 deletion src/api/comp_control_method.yaml
@@ -27,7 +27,7 @@ test_resources:
- type: python_script
path: /common/component_tests/run_and_check_output.py
- type: python_script
path: /common/component_tests/check_method_config.py
path: /common/component_tests/check_config.py
- path: /common/library.bib
- path: /resources_test/denoising/pancreas
dest: resources_test/denoising/pancreas
2 changes: 1 addition & 1 deletion src/api/comp_method.yaml
@@ -19,7 +19,7 @@ test_resources:
- type: python_script
path: /common/component_tests/run_and_check_output.py
- type: python_script
path: /common/component_tests/check_method_config.py
path: /common/component_tests/check_config.py
- path: /common/library.bib
- path: /resources_test/denoising/pancreas
dest: resources_test/denoising/pancreas
2 changes: 1 addition & 1 deletion src/api/comp_metric.yaml
@@ -21,7 +21,7 @@ arguments:
required: true
test_resources:
- type: python_script
path: /common/component_tests/check_metric_config.py
path: /common/component_tests/check_config.py
- type: python_script
path: /common/component_tests/run_and_check_output.py
- path: /common/library.bib
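
Reviewer note: all three API configs in this diff now point at a single `check_config.py` instead of `check_method_config.py` or `check_metric_config.py`. One way to look for leftover references is sketched below. It assumes the old script names are exactly the two shown in this diff, and `find_stale_paths` is a hypothetical helper, not something provided by the repository or by Viash.

```python
import re

# Old test script names that this PR replaces with check_config.py.
STALE = re.compile(r"/common/component_tests/check_(?:method|metric)_config\.py")


def find_stale_paths(config_text):
    """Return (line_number, stripped_line) pairs that still reference an old script."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        if STALE.search(line):
            hits.append((number, line.strip()))
    return hits


# Sample config text mixing one old and one new script path.
sample = """test_resources:
  - type: python_script
    path: /common/component_tests/check_method_config.py
  - type: python_script
    path: /common/component_tests/check_config.py
"""
print(find_stale_paths(sample))
```

Running the helper over each YAML file under `src/api` would flag any config missed by the rename. The regular expression matches on plain text, so it needs no YAML parser.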