update viash version #13

Merged
merged 29 commits into from
Sep 19, 2024
e0bfa5b
update submodule
KaiWaldrant Aug 16, 2024
9b40161
update viash version
KaiWaldrant Aug 16, 2024
d6a4655
relocate thumbnail
KaiWaldrant Aug 16, 2024
edb908d
update methods metadata
KaiWaldrant Aug 16, 2024
dc1aaf8
update control_methods
KaiWaldrant Aug 16, 2024
3c23b8d
add file_type
KaiWaldrant Aug 16, 2024
d5e398f
remove obs layer
KaiWaldrant Aug 16, 2024
1f40856
Update common resources
KaiWaldrant Sep 2, 2024
628f153
Update file* API
KaiWaldrant Sep 2, 2024
6a71bdc
update README
KaiWaldrant Sep 2, 2024
81ade3d
fix component test fp
KaiWaldrant Sep 2, 2024
fd3d218
update metrics references api
KaiWaldrant Sep 2, 2024
1be13e3
add openproblems package to DCA
KaiWaldrant Sep 2, 2024
ac5f8f4
Merge remote-tracking branch 'origin/main' into feature/no-ref/update…
KaiWaldrant Sep 18, 2024
ec6c32a
update DCA method
KaiWaldrant Sep 18, 2024
693f4bf
Merge remote-tracking branch 'origin/main' into feature/no-ref/update…
KaiWaldrant Sep 18, 2024
72dd68e
update submodule
KaiWaldrant Sep 18, 2024
cf43017
update dca
KaiWaldrant Sep 18, 2024
781295f
update links
KaiWaldrant Sep 18, 2024
c0cbc94
set numpy<2
KaiWaldrant Sep 19, 2024
0a01503
update fapi file name
KaiWaldrant Sep 19, 2024
ba3c7eb
update create_readme script
KaiWaldrant Sep 19, 2024
888d5b5
update readme
KaiWaldrant Sep 19, 2024
4ab3149
relocate process datasets
KaiWaldrant Sep 19, 2024
dbec1ef
Update scripts dir
KaiWaldrant Sep 19, 2024
eaff41e
update changelog
KaiWaldrant Sep 19, 2024
2713a3a
update readme
KaiWaldrant Sep 19, 2024
e06814e
update process_datasets merge path
KaiWaldrant Sep 19, 2024
cc9c324
fix processor config
KaiWaldrant Sep 19, 2024
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -6,10 +6,18 @@

* Directory structure has been updated.

* Update to viash 0.9.0 (PR #13).

## NEW FUNCTIONALITY

* Add `CHANGELOG.md` (PR #7).

## MAJOR CHANGES

* Revamp `scripts` directory (PR #13).

* Relocated `process_datasets` to `data_processors/process_datasets` (PR #13).

## MINOR CHANGES

* Remove dtype parameter in `.Anndata()` (PR #6).
@@ -20,6 +28,11 @@

* Update docker containers used in components (PR #12).

* Set `numpy<2` for some failing methods (PR #13).

* Small changes to api file names (PR #13).


## transfer from openproblems-v2 repository

### NEW FUNCTIONALITY
181 changes: 51 additions & 130 deletions README.md
@@ -8,75 +8,8 @@ Do not edit this file directly.

Removing noise in sparse single-cell RNA-sequencing count data

Path to source:
[`src`](https://github.com/openproblems-bio/task_denoising/src)

## README

## Installation

You need to have Docker, Java, and Viash installed. Follow [these
instructions](https://openproblems.bio/documentation/fundamentals/requirements)
to install the required dependencies.

## Add a method

To add a method to the repository, follow the instructions in the
`scripts/add_a_method.sh` script.

## Frequently used commands

To get started, you can run the following commands:

``` bash
git clone git@github.com:openproblems-bio/task_denoising.git

cd task_denoising

# initialise submodule
scripts/init_submodule.sh

# download resources
scripts/download_resources.sh
```

To run the benchmark, you first need to build the components.
Afterwards, you can run the benchmark:

``` bash
viash ns build --parallel --setup cachedbuild

scripts/run_benchmark.sh
```

After adding a component, it is recommended to run the tests to ensure
that the component is working correctly:

``` bash
viash ns test --parallel
```

Optionally, you can provide the `--query` argument to test only a subset
of components:

``` bash
viash ns test --parallel --query 'component_name'
```

## Motivation

Single-cell RNA-Seq protocols only detect a fraction of the mRNA
molecules present in each cell. As a result, the measurements (UMI
counts) observed for each gene and each cell are associated with
generally high levels of technical noise ([Grün et al.,
2014](https://www.nature.com/articles/nmeth.2930)). Denoising describes
the task of estimating the true expression level of each gene in each
cell. In the single-cell literature, this task is also referred to as
*imputation*, a term which is typically used for missing data problems
in statistics. Similar to the use of the terms “dropout”, “missing
data”, and “technical zeros”, this terminology can create confusion
about the underlying measurement process ([Sarkar and Stephens,
2020](https://www.biorxiv.org/content/10.1101/2020.04.07.030007v2)).
Repository:
[openproblems-bio/task_denoising](https://github.com/openproblems-bio/task_denoising)

## Description

@@ -114,24 +47,24 @@ dataset.
``` mermaid
flowchart LR
file_common_dataset("Common Dataset")
comp_process_dataset[/"Data processor"/]
file_train_h5ad("Training data")
file_test_h5ad("Test data")
comp_data_processor[/"Data processor"/]
file_test("Test data")
file_train("Training data")
comp_control_method[/"Control Method"/]
comp_method[/"Method"/]
comp_metric[/"Metric"/]
comp_method[/"Method"/]
file_prediction("Denoised data")
file_score("Score")
file_common_dataset---comp_process_dataset
comp_process_dataset-->file_train_h5ad
comp_process_dataset-->file_test_h5ad
file_train_h5ad---comp_control_method
file_train_h5ad---comp_method
file_test_h5ad---comp_control_method
file_test_h5ad---comp_metric
file_common_dataset---comp_data_processor
comp_data_processor-->file_test
comp_data_processor-->file_train
file_test---comp_control_method
file_test---comp_metric
file_train---comp_control_method
file_train---comp_method
comp_control_method-->file_prediction
comp_method-->file_prediction
comp_metric-->file_score
comp_method-->file_prediction
file_prediction---comp_metric
```
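
As a rough illustration of the updated flowchart, the sketch below encodes its edges as a dependency graph and derives one valid order in which files and components become available. The `DEPENDS_ON` mapping and the `execution_order` helper are hypothetical names introduced here; the node names are taken from the diagram. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each key depends on every node in its set (edges read off the flowchart).
# Assumption: a prediction comes from either a method or a control method;
# both producers are merged here purely for ordering purposes.
DEPENDS_ON = {
    "comp_data_processor": {"file_common_dataset"},
    "file_train": {"comp_data_processor"},
    "file_test": {"comp_data_processor"},
    "comp_method": {"file_train"},
    "comp_control_method": {"file_train", "file_test"},
    "file_prediction": {"comp_method", "comp_control_method"},
    "comp_metric": {"file_test", "file_prediction"},
    "file_score": {"comp_metric"},
}


def execution_order(graph):
    """Return one valid order in which files and components become available."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(DEPENDS_ON)
print(" -> ".join(order))
```

The common dataset always comes first and the score last; the relative order of nodes in between (for example `file_train` and `file_test`) may vary.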

@@ -151,7 +84,7 @@ Format:

</div>

Slot description:
Data structure:

<div class="small">

@@ -170,9 +103,6 @@ Slot description:

## Component type: Data processor

Path:
[`src/process_dataset`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/process_dataset)

A denoising dataset processor.

Arguments:
@@ -187,72 +117,69 @@ Arguments:

</div>

## File format: Training data
## File format: Test data

The subset of molecules used for the training dataset
The subset of molecules used for the test dataset

Example file: `resources_test/denoising/pancreas/train.h5ad`
Example file: `resources_test/denoising/pancreas/test.h5ad`

Format:

<div class="small">

AnnData object
layers: 'counts'
uns: 'dataset_id'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum'

</div>

Slot description:
Data structure:

<div class="small">

| Slot | Type | Description |
|:--------------------|:----------|:-------------------------------------|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| Slot | Type | Description |
|:---|:---|:---|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. |

</div>
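
A minimal sketch of how the non-optional `uns` slots of the test file could be checked, assuming the metadata is available as a plain mapping (with an AnnData object this would be `adata.uns`). The `check_test_uns` helper and `REQUIRED_UNS` constant are hypothetical and not part of this repository; the three slots marked *Optional* in the table are deliberately not checked.

```python
from numbers import Integral

# Slots from the test data table that are not marked (*Optional*).
REQUIRED_UNS = {
    "dataset_id",
    "dataset_name",
    "dataset_summary",
    "dataset_description",
    "train_sum",
}


def check_test_uns(uns):
    """Return a list of problems found in a test file's `uns` mapping."""
    missing = sorted(REQUIRED_UNS - set(uns))
    problems = [f"missing uns['{key}']" for key in missing]
    # `train_sum` is documented as an integer count.
    if "train_sum" in uns and not isinstance(uns["train_sum"], Integral):
        problems.append("uns['train_sum'] should be an integer")
    return problems


# Illustrative values only; they do not describe a real dataset.
example_uns = {
    "dataset_id": "example",
    "dataset_name": "Example",
    "dataset_summary": "Short text.",
    "dataset_description": "Longer text.",
    "train_sum": 1000,
}
print(check_test_uns(example_uns))
```

An empty list means every required slot is present and `train_sum` has an integer type.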

## File format: Test data
## File format: Training data

The subset of molecules used for the test dataset
The subset of molecules used for the training dataset

Example file: `resources_test/denoising/pancreas/test.h5ad`
Example file: `resources_test/denoising/pancreas/train.h5ad`

Format:

<div class="small">

AnnData object
layers: 'counts'
uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'train_sum'
uns: 'dataset_id'

</div>

Slot description:
Data structure:

<div class="small">

| Slot | Type | Description |
|:---|:---|:---|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
| `uns["dataset_name"]` | `string` | Nicely formatted name. |
| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
| `uns["dataset_description"]` | `string` | Long description of the dataset. |
| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
| `uns["train_sum"]` | `integer` | The total number of counts in the training dataset. |
| Slot | Type | Description |
|:--------------------|:----------|:-------------------------------------|
| `layers["counts"]` | `integer` | Raw counts. |
| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |

</div>

## Component type: Control Method

Path:
[`src/control_methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/control_methods)

A control method.

Arguments:
@@ -267,40 +194,34 @@ Arguments:

</div>

## Component type: Method

Path:
[`src/methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/methods)
## Component type: Metric

A method.
A metric.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input_train` | `file` | The subset of molecules used for the training dataset. |
| `--output` | `file` | (*Output*) A denoised dataset as output by a method. |
| `--input_test` | `file` | The subset of molecules used for the test dataset. |
| `--input_prediction` | `file` | A denoised dataset as output by a method. |
| `--output` | `file` | (*Output*) File indicating the score of a metric. |

</div>
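
To make the metric interface concrete, here is a stand-alone sketch that parses the three arguments from the table with `argparse`. In the repository the command-line interface is generated by Viash from the component config, so `build_metric_parser` and the file names below are hypothetical and only illustrate the shape of the arguments.

```python
import argparse


def build_metric_parser():
    """Build a parser mirroring the documented metric arguments."""
    parser = argparse.ArgumentParser(description="Hypothetical stand-alone metric CLI")
    parser.add_argument("--input_test", required=True,
                        help="The subset of molecules used for the test dataset.")
    parser.add_argument("--input_prediction", required=True,
                        help="A denoised dataset as output by a method.")
    parser.add_argument("--output", required=True,
                        help="File indicating the score of a metric.")
    return parser


# Hypothetical file names, passed explicitly instead of reading sys.argv.
args = build_metric_parser().parse_args([
    "--input_test", "test.h5ad",
    "--input_prediction", "prediction.h5ad",
    "--output", "score.h5ad",
])
print(args.input_test, args.input_prediction, args.output)
```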

## Component type: Metric

Path:
[`src/metrics`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/metrics)
## Component type: Method

A metric.
A method.

Arguments:

<div class="small">

| Name | Type | Description |
|:---|:---|:---|
| `--input_test` | `file` | The subset of molecules used for the test dataset. |
| `--input_prediction` | `file` | A denoised dataset as output by a method. |
| `--output` | `file` | (*Output*) File indicating the score of a metric. |
| `--input_train` | `file` | The subset of molecules used for the training dataset. |
| `--output` | `file` | (*Output*) A denoised dataset as output by a method. |

</div>

@@ -320,7 +241,7 @@ Format:

</div>

Slot description:
Data structure:

<div class="small">

@@ -347,7 +268,7 @@ Format:

</div>

Slot description:
Data structure:

<div class="small">
