deploy: 252731b
rcannood committed Dec 19, 2024
0 parents commit 7aea717
Showing 155 changed files with 81,852 additions and 0 deletions.
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
resources
resources_test
work
.nextflow*
.vscode
.DS_Store
output
trace-*
.ipynb_checkpoints
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "common"]
path = common
url = git@github.com:openproblems-bio/common-resources.git
Empty file added .nojekyll
101 changes: 101 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,101 @@
# denoising 0.1.0

## BREAKING CHANGES

* Update to viash 0.9.0 RC6.

* Directory structure has been updated.

* Update to viash 0.9.0 (PR #13).

## NEW FUNCTIONALITY

* Add `CHANGELOG.md` (PR #7).

* Update `process_dataset` component to subsample large datasets (PR #14).

* Add the scPRINT method (PR #25).

## MAJOR CHANGES

* Revamp `scripts` directory (PR #13).

* Relocate `process_datasets` to `data_processors/process_datasets` (PR #13).

## MINOR CHANGES

* Remove the `dtype` parameter in `anndata.AnnData()` (PR #6).

* Fix `target_sum` deprecation warning in `mse` metric (PR #8).

* Update `task_name` variable to denoising in component scripts (PR #9).

* Update docker containers used in components (PR #12).

* Set `numpy<2` for some failing methods (PR #13).

* Small changes to api file names (PR #13).

* Update test_resources path in components (PR #18).

* Update workflows to use core repository dependency (PR #20).

* Update the `common` submodule (PR #24).

* Use the common `checkItemAllowed()` for the method check in the benchmark workflow (PR #24).

* Use the `cxg_immune_cell_atlas` dataset instead of the `cxg_mouse_pancreas_atlas` for testing (PR #24).

* Update `README` (PR #24).

* Add a base method API schema (PR #24).

* Add `dataset_organism` to training input files (PR #24).
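
The `target_sum` entry above suggests that the `mse` metric normalises counts to a fixed library size before comparing them. The following is a minimal NumPy sketch of that idea, assuming per-cell scaling to `target_sum` followed by `log1p` on both the test counts and the denoised output (similar in spirit to scanpy's `normalize_total` and `log1p`). The function names `normalize_log1p` and `mse_metric` and the default of `1e4` are hypothetical; this is not the component's actual code.

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty cells at zero instead of dividing by 0
    return np.log1p(counts / totals * target_sum)


def mse_metric(test_counts, denoised, target_sum=1e4):
    """Mean squared error between the normalised, log-transformed matrices."""
    a = normalize_log1p(test_counts, target_sum)
    b = normalize_log1p(denoised, target_sum)
    return float(np.mean((a - b) ** 2))


test = np.array([[1, 3, 0], [2, 2, 4]])
print(mse_metric(test, test * 5))  # 0.0: a per-cell scale factor is normalised away
```

Normalising both matrices means the metric ignores differences in overall depth per cell and only scores the relative expression profile.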

## BUG FIXES

* Update the nextflow workflow dependencies (PR #17).

* Fix paths in scripts (PR #18).

* Subsample datasets by batch if batch is defined (PR #22).
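
Two entries above mention subsampling large datasets, by batch when a batch annotation is defined. One plausible reading is sketched below in NumPy: when batch labels are available, whole batches are drawn in random order until at least `n_obs` cells are kept; otherwise a plain random subset of cells is taken. The helper `subsample_indices` and its parameters are hypothetical, and the actual `process_dataset` component may use a different strategy.

```python
import numpy as np


def subsample_indices(n_cells, n_obs, batch=None, seed=0):
    """Return sorted indices of the cells to keep (hypothetical helper).

    With `batch` labels, whole batches are added in random order until at
    least `n_obs` cells are selected, so no batch is split. Without labels,
    `n_obs` cells are drawn uniformly at random without replacement.
    """
    if n_cells <= n_obs:
        return np.arange(n_cells)
    rng = np.random.default_rng(seed)
    if batch is None:
        return np.sort(rng.choice(n_cells, size=n_obs, replace=False))
    batch = np.asarray(batch)
    if batch.shape[0] != n_cells:
        raise ValueError("batch must have one label per cell")
    selected, count = [], 0
    for label in rng.permutation(np.unique(batch)):
        members = np.flatnonzero(batch == label)
        selected.append(members)
        count += members.size
        if count >= n_obs:
            break
    return np.sort(np.concatenate(selected))


batch = np.array(["a"] * 5 + ["b"] * 5 + ["c"] * 5)
keep = subsample_indices(len(batch), 6, batch=batch)
print(len(keep))  # 10: two complete batches of 5 cells each
```

Because batches are kept whole, the result can exceed `n_obs`; a stratified per-batch draw would be an alternative if an exact size matters more than keeping batches intact.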

## transfer from openproblems-v2 repository

### NEW FUNCTIONALITY

* `api/file_*`: Created file format specifications for the h5ad files used throughout the pipeline.

* `api/comp_*`: Created an API definition for the split, method and metric components.

* `process_dataset`: Added a component for processing common datasets into task-ready dataset objects.

* `resources_test/denoising/pancreas`: Added test resources, generated with `src/tasks/denoising/resources_test_scripts/pancreas.sh`.

* `workflows/run`: Added nf-tower test script. (PR #205)

### V1 MIGRATION

* `control_methods/no_denoising`: Migrated from v1. Extracted from baseline method.

* `control_methods/perfect_denoising`: Migrated from v1. Extracted from baseline method.

* `methods/alra`: Migrated from v1. Changed from Python to R and uses log CPM normalised data instead of L1 sqrt normalised data.

* `methods/dca`: Migrated and adapted from v1.

* `methods/knn_smoothing`: Migrated and adapted from v1.

* `methods/magic`: Migrated from v1.

* `metrics/mse`: Migrated from v1.

* `metrics/poisson`: Migrated from v1.
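
The name of the `poisson` metric suggests a Poisson likelihood in which the denoised values act as expected counts for the held-out test counts. A minimal NumPy sketch, assuming the usual Poisson negative log-likelihood with the constant log(y!) term dropped and averaged over all entries; `poisson_nll` and the small `eps` stabiliser are illustrative assumptions rather than the component's code.

```python
import numpy as np


def poisson_nll(test_counts, denoised, eps=1e-6):
    """Mean Poisson negative log-likelihood of test counts given denoised rates.

    Computes lambda - y * log(lambda), i.e. the full NLL minus log(y!), which
    does not depend on the prediction. `eps` (an assumption) guards against
    log(0) when a predicted rate is exactly zero.
    """
    y = np.asarray(test_counts, dtype=float)
    lam = np.asarray(denoised, dtype=float)
    if y.shape != lam.shape:
        raise ValueError("test_counts and denoised must have the same shape")
    return float(np.mean(lam - y * np.log(lam + eps)))


# Toy example: 2 cells x 3 genes.
test = np.array([[0, 2, 1], [3, 0, 0]])
good = test.astype(float)   # rates equal to the observed counts
worse = good * 1.5 + 0.1    # systematically over-estimated rates
print(poisson_nll(test, good) < poisson_nll(test, worse))  # True
```

Lower is better: for each entry, lambda - y * log(lambda) is minimised (up to `eps`) when the predicted rate equals the observed count.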

### CHANGES FROM V1

* AnnData layers are used to store data instead of `obsm`.

* Extended the use of sparse data in methods wherever possible.

* `process_dataset` also removes data that the methods and metrics do not need from the train and test datasets.
73 changes: 73 additions & 0 deletions INSTRUCTIONS.md
@@ -0,0 +1,73 @@
# Instructions

This is a guide on what to do after you have created a new task repository from the template. More in-depth information about how to create a new task can be found in the [OpenProblems Documentation](https://openproblems.bio/documentation/create_task/).

## First things first

* Update the `_viash.yaml` file with the correct task information.
* Update the `src/api/task_info.yaml` file with the information you have provided in the task issue.

## Resources

The OpenProblems team has provided some test resources that can be used to test the task. These resources are stored in the `resources` folder. The `scripts/download_resources.sh` script can be used to download these resources.

If these resources are not sufficient, you can add more resources to the `resources` folder. The `scripts/download_resources.sh` script can be updated to download these resources.

<!-- Add to readme
* update _viash.yaml
* update src/api/task_info.yaml
* update scripts/download_resources
-->

#!/bin/bash

echo "This script is not supposed to be run directly."
echo "Please run the script step-by-step."
exit 1

# sync resources
scripts/download_resources.sh

# create a new component
method_id="my_method"
method_lang="python" # change this to "r" if need be

common/create_component/create_component -- \
--language "$method_lang" \
--name "$method_id"

# TODO: fill in required fields in src/task/methods/$method_id/config.vsh.yaml
# TODO: edit src/task/methods/$method_id/script.py (or script.R)

# test the component
viash test src/task/methods/$method_id/config.vsh.yaml

# rebuild the container (only needed if you change something in the docker platform)
# You can reduce the memory and cpu allotted to jobs in _viash.yaml by modifying .platforms[.type == "nextflow"].config.labels
viash run src/task/methods/$method_id/config.vsh.yaml -- \
---setup cachedbuild ---verbose

# run the method (using parquet as input)
viash run src/task/methods/$method_id/config.vsh.yaml -- \
--de_train "resources/neurips-2023-kaggle/de_train.parquet" \
--id_map "resources/neurips-2023-kaggle/id_map.csv" \
--output "output/prediction.parquet"

# run the method (using h5ad as input)
viash run src/task/methods/$method_id/config.vsh.yaml -- \
--de_train_h5ad "resources/neurips-2023-kaggle/2023-09-12_de_by_cell_type_train.h5ad" \
--id_map "resources/neurips-2023-kaggle/id_map.csv" \
--output "output/prediction.parquet"

# run evaluation metric
viash run src/task/metrics/mean_rowwise_error/config.vsh.yaml -- \
--de_test "resources/neurips-2023-kaggle/de_test.parquet" \
--prediction "output/prediction.parquet" \
--output "output/score.h5ad"

# print score on kaggle test dataset
python -c 'import anndata; print(anndata.read_h5ad("output/score.h5ad").uns)'
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Open Problems in Single-Cell Analysis

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.