generated from openproblems-bio/task_template
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
0 parents
commit 7aea717
Showing 155 changed files with 81,852 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
resources | ||
resources_test | ||
work | ||
.nextflow* | ||
.vscode | ||
.DS_Store | ||
output | ||
trace-* | ||
.ipynb_checkpoints |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[submodule "common"] | ||
path = common | ||
url = git@github.com:openproblems-bio/common-resources.git |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
# denoising 0.1.0 | ||
|
||
## BREAKING CHANGES | ||
|
||
* Update to viash 0.9.0 RC6 | ||
|
||
* Directory structure has been updated. | ||
|
||
* Update to viash 0.9.0 (PR #13). | ||
|
||
## NEW FUNCTIONALITY | ||
|
||
* Add `CHANGELOG.md` (PR #7). | ||
|
||
* Update `process_dataset` component to subsample large datasets (PR #14). | ||
|
||
* Add the scPRINT method (PR #25) | ||
|
||
## MAJOR CHANGES | ||
|
||
* Revamp `scripts` directory (PR #13). | ||
|
||
* Relocated `process_datasets` to `data_processors/process_datasets` (PR #13). | ||
|
||
## MINOR CHANGES | ||
|
||
* Remove dtype parameter in `.Anndata()` (PR #6). | ||
|
||
* Fix target_sum deprecation warning in `mse` metric (PR #8). | ||
|
||
* Update `task_name` variable to denoising in component scripts (PR #9). | ||
|
||
* Update docker containers used in components (PR #12). | ||
|
||
* Set `numpy<2` for some failing methods (PR #13). | ||
|
||
* Small changes to api file names (PR #13). | ||
|
||
* Update test_resources path in components (PR #18). | ||
|
||
* Update workflows to use core repository dependency (PR #20). | ||
|
||
* Update the `common` submodule (PR #24) | ||
|
||
* Use the common `checkItemAllowed()` for the method check in the benchmark workflow (PR #24) | ||
|
||
* Use the `cxg_immune_cell_atlas` dataset instead of the `cxg_mouse_pancreas_atlas` for testing (PR #24) | ||
|
||
* Update `README` (PR #24) | ||
|
||
* Add a base method API schema (PR #24) | ||
|
||
* Add `dataset_organism` to training input files (PR #24) | ||
|
||
## BUG FIXES | ||
|
||
* Update the nextflow workflow dependencies (PR #17). | ||
|
||
* Fix paths in scripts (PR #18). | ||
|
||
* Subsample datasets by batch if batch is defined (PR #22). | ||
|
||
## transfer from openproblems-v2 repository | ||
|
||
### NEW FUNCTIONALITY | ||
|
||
* `api/file_*`: Created file format specifications for the h5ad files throughout the pipeline. | ||
|
||
* `api/comp_*`: Created an api definition for the split, method and metric components. | ||
|
||
* `process_dataset`: Added a component for processing common datasets into task-ready dataset objects. | ||
|
||
* `resources_test/denoising/pancreas` with `src/tasks/denoising/resources_test_scripts/pancreas.sh`. | ||
|
||
* `workflows/run`: Added nf-tower test script. (PR #205) | ||
|
||
### V1 MIGRATION | ||
|
||
* `control_methods/no_denoising`: Migrated from v1. Extracted from baseline method | ||
|
||
* `control_methods/perfect_denoising`: Migrated from v1. Extracted from baseline method | ||
|
||
* `methods/alra`: Migrated from v1. Changed from python to R and uses lg_cpm normalised data instead of L1 sqrt | ||
|
||
* `methods/dca`: Migrated and adapted from v1. | ||
|
||
* `methods/knn_smoothing`: Migrated and adapted from v1. | ||
|
||
* `methods/magic`: Migrated from v1. | ||
|
||
* `metrics/mse`: Migrated from v1. | ||
|
||
* `metrics/poisson`: Migrated from v1. | ||
|
||
### Changes from V1 | ||
|
||
* Anndata layers are used to store data instead of obsm | ||
|
||
* Extended the use of sparse data in methods wherever possible | ||
|
||
* `process_dataset` also removes data from the train and test datasets that the methods and metrics do not need. |
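The changelog entry "Subsample datasets by batch if batch is defined (PR #22)" does not say how the subsampling is done. The following is only a minimal sketch of one plausible approach, assuming each batch contributes a share of the subsample proportional to its size. The function name `subsample_by_batch` and its parameters are hypothetical and are not taken from the `process_dataset` component.

```python
import random
from collections import defaultdict


def subsample_by_batch(batches, n_keep, seed=0):
    """Return sorted cell indices for a subsample that covers every batch.

    Hypothetical helper, not the repository's implementation: each batch
    contributes a share of `n_keep` proportional to its size, with at
    least one cell per batch. Because of rounding, the result may differ
    slightly from `n_keep` in general.
    """
    total = len(batches)
    if n_keep >= total:
        # Nothing to subsample: keep every cell.
        return list(range(total))

    # Group cell indices by their batch label.
    by_batch = defaultdict(list)
    for idx, batch in enumerate(batches):
        by_batch[batch].append(idx)

    rng = random.Random(seed)
    keep = []
    for indices in by_batch.values():
        quota = max(1, round(n_keep * len(indices) / total))
        quota = min(quota, len(indices))
        keep.extend(rng.sample(indices, quota))
    return sorted(keep)


# Toy example: 100 cells spread over three batches of unequal size.
batches = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
kept = subsample_by_batch(batches, n_keep=20)
print(len(kept), sorted({batches[i] for i in kept}))
```

Sampling within each batch, rather than across the whole dataset, keeps small batches from disappearing from the subsample. With an AnnData object, the returned indices could be used to slice the rows, assuming the batch labels come from a column in `obs`.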
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
# Instructions | ||
|
||
This is a guide on what to do after you have created a new task repository from the template. More in-depth information about how to create a new task can be found in the [OpenProblems Documentation](https://openproblems.bio/documentation/create_task/). | ||
|
||
## First things first | ||
|
||
* Update the `_viash.yaml` file with the correct task information. | ||
* Update the `src/api/task_info.yaml` file with the information you have provided in the task issue. | ||
|
||
## Resources | ||
|
||
The OpenProblems team has provided some test resources that can be used to test the task. These resources are stored in the `resources` folder. The `scripts/download_resources.sh` script can be used to download these resources. | ||
|
||
If these resources are not sufficient, you can add more resources to the `resources` folder. The `scripts/download_resources.sh` script can be updated to download these resources. | ||
|
||
|
||
|
||
|
||
|
||
<!-- Add to readme | ||
* update _viash.yaml | ||
* update src/api/task_info.yaml | ||
* update scripts/download_resources | ||
--> | ||
|
||
#!/bin/bash | ||
|
||
echo "This script is not supposed to be run directly." | ||
echo "Please run the script step-by-step." | ||
exit 1 | ||
|
||
# sync resources | ||
scripts/download_resources.sh | ||
|
||
# create a new component | ||
method_id="my_metric" | ||
method_lang="python" # change this to "r" if need be | ||
|
||
common/create_component/create_component -- \ | ||
--language "$method_lang" \ | ||
--name "$method_id" | ||
|
||
# TODO: fill in required fields in src/task/methods/foo/config.vsh.yaml | ||
# TODO: edit src/task/methods/foo/script.py/R | ||
|
||
# test the component | ||
viash test src/task/methods/$method_id/config.vsh.yaml | ||
|
||
# rebuild the container (only if you change something to the docker platform) | ||
# You can reduce the memory and cpu allotted to jobs in _viash.yaml by modifying .platforms[.type == "nextflow"].config.labels | ||
viash run src/task/methods/$method_id/config.vsh.yaml -- \ | ||
---setup cachedbuild ---verbose | ||
|
||
# run the method (using parquet as input) | ||
viash run src/task/methods/$method_id/config.vsh.yaml -- \ | ||
--de_train "resources/neurips-2023-kaggle/de_train.parquet" \ | ||
--id_map "resources/neurips-2023-kaggle/id_map.csv" \ | ||
--output "output/prediction.parquet" | ||
|
||
# run the method (using h5ad as input) | ||
viash run src/task/methods/$method_id/config.vsh.yaml -- \ | ||
--de_train_h5ad "resources/neurips-2023-kaggle/2023-09-12_de_by_cell_type_train.h5ad" \ | ||
--id_map "resources/neurips-2023-kaggle/id_map.csv" \ | ||
--output "output/prediction.parquet" | ||
|
||
# run evaluation metric | ||
viash run src/task/metrics/mean_rowwise_error/config.vsh.yaml -- \ | ||
--de_test "resources/neurips-2023-kaggle/de_test.parquet" \ | ||
--prediction "output/prediction.parquet" \ | ||
--output "output/score.h5ad" | ||
|
||
# print score on kaggle test dataset | ||
python -c 'import anndata; print(anndata.read_h5ad("output/score.h5ad").uns)' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2024 Open Problems in Single-Cell Analysis | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |