Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Improve Dimensionality Reduction task (#401)
* clean up configs
* add pymde
* add R-based implementation of diffusionmap
* remove comment
* change label
* fix task description
* add more methods and metrics to workflow
* update authors
* rename folder
* fix filename
* add workaround script.py
* fix spectral features
* update task info
* add author info
* set lmds dims to 2
Showing 24 changed files with 245 additions and 70 deletions.
File renamed without changes.
45 changes: 16 additions & 29 deletions
src/tasks/dimensionality_reduction/methods/diffusion_map/config.vsh.yaml
```diff
@@ -1,44 +1,31 @@
 __merge__: ../../api/comp_method.yaml
 functionality:
-  name: "diffusion_maps"
+  name: diffusion_map
   info:
-    label: Diffusion maps
-    summary: "Positive control by Use 1000-dimensional diffusions maps as an embedding."
-    description: "This serves as a positive control since it uses 1000-dimensional diffusions maps as an embedding"
+    label: Diffusion Map
+    summary: Finding meaningful geometric descriptions of datasets using diffusion maps.
+    description: Implements diffusion map method of data parametrization, including creation and visualization of diffusion map, clustering with diffusion K-means and regression using adaptive regression model.
     reference: coifman2006diffusion
-    documentation_url: https://github.com/openproblems-bio/openproblems
-    repository_url: https://github.com/openproblems-bio/openproblems
+    documentation_url: https://bioconductor.org/packages/release/bioc/html/destiny.html
+    repository_url: https://github.com/theislab/destiny
     v1:
       path: openproblems/tasks/dimensionality_reduction/methods/diffusion_map.py
       commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32
     preferred_normalization: log_cp10k
-    variants:
-      diffusion_map:
+  resources:
+    - type: r_script
+      path: script.R
   arguments:
-    - name: "--n_comps"
-      type: integer
-      default: 2
-      description: "Number of components to use for the embedding."
-    - name: t
+    - name: "--n_dim"
       type: integer
-      default: 1
-      description: "Number to power the eigenvalues by."
-    - name: n_retries
-      type: integer
-      default: 1
-      description: "Number of times to retry if the embedding fails, each time adding noise."
-  resources:
-    - type: python_script
-      path: script.py
+      description: Number of dimensions.
+      default: 3
 platforms:
   - type: docker
-    image: ghcr.io/openproblems-bio/base_python:1.0.2
+    image: ghcr.io/openproblems-bio/base_r:1.0.2
     setup:
-      - type: python
-        pypi:
-          - umap-learn
-          - scipy
-          - numpy
+      - type: r
+        bioc: destiny
   - type: nextflow
     directives:
-      label: [ "midtime", highmem, highcpu ]
+      label: [ midtime, highmem, highcpu ]
```
37 changes: 37 additions & 0 deletions
src/tasks/dimensionality_reduction/methods/diffusion_map/script.R
```r
requireNamespace("anndata", quietly = TRUE)
requireNamespace("destiny", quietly = TRUE)

## VIASH START
par <- list(
  input = "resources_test/dimensionality_reduction/pancreas/dataset.h5ad",
  output = "output.h5ad",
  n_dim = 3
)
meta <- list(
  functionality_name = "diffusion_map"
)
## VIASH END

cat("Reading input files\n")
input <- anndata::read_h5ad(par$input)

cat("Running destiny diffusion map\n")
# create SingleCellExperiment object (Bioconductor expects genes x cells,
# so the cells x genes AnnData layer is transposed)
sce <- SingleCellExperiment::SingleCellExperiment(
  assays = list(
    logcounts = t(as.matrix(input$layers[["normalized"]]))
  )
)
dm <- destiny::DiffusionMap(sce)
# keep the first n_dim diffusion components as the embedding
X_emb <- destiny::eigenvectors(dm)[, seq_len(par$n_dim)]

cat("Write output AnnData to file\n")
output <- anndata::AnnData(
  uns = list(
    dataset_id = input$uns[["dataset_id"]],
    normalization_id = input$uns[["normalization_id"]],
    method_id = meta$functionality_name
  ),
  obsm = list(
    X_emb = X_emb
  ),
  shape = input$shape
)
output$write_h5ad(par$output, compression = "gzip")
```
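For readers unfamiliar with the method, the following Python sketch shows the core diffusion-map computation in dense NumPy: a Gaussian kernel, row normalization into a Markov matrix, and that matrix's leading non-trivial eigenvectors. It is a simplified illustration, not destiny's implementation; destiny adds refinements such as nearest-neighbour sparsification and its own bandwidth selection. The median-distance bandwidth and the helper name `diffusion_map` are assumptions made here; `n_dim` mirrors the new `--n_dim` argument, and `t` corresponds to the "Number to power the eigenvalues by" argument that this commit removes from the config.

```python
from typing import Optional

import numpy as np


def diffusion_map(
    X: np.ndarray,
    n_dim: int = 3,
    sigma: Optional[float] = None,
    t: int = 1,
) -> np.ndarray:
    """Dense diffusion-map embedding of the rows (cells) of X; O(n^2) memory."""
    # Pairwise squared Euclidean distances between cells.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Single global Gaussian bandwidth; median heuristic (an assumption) if not given.
    if sigma is None:
        sigma = float(np.sqrt(np.median(d2[d2 > 0])))
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    # Degrees d_i = sum_j K_ij; the Markov transition matrix is P = D^-1 K.
    deg = K.sum(axis=1)
    # Eigendecompose the symmetric conjugate S = D^-1/2 K D^-1/2 (same eigenvalues as P).
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    S = K * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Right eigenvectors of P are psi = D^-1/2 v.
    psi = evecs * d_inv_sqrt[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1) and scale by lambda^t.
    return psi[:, 1 : n_dim + 1] * evals[1 : n_dim + 1] ** t
```

Dropping the first eigenvector matters because it is constant across cells and carries no geometric information; `t` only rescales each remaining column, which is why removing it does not change the shape of the embedding.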
41 changes: 41 additions & 0 deletions
src/tasks/dimensionality_reduction/methods/pymde/config.vsh.yaml
```yaml
__merge__: ../../api/comp_method.yaml
functionality:
  name: pymde
  info:
    label: PyMDE
    summary: "A Python implementation of Minimum-Distortion Embedding"
    description: |
      PyMDE is a Python implementation of Minimum-Distortion Embedding. It is a non-linear
      method that preserves distances between cells or neighbourhoods in the original space.
    reference: agrawal2021mde
    repository_url: https://github.com/cvxgrp/pymde
    documentation_url: https://pymde.org
    v1:
      path: openproblems/tasks/dimensionality_reduction/methods/pymde.py
      commit: b3456fd73c04c28516f6df34c57e6e3e8b0dab32
    preferred_normalization: log_cp10k
  arguments:
    - name: --embed_method
      type: string
      description: The method to use for embedding. Options are 'neighbors' and 'distances'.
      default: neighbors
      choices: [ neighbors, distances ]
    - name: --n_hvg
      type: integer
      description: Number of highly variable genes to subset to. If not specified, the input matrix will not be subset.
    - name: --n_pca_dims
      type: integer
      description: Number of principal components to use for the initial PCA step.
      default: 100
  resources:
    - type: python_script
      path: script.py
platforms:
  - type: docker
    image: ghcr.io/openproblems-bio/base_python:1.0.2
    setup:
      - type: python
        packages: pymde
  - type: nextflow
    directives:
      label: [ midtime, highmem, highcpu ]
```
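The `--n_hvg` argument subsets genes before PCA. Below is a minimal NumPy sketch of that rule, assuming a dense cells × genes matrix and a per-gene score array standing in for `input.var["hvg_score"]`; `select_top_hvg` is a hypothetical helper name, not part of the repository.

```python
from typing import Optional

import numpy as np


def select_top_hvg(X: np.ndarray, hvg_score: np.ndarray, n_hvg: Optional[int]) -> np.ndarray:
    """Keep the n_hvg genes (columns) with the highest score; no-op if n_hvg is unset."""
    if not n_hvg:
        # "If not specified, the input matrix will not be subset."
        return X
    # argsort is ascending, so reverse it for descending scores, then take the first n_hvg.
    idx = np.argsort(hvg_score)[::-1][:n_hvg]
    return X[:, idx]
```

Because the selected columns come out in descending score order rather than their original order, downstream steps should not assume gene positions are preserved.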
59 changes: 59 additions & 0 deletions
src/tasks/dimensionality_reduction/methods/pymde/script.py
```python
import anndata as ad
import scanpy as sc
import pymde

## VIASH START
par = {
    "input": "resources_test/dimensionality_reduction/pancreas/dataset.h5ad",
    "output": "reduced.h5ad",
    "embed_method": "neighbors",
    "n_hvg": 1000,
    "n_pca_dims": 50,
}
meta = {
    "functionality_name": "foo",
}
## VIASH END

# Map the --embed_method choice onto the corresponding PyMDE constructor
if par["embed_method"] == "neighbors":
    mde_fn = pymde.preserve_neighbors
elif par["embed_method"] == "distances":
    mde_fn = pymde.preserve_distances
else:
    raise ValueError(f"Unknown embedding method: {par['embed_method']}")

print("Load input data", flush=True)
input = ad.read_h5ad(par["input"])
X_mat = input.layers["normalized"]

if par["n_hvg"]:
    print(f"Select top {par['n_hvg']} highly variable genes", flush=True)
    # Rank genes by descending hvg_score and keep the first n_hvg columns
    idx = input.var["hvg_score"].to_numpy().argsort()[::-1][:par["n_hvg"]]
    X_mat = X_mat[:, idx]

print("Compute PCA", flush=True)
X_pca = sc.tl.pca(X_mat, n_comps=par["n_pca_dims"], svd_solver="arpack")

print("Run MDE", flush=True)
# Fit a 2-D embedding; .embed() returns a torch tensor, converted here to NumPy
X_emb = (
    mde_fn(X_pca, embedding_dim=2, verbose=True)
    .embed(verbose=True)
    .detach()
    .numpy()
)

print("Create output AnnData", flush=True)
output = ad.AnnData(
    obs=input.obs[[]],
    obsm={
        "X_emb": X_emb
    },
    uns={
        "dataset_id": input.uns["dataset_id"],
        "normalization_id": input.uns["normalization_id"],
        "method_id": meta["functionality_name"]
    }
)

print("Write output to file", flush=True)
output.write_h5ad(par["output"], compression="gzip")
```
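The PCA step reduces the (optionally HVG-subset) matrix to `--n_pca_dims` components before MDE. As a rough illustration of what `sc.tl.pca` returns for an array input, here is an exact-SVD version on dense data; `pca_scores` is a hypothetical helper, and scanpy's ARPACK solver computes a truncated decomposition instead, so results should agree with this sketch only up to the sign of each component and numerical precision.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_comps: int) -> np.ndarray:
    """Project the rows of X onto the top n_comps principal components (exact SVD)."""
    # Center each gene (column); PCA is defined on mean-centered data.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Thin SVD: Xc = U diag(s) Vt; the principal-component scores are U * s.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comps] * s[:n_comps]
```

The resulting score columns are mutually orthogonal and ordered by decreasing variance, which is the compact input that `pymde.preserve_neighbors` or `pymde.preserve_distances` then embeds in two dimensions.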