diff --git a/README.md b/README.md
index 9ddb773..e2f934e 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Spatial decomposition
+# Spatial Decomposition
file_single_cell
- comp_process_dataset-->file_spatial_masked
comp_process_dataset-->file_solution
+ comp_process_dataset-->file_spatial_masked
file_single_cell---comp_control_method
file_single_cell---comp_method
- file_spatial_masked---comp_control_method
- file_spatial_masked---comp_method
file_solution---comp_control_method
file_solution---comp_metric
+ file_spatial_masked---comp_control_method
+ file_spatial_masked---comp_method
comp_control_method-->file_output
comp_method-->file_output
comp_metric-->file_score
@@ -98,46 +99,43 @@ Format:
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-----------------------------|:----------|:--------------------------------------------------------------------------------------------------------------------|
-| `obs["cell_type"]` | `string` | Cell type label IDs. |
-| `obs["batch"]` | `string` | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
-| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
-| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
-| `obsm["X_pca"]` | `double` | (*Optional*) The resulting PCA embedding. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | (*Optional*) Cell type names corresponding to values in `cell_type`. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["dataset_name"]` | `string` | Nicely formatted name. |
-| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
-| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
-| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
-| `uns["dataset_description"]` | `string` | Long description of the dataset. |
-| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | Cell type label IDs. |
+| `obs["batch"]` | `string` | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
+| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
+| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
+| `obsm["X_pca"]` | `double` | (*Optional*) The resulting PCA embedding. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | (*Optional*) Cell type names corresponding to values in `cell_type`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
## Component type: Data processor
-Path:
-[`src/process_dataset`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/process_dataset)
-
A spatial decomposition dataset processor.
Arguments:
-| Name | Type | Description |
-|:--------------------------|:-------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `--input` | `file` | A subset of the common dataset. |
-| `--output_single_cell` | `file` | (*Output*) The single-cell data file used as reference for the spatial data. |
-| `--output_spatial_masked` | `file` | (*Output*) The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
-| `--output_solution` | `file` | (*Output*) The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input` | `file` | A subset of the common dataset. |
+| `--output_single_cell` | `file` | (*Output*) The single-cell data file used as reference for the spatial data. |
+| `--output_spatial_masked` | `file` | (*Output*) The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
+| `--output_solution` | `file` | (*Output*) The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
@@ -146,7 +144,7 @@ Arguments:
The single-cell data file used as reference for the spatial data
Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/single_cell_ref.h5ad`
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/single_cell_ref.h5ad`
Format:
@@ -159,148 +157,139 @@ Format:
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-------------------------|:----------|:---------------------------------------------------------------------------------------------------------------------------------|
-| `obs["cell_type"]` | `string` | Cell type label IDs. |
-| `obs["batch"]` | `string` | (*Optional*) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | Cell type names corresponding to values in `cell_type`. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | Cell type label IDs. |
+| `obs["batch"]` | `string` | (*Optional*) A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | Cell type names corresponding to values in `cell_type`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-## File format: Spatial masked
+## File format: Solution
The spatial data file containing transcription profiles for each capture
-location, without cell-type proportions for each spot.
+location, with true cell-type proportions for each spot / capture
+location.
Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/spatial_masked.h5ad`
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/solution.h5ad`
Format:
AnnData object
- obsm: 'coordinates'
+ obsm: 'spatial', 'proportions_true'
layers: 'counts'
- uns: 'cell_type_names', 'dataset_id'
+ uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-------------------------|:----------|:--------------------------------------------------------------------------|
-| `obsm["coordinates"]` | `double` | XY coordinates for each spot. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions_pred` in output. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obsm["spatial"]` | `double` | XY coordinates for each spot. |
+| `obsm["proportions_true"]` | `double` | True cell type proportions for each spot. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+| `uns["normalization_id"]` | `string` | Which normalization was used. |
-## File format: Solution
+## File format: Spatial masked
The spatial data file containing transcription profiles for each capture
-location, with true cell-type proportions for each spot / capture
-location.
+location, without cell-type proportions for each spot.
Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/solution.h5ad`
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/spatial_masked.h5ad`
Format:
AnnData object
- obsm: 'coordinates', 'proportions_true'
+ obsm: 'spatial'
layers: 'counts'
- uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism', 'normalization_id'
+ uns: 'cell_type_names', 'dataset_id'
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-----------------------------|:----------|:-------------------------------------------------------------------------------|
-| `obsm["coordinates"]` | `double` | XY coordinates for each spot. |
-| `obsm["proportions_true"]` | `double` | True cell type proportions for each spot. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions`. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["dataset_name"]` | `string` | Nicely formatted name. |
-| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
-| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
-| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
-| `uns["dataset_description"]` | `string` | Long description of the dataset. |
-| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
-| `uns["normalization_id"]` | `string` | Which normalization was used. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obsm["spatial"]` | `double` | XY coordinates for each spot. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions_pred` in output. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
## Component type: Control method
-Path:
-[`src/control_methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/control_methods)
-
Quality control methods for verifying the pipeline.
Arguments:
-| Name | Type | Description |
-|:-------------------------|:-------|:-----------------------------------------------------------------------------------------------------------------------------------------------------|
-| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
-| `--input_spatial_masked` | `file` | The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
-| `--input_solution` | `file` | The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
-| `--output` | `file` | (*Output*) Spatial data with estimated proportions. |
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
+| `--input_spatial_masked` | `file` | The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
+| `--input_solution` | `file` | The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
+| `--output` | `file` | (*Output*) Spatial data with estimated proportions. |
## Component type: Method
-Path:
-[`src/methods`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/methods)
-
A spatial decomposition method.
Arguments:
-| Name | Type | Description |
-|:-------------------------|:-------|:--------------------------------------------------------------------------------------------------------------------------------|
-| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input_single_cell` | `file` | The single-cell data file used as reference for the spatial data. |
| `--input_spatial_masked` | `file` | The spatial data file containing transcription profiles for each capture location, without cell-type proportions for each spot. |
-| `--output` | `file` | (*Output*) Spatial data with estimated proportions. |
+| `--output` | `file` | (*Output*) Spatial data with estimated proportions. |
## Component type: Metric
-Path:
-[`src/metrics`](https://github.com/openproblems-bio/openproblems-v2/tree/main/src/metrics)
-
A spatial decomposition metric.
Arguments:
-| Name | Type | Description |
-|:-------------------|:-------|:-----------------------------------------------------------------------------------------------------------------------------------------------------|
-| `--input_method` | `file` | Spatial data with estimated proportions. |
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input_method` | `file` | Spatial data with estimated proportions. |
| `--input_solution` | `file` | The spatial data file containing transcription profiles for each capture location, with true cell-type proportions for each spot / capture location. |
-| `--output` | `file` | (*Output*) Metric score file. |
+| `--output` | `file` | (*Output*) Metric score file. |
@@ -309,35 +298,31 @@ Arguments:
Spatial data with estimated proportions.
Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/output.h5ad`
-
-Description:
-
-Spatial data file with estimated cell type proportions.
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/output.h5ad`
Format:
AnnData object
- obsm: 'coordinates', 'proportions_pred'
+ obsm: 'spatial', 'proportions_pred'
layers: 'counts'
uns: 'cell_type_names', 'dataset_id', 'method_id'
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:---------------------------|:----------|:-----------------------------------------------------------|
-| `obsm["coordinates"]` | `double` | XY coordinates for each spot. |
-| `obsm["proportions_pred"]` | `double` | Estimated cell type proportions for each spot. |
-| `layers["counts"]` | `integer` | Raw counts. |
-| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions`. |
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["method_id"]` | `string` | A unique identifier for the method. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obsm["spatial"]` | `double` | XY coordinates for each spot. |
+| `obsm["proportions_pred"]` | `double` | Estimated cell type proportions for each spot. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | Cell type names corresponding to columns of `proportions`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["method_id"]` | `string` | A unique identifier for the method. |
@@ -346,7 +331,7 @@ Slot description:
Metric score file.
Example file:
-`resources_test/spatial_decomposition/cxg_mouse_pancreas_atlas/score.h5ad`
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/score.h5ad`
Format:
@@ -357,16 +342,61 @@ Format:
-Slot description:
+Data structure:
-| Slot | Type | Description |
-|:-----------------------|:---------|:---------------------------------------------------------------------------------------------|
-| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
-| `uns["method_id"]` | `string` | A unique identifier for the method. |
-| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
+| Slot | Type | Description |
+|:---|:---|:---|
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["method_id"]` | `string` | A unique identifier for the method. |
+| `uns["metric_ids"]` | `string` | One or more unique metric identifiers. |
| `uns["metric_values"]` | `double` | The metric values obtained for the given prediction. Must be of same length as ‘metric_ids’. |
+## File format: Common Dataset
+
+A subset of the common dataset.
+
+Example file:
+`resources_test/task_spatial_decomposition/cxg_mouse_pancreas_atlas/simulated_dataset.h5ad`
+
+Format:
+
+
+
+ AnnData object
+ obs: 'cell_type', 'batch'
+ var: 'hvg', 'hvg_score'
+ obsm: 'X_pca', 'spatial', 'proportions_true'
+ layers: 'counts'
+ uns: 'cell_type_names', 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'
+
+
+
+Data structure:
+
+
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_type"]` | `string` | Cell type label IDs. |
+| `obs["batch"]` | `string` | A batch identifier. This label is very context-dependent and may be a combination of the tissue, assay, donor, etc. |
+| `var["hvg"]` | `boolean` | Whether or not the feature is considered to be a ‘highly variable gene’. |
+| `var["hvg_score"]` | `double` | A ranking of the features by hvg. |
+| `obsm["X_pca"]` | `double` | The resulting PCA embedding. |
+| `obsm["spatial"]` | `double` | (*Optional*) XY coordinates for each spot. |
+| `obsm["proportions_true"]` | `double` | (*Optional*) True cell type proportions for each spot. |
+| `layers["counts"]` | `integer` | Raw counts. |
+| `uns["cell_type_names"]` | `string` | (*Optional*) Cell type names corresponding to values in `cell_type`. |
+| `uns["dataset_id"]` | `string` | A unique identifier for the dataset. |
+| `uns["dataset_name"]` | `string` | Nicely formatted name. |
+| `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
+| `uns["dataset_reference"]` | `string` | (*Optional*) Bibtex reference of the paper in which the dataset was published. |
+| `uns["dataset_summary"]` | `string` | Short description of the dataset. |
+| `uns["dataset_description"]` | `string` | Long description of the dataset. |
+| `uns["dataset_organism"]` | `string` | (*Optional*) The organism of the sample in the dataset. |
+
+
+
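Note on the `obsm` key rename: the hunks above replace `obsm["coordinates"]` with `obsm["spatial"]` in the solution, spatial masked, and output formats. Below is a minimal sketch of how a downstream reader might accept files written either before or after this change. The helper name `get_spot_coordinates` is hypothetical and not part of this repository. It assumes only that `obsm` behaves like a mapping, as it does on an AnnData object; plain dicts stand in for it here.

```python
from collections.abc import Mapping
from typing import Any

# Prefer the key introduced by this diff, then fall back to the old one.
COORDINATE_KEYS = ("spatial", "coordinates")


def get_spot_coordinates(obsm: Mapping[str, Any]) -> Any:
    """Return the XY coordinates stored under the new or the old obsm key."""
    for key in COORDINATE_KEYS:
        if key in obsm:
            return obsm[key]
    raise KeyError(f"none of {COORDINATE_KEYS} found in obsm")


# Plain dicts stand in for AnnData.obsm (an assumption for illustration).
new_style = {"spatial": [[0.0, 1.0]], "proportions_pred": [[0.5, 0.5]]}
old_style = {"coordinates": [[2.0, 3.0]]}

print(get_spot_coordinates(new_style))
print(get_spot_coordinates(old_style))
```

Checking the new key first means a file that carries both keys resolves to the post-rename data.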