diff --git a/README.md b/README.md
index 573e183d..4fdc7d86 100644
--- a/README.md
+++ b/README.md
@@ -36,7 +36,17 @@ should convince readers of the significance and relevance of your task.
 ``` mermaid
 flowchart LR
-  comp_data_loader_sc[/"SC Data Loader"/]
+  comp_cell_volume_method[/"Cell Volume Calculation"/]
+  file_cell_volumes("Cell Volumes")
+  file_spatialdata_assigned("Assigned Transcripts")
+  comp_count_aggregation[/"Count Aggregation"/]
+  file_spatial_raw_counts("Spatial Raw Counts")
+  comp_normalisation_method[/"Normalisation"/]
+  file_spatial_norm_counts("Spatial Normalised Counts")
+  comp_celltype_annotation_method[/"Cell type annotation"/]
+  file_spatial_with_celltypes("Spatial Normalised Counts with Cell Type Annotations")
+  comp_expr_correction_method[/"Expression correction"/]
+  file_spatial_corrected("Spatial Corrected Counts with Cell Type Annotations")
   file_common_singlecell("Common SC Dataset")
   comp_data_preprocessor[/"Data preprocessor"/]
   file_singlecell("SC Dataset")
@@ -44,10 +54,19 @@ flowchart LR
   comp_segmentation_method[/"Segmentation"/]
   file_spatialdata_segmented("Segmented iST")
   comp_assignment_method[/"Assignment"/]
-  file_spatialdata_assigned("Assigned Transcripts")
   file_common_spatialdata("Common iST Dataset")
+  comp_data_loader_sc[/"SC Data Loader"/]
   comp_data_loader_sp[/"iST Data Loader"/]
-  comp_data_loader_sc-->file_common_singlecell
+  comp_cell_volume_method-->file_cell_volumes
+  comp_cell_volume_method-->file_spatialdata_assigned
+  file_spatialdata_assigned---comp_count_aggregation
+  comp_count_aggregation-->file_spatial_raw_counts
+  file_spatial_raw_counts---comp_normalisation_method
+  comp_normalisation_method-->file_spatial_norm_counts
+  file_spatial_norm_counts---comp_celltype_annotation_method
+  comp_celltype_annotation_method-->file_spatial_with_celltypes
+  file_spatial_with_celltypes---comp_expr_correction_method
+  comp_expr_correction_method-->file_spatial_corrected
   file_common_singlecell---comp_data_preprocessor
   comp_data_preprocessor-->file_singlecell
   comp_data_preprocessor-->file_spatialdata
@@ -56,12 +75,13 @@ flowchart LR
   file_spatialdata_segmented---comp_assignment_method
   comp_assignment_method-->file_spatialdata_assigned
   file_common_spatialdata---comp_data_preprocessor
+  comp_data_loader_sc-->file_common_singlecell
   comp_data_loader_sp-->file_common_spatialdata
 ```

-## Component type: SC Data Loader
+## Component type: Cell Volume Calculation

-A component to download and store single-cell data.
+Calculate the volume of cells

 Arguments:

@@ -69,14 +89,303 @@ Arguments:

 | Name | Type | Description |
 |:---|:---|:---|
-| `--output` | `file` | (*Output*) An unprocessed dataset as output by a dataset loader. |
-| `--dataset_id` | `string` | NA. |
-| `--dataset_name` | `string` | NA. |
-| `--dataset_url` | `string` | (*Optional*) NA. |
-| `--dataset_reference` | `string` | (*Optional*) NA. |
-| `--dataset_summary` | `string` | NA. |
-| `--dataset_description` | `string` | NA. |
-| `--dataset_organism` | `string` | (*Optional*) NA. |
+| `--input` | `file` | (*Output*) A spatial transcriptomics dataset with assigned transcripts. |
+| `--output` | `file` | (*Output*) An obs column of cell volumes calculated from spatial transcriptomics data. |
+
+
+
+## File format: Cell Volumes
+
+An obs column of cell volumes calculated from spatial transcriptomics
+data.
+
+Example file:
+`resources_test/common/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
+
+Description:
+
+An obs column of cell volumes calculated from spatial transcriptomics
+data.
+
+Format:
+
+
+
+    AnnData object
+     obs: 'volume'
+
+
+
+Data structure:
+
+
+
+| Slot | Type | Description |
+|:----------------|:---------|:------------------------|
+| `obs["volume"]` | `string` | The volume of the cell. |
+
+
+
+## File format: Assigned Transcripts
+
+A spatial transcriptomics dataset with assigned transcripts
+
+Example file: `...`
+
+Description:
+
+…
+
+Format:
+
+
+
+
+
+Data structure:
+
+
+
+
+
+## Component type: Count Aggregation
+
+Aggregating counts of transcripts within cells
+
+Arguments:
+
+
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input` | `file` | A spatial transcriptomics dataset with assigned transcripts. |
+| `--output` | `file` | (*Output*) Unprocessed raw counts after aggregation of transcripts to cells. |
+
+
+
+## File format: Spatial Raw Counts
+
+Unprocessed raw counts after aggregation of transcripts to cells
+
+Example file:
+`resources_test/common/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
+
+Description:
+
+This file contains the raw counts after aggregating transcripts to
+cells.
+
+Format:
+
+
+
+    AnnData object
+     obs: 'cell_id', 'centroid_x', 'centroid_y', 'centroid_z', 'n_counts', 'n_genes'
+     var: 'gene_name', 'n_counts', 'n_cells'
+     layers: 'raw'
+
+
+
+Data structure:
+
+
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_id"]` | `string` | Unique identifier for the cell (from assignment step). |
+| `obs["centroid_x"]` | `string` | X coordinate of the cell. |
+| `obs["centroid_y"]` | `string` | Y coordinate of the cell. |
+| `obs["centroid_z"]` | `string` | (*Optional*) Z coordinate of the cell. |
+| `obs["n_counts"]` | `string` | Number of counts in the cell. |
+| `obs["n_genes"]` | `string` | Number of genes in the cell. |
+| `var["gene_name"]` | `string` | Name of the gene. |
+| `var["n_counts"]` | `string` | Number of counts of the gene. |
+| `var["n_cells"]` | `string` | Number of cells expressing the gene. |
+| `layers["raw"]` | `integer` | Raw counts. |
+
+
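As a rough illustration of the count aggregation step and the slots listed above, the sketch below counts assigned transcripts per cell and gene in plain Python. The input shape (a list of `(cell_id, gene_name)` pairs) and the function name `aggregate_counts` are assumptions made for this example only; the actual component reads and writes the file formats described in this README and may work differently.

```python
from collections import Counter


def aggregate_counts(transcripts):
    """Aggregate (cell_id, gene_name) pairs into a dense cell-by-gene matrix.

    Hypothetical helper: returns plain dicts and lists that mirror the
    obs / var / layers["raw"] slots of the "Spatial Raw Counts" format.
    """
    cells = sorted({cell for cell, _ in transcripts})
    genes = sorted({gene for _, gene in transcripts})
    pair_counts = Counter(transcripts)

    # One row per cell, one column per gene; missing pairs count as 0.
    raw = [[pair_counts[(cell, gene)] for gene in genes] for cell in cells]

    obs = {
        "cell_id": cells,
        "n_counts": [sum(row) for row in raw],
        "n_genes": [sum(1 for value in row if value > 0) for row in raw],
    }
    var = {
        "gene_name": genes,
        "n_counts": [sum(row[j] for row in raw) for j in range(len(genes))],
        "n_cells": [sum(1 for row in raw if row[j] > 0) for j in range(len(genes))],
    }
    return obs, var, raw


# Toy input: two cells, two genes.
example_transcripts = [
    ("c1", "Gad1"),
    ("c1", "Gad1"),
    ("c1", "Slc17a7"),
    ("c2", "Slc17a7"),
]
example_obs, example_var, example_raw = aggregate_counts(example_transcripts)
```

The centroid columns are left out here because they would come from the segmentation and assignment outputs rather than from the transcript list itself.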
+
+## Component type: Normalisation
+
+Normalising spatial transcriptomics data
+
+Arguments:
+
+
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input` | `file` | Unprocessed raw counts after aggregation of transcripts to cells. |
+| `--output` | `file` | (*Output*) Normalised counts. |
+
+
+
+## File format: Spatial Normalised Counts
+
+Normalised counts
+
+Example file:
+`resources_test/common/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
+
+Description:
+
+This file contains the normalised counts of the spatial transcriptomics
+data.
+
+Format:
+
+
+
+    AnnData object
+     obs: 'cell_id', 'centroid_x', 'centroid_y', 'centroid_z', 'n_counts', 'n_genes', 'volume'
+     var: 'gene_name', 'n_counts', 'n_cells'
+     layers: 'raw', 'norm', 'lognorm'
+
+
+
+Data structure:
+
+
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_id"]` | `string` | Unique identifier for the cell (from assignment step). |
+| `obs["centroid_x"]` | `string` | X coordinate of the cell. |
+| `obs["centroid_y"]` | `string` | Y coordinate of the cell. |
+| `obs["centroid_z"]` | `string` | (*Optional*) Z coordinate of the cell. |
+| `obs["n_counts"]` | `string` | Number of counts in the cell. |
+| `obs["n_genes"]` | `string` | Number of genes in the cell. |
+| `obs["volume"]` | `string` | Volume of the cell. |
+| `var["gene_name"]` | `string` | Name of the gene. |
+| `var["n_counts"]` | `string` | Number of counts of the gene. |
+| `var["n_cells"]` | `string` | Number of cells expressing the gene. |
+| `layers["raw"]` | `integer` | Raw counts. |
+| `layers["norm"]` | `integer` | Normalised counts. |
+| `layers["lognorm"]` | `integer` | Log normalised counts. |
+
+
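The README does not say which normalisation the component applies. Purely as an assumption, suggested only by `obs["volume"]` appearing next to the `norm` and `lognorm` layers, the sketch below divides each cell's raw counts by its volume and then applies `log1p`. The function name `normalise_by_volume` and the `scale` parameter are hypothetical.

```python
import math


def normalise_by_volume(raw, volumes, scale=1.0):
    """Return (norm, lognorm) layers from a dense raw count matrix.

    Assumed scheme, not taken from the README: norm = scale * raw / volume
    per cell, and lognorm = log(1 + norm).
    """
    if len(raw) != len(volumes):
        raise ValueError("need exactly one volume per cell (row)")
    if any(volume <= 0 for volume in volumes):
        raise ValueError("cell volumes must be positive")

    norm = [
        [scale * value / volume for value in row]
        for row, volume in zip(raw, volumes)
    ]
    lognorm = [[math.log1p(value) for value in row] for row in norm]
    return norm, lognorm


# Toy input: two cells with volumes 2.0 and 4.0.
example_norm, example_lognorm = normalise_by_volume([[2, 1], [0, 1]], [2.0, 4.0])
```

Other schemes, such as scaling by total counts per cell, would fill the same two layers; only the per-cell divisor changes.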
+
+## Component type: Cell type annotation
+
+Annotating cell types in spatial data
+
+Arguments:
+
+
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input` | `file` | Normalised counts. |
+| `--celltype_key` | `string` | (*Optional*) NA. Default: `cell_type`. |
+| `--output` | `file` | (*Output*) Normalised counts with cell type annotations. |
+
+
+
+## File format: Spatial Normalised Counts with Cell Type Annotations
+
+Normalised counts with cell type annotations
+
+Example file:
+`resources_test/common/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
+
+Description:
+
+This file contains the normalised counts of the spatial transcriptomics
+data and cell type annotations.
+
+Format:
+
+
+
+    AnnData object
+     obs: 'cell_id', 'centroid_x', 'centroid_y', 'centroid_z', 'n_counts', 'n_genes', 'volume', 'cell_type'
+     var: 'gene_name', 'n_counts', 'n_cells'
+     layers: 'raw', 'norm', 'lognorm'
+
+
+
+Data structure:
+
+
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_id"]` | `string` | Unique identifier for the cell (from assignment step). |
+| `obs["centroid_x"]` | `string` | X coordinate of the cell. |
+| `obs["centroid_y"]` | `string` | Y coordinate of the cell. |
+| `obs["centroid_z"]` | `string` | (*Optional*) Z coordinate of the cell. |
+| `obs["n_counts"]` | `string` | Number of counts in the cell. |
+| `obs["n_genes"]` | `string` | Number of genes in the cell. |
+| `obs["volume"]` | `string` | Volume of the cell. |
+| `obs["cell_type"]` | `string` | Cell type of the cell. |
+| `var["gene_name"]` | `string` | Name of the gene. |
+| `var["n_counts"]` | `string` | Number of counts of the gene. |
+| `var["n_cells"]` | `string` | Number of cells expressing the gene. |
+| `layers["raw"]` | `integer` | Raw counts. |
+| `layers["norm"]` | `integer` | Normalised counts. |
+| `layers["lognorm"]` | `integer` | Log normalised counts. |
+
+
+
+## Component type: Expression correction
+
+Correcting expression levels in spatial data
+
+Arguments:
+
+
+
+| Name | Type | Description |
+|:-----------|:-------|:--------------------------------------------------------|
+| `--input` | `file` | Normalised counts with cell type annotations. |
+| `--output` | `file` | (*Output*) Corrected counts with cell type annotations. |
+
+
+
+## File format: Spatial Corrected Counts with Cell Type Annotations
+
+Corrected counts with cell type annotations
+
+Example file:
+`resources_test/common/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
+
+Description:
+
+This file contains the corrected counts of the spatial transcriptomics
+data and cell type annotations.
+
+Format:
+
+
+
+    AnnData object
+     obs: 'cell_id', 'centroid_x', 'centroid_y', 'centroid_z', 'n_counts', 'n_genes', 'volume', 'cell_type'
+     var: 'gene_name', 'n_counts', 'n_cells'
+     layers: 'raw', 'norm', 'lognorm', 'lognorm_uncorrected'
+
+
+
+Data structure:
+
+
+
+| Slot | Type | Description |
+|:---|:---|:---|
+| `obs["cell_id"]` | `string` | Unique identifier for the cell (from assignment step). |
+| `obs["centroid_x"]` | `string` | X coordinate of the cell. |
+| `obs["centroid_y"]` | `string` | Y coordinate of the cell. |
+| `obs["centroid_z"]` | `string` | (*Optional*) Z coordinate of the cell. |
+| `obs["n_counts"]` | `string` | Number of counts in the cell. |
+| `obs["n_genes"]` | `string` | Number of genes in the cell. |
+| `obs["volume"]` | `string` | Volume of the cell. |
+| `obs["cell_type"]` | `string` | Cell type of the cell. |
+| `var["gene_name"]` | `string` | Name of the gene. |
+| `var["n_counts"]` | `string` | Number of counts of the gene. |
+| `var["n_cells"]` | `string` | Number of cells expressing the gene. |
+| `layers["raw"]` | `integer` | Raw counts. |
+| `layers["norm"]` | `integer` | Normalised counts. |
+| `layers["lognorm"]` | `integer` | Log normalised counts. |
+| `layers["lognorm_uncorrected"]` | `integer` | (*Optional*) Log normalised counts. |
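The slot tables above can also be read as a checklist. As a minimal sketch, the snippet below encodes the required slots of the "Spatial Corrected Counts with Cell Type Annotations" format and reports which ones are absent. The dict layout and the name `missing_slots` are hypothetical; only the slot names come from the table.

```python
# Required slots copied from the table above. `centroid_z` and
# `lognorm_uncorrected` are marked (*Optional*) there, so they are omitted.
REQUIRED_SLOTS = {
    "obs": [
        "cell_id", "centroid_x", "centroid_y",
        "n_counts", "n_genes", "volume", "cell_type",
    ],
    "var": ["gene_name", "n_counts", "n_cells"],
    "layers": ["raw", "norm", "lognorm"],
}


def missing_slots(present, required=REQUIRED_SLOTS):
    """List required slots, formatted as 'section["name"]', absent from `present`.

    `present` maps a section name ("obs", "var", "layers") to an iterable
    of the slot names that actually exist in a file.
    """
    missing = []
    for section, names in required.items():
        available = set(present.get(section, ()))
        missing.extend(
            f'{section}["{name}"]' for name in names if name not in available
        )
    return missing


# Toy input: everything present except the cell type column.
example_present = {
    "obs": ["cell_id", "centroid_x", "centroid_y", "n_counts", "n_genes", "volume"],
    "var": ["gene_name", "n_counts", "n_cells"],
    "layers": ["raw", "norm", "lognorm"],
}
example_missing = missing_slots(example_present)
```

With an AnnData object `adata` loaded from one of the `.h5ad` files, `present` could be built from `adata.obs.columns`, `adata.var.columns` and `adata.layers.keys()`.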
@@ -190,7 +499,7 @@ Format:
     AnnData object
      obs: 'cell_type', 'cell_type_level2', 'cell_type_level3', 'cell_type_level4', 'dataset_id', 'assay', 'assay_ontology_term_id', 'cell_type_ontology_term_id', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor_id', 'is_primary_data', 'organism', 'organism_ontology_term_id', 'self_reported_ethnicity', 'self_reported_ethnicity_ontology_term_id', 'sex', 'sex_ontology_term_id', 'suspension_type', 'tissue', 'tissue_ontology_term_id', 'tissue_general', 'tissue_general_ontology_term_id', 'batch', 'soma_joinid'
      var: 'feature_id', 'feature_name', 'soma_joinid'
-     layers: 'counts'
+     layers: 'counts', 'raw', 'norm', 'lognorm'
      uns: 'dataset_id', 'dataset_name', 'dataset_url', 'dataset_reference', 'dataset_summary', 'dataset_description', 'dataset_organism'
@@ -232,6 +541,9 @@ Data structure:
 | `var["feature_name"]` | `string` | A human-readable name for the feature, usually a gene symbol. |
 | `var["soma_joinid"]` | `integer` | (*Optional*) If the dataset was retrieved from CELLxGENE census, this is a unique identifier for the feature. |
 | `layers["counts"]` | `integer` | Raw counts. |
+| `layers["raw"]` | `integer` | Raw counts. |
+| `layers["norm"]` | `integer` | Normalised counts. |
+| `layers["lognorm"]` | `integer` | Log normalised counts. |
 | `uns["dataset_id"]` | `string` | A unique identifier for the dataset. This is different from the `obs.dataset_id` field, which is the identifier for the dataset from which the cell data is derived. |
 | `uns["dataset_name"]` | `string` | A human-readable name for the dataset. |
 | `uns["dataset_url"]` | `string` | (*Optional*) Link to the original source of the dataset. |
@@ -318,15 +630,17 @@ Arguments:

-## File format: Assigned Transcripts
+## File format: Common iST Dataset

-A spatial transcriptomics dataset with assigned transcripts
+An unprocessed spatial imaging dataset stored as a zarr file.

-Example file: `...`
+Example file:
+`resources_test/common/2023_10x_mouse_brain_xenium/dataset.zarr`

 Description:

-…
+This dataset contains raw images, labels, points, shapes, and tables as
+output by a dataset loader.

 Format:

@@ -340,27 +654,24 @@ Data structure:

-## File format: Common iST Dataset
-
-An unprocessed spatial imaging dataset stored as a zarr file.
-
-Example file:
-`resources_test/common/2023_10x_mouse_brain_xenium/dataset.zarr`
-
-Description:
+## Component type: SC Data Loader

-This dataset contains raw images, labels, points, shapes, and tables as
-output by a dataset loader.
+A component to download and store single-cell data.

-Format:
+Arguments:

-
-
-Data structure:
-
-
+| Name | Type | Description |
+|:---|:---|:---|
+| `--output` | `file` | (*Output*) An unprocessed dataset as output by a dataset loader. |
+| `--dataset_id` | `string` | NA. |
+| `--dataset_name` | `string` | NA. |
+| `--dataset_url` | `string` | (*Optional*) NA. |
+| `--dataset_reference` | `string` | (*Optional*) NA. |
+| `--dataset_summary` | `string` | NA. |
+| `--dataset_description` | `string` | NA. |
+| `--dataset_organism` | `string` | (*Optional*) NA. |