adding correlation functions
ajitjohnson committed Mar 18, 2024
1 parent f2c55fa commit 27b5d86
Showing 13 changed files with 955 additions and 191 deletions.
5 changes: 5 additions & 0 deletions docs/Functions/pl/groupCorrelation.md
@@ -0,0 +1,5 @@
---
hide:
- toc # Hide table of contents
---
::: scimap.plotting.groupCorrelation
5 changes: 5 additions & 0 deletions docs/Functions/pl/markerCorrelation.md
@@ -0,0 +1,5 @@
---
hide:
- toc # Hide table of contents
---
::: scimap.plotting.markerCorrelation
156 changes: 69 additions & 87 deletions docs/tutorials/nbs/Prepare Data for SCIMAP.ipynb
@@ -10,10 +10,28 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 1,
"id": "dee3edb2-9621-42fe-8244-111e34945b91",
"metadata": {},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running SCIMAP 1.3.8\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning:\n",
"\n",
"IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
"\n"
]
}
],
"source": [
"# import scimap\n",
"import scimap as sm"
@@ -31,7 +49,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 2,
"id": "e26cfe65-4bf3-4558-85e8-2d4010b2110f",
"metadata": {},
"outputs": [
@@ -61,7 +79,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 3,
"id": "f10e382b-02b7-40ab-9e3f-4dfd8b1ba0a5",
"metadata": {},
"outputs": [
@@ -70,10 +88,11 @@
"text/plain": [
"AnnData object with n_obs × n_vars = 11201 × 9\n",
" obs: 'X_centroid', 'Y_centroid', 'Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'Solidity', 'Extent', 'Orientation', 'CellID', 'imageid'\n",
" uns: 'all_markers'"
" uns: 'all_markers'\n",
" layers: 'log'"
]
},
"execution_count": 5,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -87,7 +106,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 4,
"id": "d8713803-ab13-4b31-b693-efedd4ccdae8",
"metadata": {},
"outputs": [
@@ -109,7 +128,7 @@
" 6.73978032]])"
]
},
"execution_count": 8,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -122,7 +141,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 5,
"id": "527582cb-7796-45f3-81eb-bccae5ef2e8a",
"metadata": {},
"outputs": [
@@ -389,7 +408,7 @@
"[11201 rows x 11 columns]"
]
},
"execution_count": 9,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -401,7 +420,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 6,
"id": "ea937485-6871-47a1-a308-c2e1e4b002df",
"metadata": {},
"outputs": [
@@ -466,7 +485,7 @@
"Index: [ELANE, CD57, CD45, CD11B, SMA, CD16, ECAD, FOXP3, NCAM]"
]
},
"execution_count": 10,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -502,7 +521,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 7,
"id": "85510c7b-a68a-40f4-86e4-b98ac54e411a",
"metadata": {},
"outputs": [],
@@ -539,115 +558,78 @@
},
{
"cell_type": "markdown",
"id": "986db9f2-a6b5-4950-9cb5-202c391d9a82",
"id": "1bf35ced-e150-4662-aeb2-2afe35d8aa19",
"metadata": {},
"source": [
"<hr>"
"If you build the AnnData object manually instead of using the built-in import function, make sure it follows four conventions so that it works with the rest of the analysis:\n",
"\n",
"1. **Ensure Unique Image Identification**: Add a column named `imageid` to the cell metadata (`adata.obs`) that records which image each cell came from. This is essential when a dataset combines multiple images, because it lets you group, subset, and retrieve cells image by image.\n",
"\n",
"2. **Preserve Raw Data**: Store the unprocessed data in `adata.raw` before any preprocessing is applied, so the original values remain available for reference and comparison.\n",
"\n",
"3. **Log Transformation Layer**: Create a layer named `log` that holds the log-transformed data. Log transformation compresses the wide dynamic range of marker intensities, so that a few very bright cells do not dominate downstream analyses.\n",
"\n",
"4. **Marker Annotation**: Store the names of all markers in the images in `adata.uns['all_markers']`, in the same order as the image channels (layers). When images are loaded, this list is used to identify which channel corresponds to which marker.\n",
"\n",
"Following these conventions keeps a manually imported dataset organized and compatible with downstream analysis."
]
},
{
"cell_type": "markdown",
"id": "e78346dc-3a4f-480b-8be2-345f143f4a50",
"cell_type": "code",
"execution_count": null,
"id": "cd360b8b-0d67-417b-abe1-51e3b7141119",
"metadata": {},
"outputs": [],
"source": [
"## Save the AnnData object"
"# preserve raw data\n",
"adata.raw = adata\n",
"\n",
"# log transform data\n",
"adata = sm.pp.log1p(adata)\n",
"\n",
"# Add marker annotation\n",
"adata.uns['all_markers'] = ['list', 'of', 'markers']"
]
},
{
"cell_type": "markdown",
"id": "a995eb86-1fd8-41ff-a511-d56439450f3a",
"metadata": {},
"source": [
"Once the AnnData object is created, it becomes the central data structure for all subsequent analyses. This is highly beneficial because it encapsulates all results within the object, eliminating the need to manage multiple related files. You can conveniently share this single file with collaborators, allowing them to continue the analysis seamlessly or resume from where you left off. Furthermore, numerous single-cell analysis tools, such as Scanpy, are built upon this framework. This integration allows for the straightforward application of functions from various packages without the necessity of data reformatting to suit each tool's specific requirements."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "cc19d2dc-66a3-4a71-817e-71650206e5be",
"id": "986db9f2-a6b5-4950-9cb5-202c391d9a82",
"metadata": {},
"outputs": [],
"source": [
"# Save the results\n",
"adata.write('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/scimapExampleData.h5ad')"
"<hr>"
]
},
{
"cell_type": "markdown",
"id": "f59b0d22-0f25-41f4-80fc-2797f3236c7a",
"id": "e78346dc-3a4f-480b-8be2-345f143f4a50",
"metadata": {},
"source": [
"\n",
"The `sm.tl.cluster` function can be used to cluster cells within the dataset. It supports three popular clustering algorithms:\n",
"\n",
"- kmeans\n",
"- phenograph\n",
"- leiden\n",
" \n",
"Users are encouraged to select the clustering algorithm that best matches their data's nature and their analytical goals."
"## Save the AnnData object"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "75989db3-a550-4f4d-8264-d974f4a048a8",
"cell_type": "markdown",
"id": "a995eb86-1fd8-41ff-a511-d56439450f3a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Leiden clustering\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/scanpy/preprocessing/_pca.py:229: ImplicitModificationWarning:\n",
"\n",
"Setting element `.obsm['X_pca']` of view, initializing view as actual.\n",
"\n"
]
}
],
"source": [
"adata = sm.tl.cluster(adata, method='leiden', resolution=0.2)"
"Once the AnnData object is created, it becomes the central data structure for all subsequent analyses. This is highly beneficial because it encapsulates all results within the object, eliminating the need to manage multiple related files. You can conveniently share this single file with collaborators, allowing them to continue the analysis seamlessly or resume from where you left off. Furthermore, numerous single-cell analysis tools, such as Scanpy, are built upon this framework. This integration allows for the straightforward application of functions from various packages without the necessity of data reformatting to suit each tool's specific requirements."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "a0606b96-85a8-470d-8259-8e3689ab2a85",
"execution_count": 9,
"id": "cc19d2dc-66a3-4a71-817e-71650206e5be",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"leiden\n",
"0 4070\n",
"1 2847\n",
"2 2658\n",
"3 1063\n",
"4 482\n",
"5 81\n",
"Name: count, dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"# view the results\n",
"adata.obs['leiden'].value_counts()"
"# Save the results\n",
"adata.write('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/scimapExampleData.h5ad')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "410acf2b-37cf-45f8-b4f9-db477f127d0b",
"id": "de8e5905-a2b9-4b2b-a500-450ead7b741e",
"metadata": {},
"outputs": [],
"source": []
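The manual-import note added to the notebook lists four conventions: an `imageid` column in `adata.obs`, the unprocessed data in `adata.raw`, a `log` layer, and the marker list in `adata.uns['all_markers']`. A small checklist function is one way to confirm that a hand-built object meets them before running further analysis. This is a minimal sketch under stated assumptions: `check_scimap_ready` is a hypothetical helper, not part of SCIMAP; it only tests that each piece is present (plus a shape check on the `log` layer) and cannot verify that the marker list matches the channel order of the image file.

```python
def check_scimap_ready(adata):
    """Return a list of problems found in a manually built AnnData object.

    Hypothetical helper (not part of SCIMAP). An empty list means all four
    conventions from the manual-import note are met. Works on an AnnData
    object or any object exposing .X, .obs, .raw, .layers and .uns.
    """
    problems = []

    # 1. Unique image identification: cell metadata needs an 'imageid' column.
    if "imageid" not in adata.obs.columns:
        problems.append("adata.obs has no 'imageid' column")

    # 2. Raw data preserved: AnnData returns None for .raw until it is set.
    if getattr(adata, "raw", None) is None:
        problems.append("adata.raw is not set")

    # 3. Log-transformed layer: must exist and have the same shape as .X.
    if "log" not in adata.layers:
        problems.append("adata.layers has no 'log' layer")
    elif adata.layers["log"].shape != adata.X.shape:
        problems.append("adata.layers['log'] does not match the shape of adata.X")

    # 4. Marker annotation: a non-empty sequence of marker names.
    # len() is used instead of truthiness so NumPy arrays are handled too.
    markers = adata.uns.get("all_markers")
    if markers is None or len(markers) == 0:
        problems.append("adata.uns['all_markers'] is missing or empty")

    return problems
```

With a real AnnData object, `check_scimap_ready(adata)` should return an empty list once the raw data, `log` layer, and marker list have been set as in the notebook cell. The marker check avoids `if not markers` because lists stored in `uns` can come back as NumPy arrays after an `.h5ad` round trip, and the truth value of a multi-element array is ambiguous.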