adding correlation functions

labsyspharm · Mar 18, 2024 · 27b5d86
1 parent f2c55fa

Showing 13 changed files with 955 additions and 191 deletions.
diff --git a/docs/Functions/pl/groupCorrelation.md b/docs/Functions/pl/groupCorrelation.md
@@ -0,0 +1,5 @@
+---
+hide:
+  - toc        # Hide table of contents
+---
+::: scimap.plotting.groupCorrelation
diff --git a/docs/Functions/pl/markerCorrelation.md b/docs/Functions/pl/markerCorrelation.md
@@ -0,0 +1,5 @@
+---
+hide:
+  - toc        # Hide table of contents
+---
+::: scimap.plotting.markerCorrelation
diff --git a/docs/tutorials/nbs/Prepare Data for SCIMAP.ipynb b/docs/tutorials/nbs/Prepare Data for SCIMAP.ipynb
@@ -10,10 +10,28 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 1,
    "id": "dee3edb2-9621-42fe-8244-111e34945b91",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Running SCIMAP  1.3.8\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning:\n",
+      "\n",
+      "IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
+      "\n"
+     ]
+    }
+   ],
    "source": [
     "# import scimap\n",
     "import scimap as sm"
@@ -31,7 +49,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 2,
    "id": "e26cfe65-4bf3-4558-85e8-2d4010b2110f",
    "metadata": {},
    "outputs": [
@@ -61,7 +79,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 3,
    "id": "f10e382b-02b7-40ab-9e3f-4dfd8b1ba0a5",
    "metadata": {},
    "outputs": [
@@ -70,10 +88,11 @@
       "text/plain": [
        "AnnData object with n_obs × n_vars = 11201 × 9\n",
        "    obs: 'X_centroid', 'Y_centroid', 'Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'Solidity', 'Extent', 'Orientation', 'CellID', 'imageid'\n",
-       "    uns: 'all_markers'"
+       "    uns: 'all_markers'\n",
+       "    layers: 'log'"
       ]
      },
-     "execution_count": 5,
+     "execution_count": 3,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -87,7 +106,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 4,
    "id": "d8713803-ab13-4b31-b693-efedd4ccdae8",
    "metadata": {},
    "outputs": [
@@ -109,7 +128,7 @@
        "        6.73978032]])"
       ]
      },
-     "execution_count": 8,
+     "execution_count": 4,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -122,7 +141,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 5,
    "id": "527582cb-7796-45f3-81eb-bccae5ef2e8a",
    "metadata": {},
    "outputs": [
@@ -389,7 +408,7 @@
        "[11201 rows x 11 columns]"
       ]
      },
-     "execution_count": 9,
+     "execution_count": 5,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -401,7 +420,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 6,
    "id": "ea937485-6871-47a1-a308-c2e1e4b002df",
    "metadata": {},
    "outputs": [
@@ -466,7 +485,7 @@
        "Index: [ELANE, CD57, CD45, CD11B, SMA, CD16, ECAD, FOXP3, NCAM]"
       ]
      },
-     "execution_count": 10,
+     "execution_count": 6,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -502,7 +521,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 7,
    "id": "85510c7b-a68a-40f4-86e4-b98ac54e411a",
    "metadata": {},
    "outputs": [],
@@ -539,115 +558,78 @@
   },
   {
    "cell_type": "markdown",
-   "id": "986db9f2-a6b5-4950-9cb5-202c391d9a82",
+   "id": "1bf35ced-e150-4662-aeb2-2afe35d8aa19",
    "metadata": {},
    "source": [
-    "<hr>"
+    "When importing data manually instead of using the built-in function that automates the process, follow these four steps so that the resulting object is compatible with downstream analysis:\n",
+    "\n",
+    "1. **Ensure Unique Image Identification**: Add a column named `imageid` to the metadata that assigns a unique identifier to each image. This matters most for datasets comprising multiple images, because it allows the data for a specific image to be organized and retrieved within the larger dataset.\n",
+    "   \n",
+    "2. **Preserve Raw Data**: Store the unprocessed raw data in `adata.raw`. This retains the original values for reference or baseline comparison before any preprocessing is applied.\n",
+    "\n",
+    "3. **Log Transformation Layer**: Create a layer named `log` to hold the log-transformed data. Log transformation reduces the impact of large differences in scale across measurements, which makes downstream analysis more robust and easier to interpret.\n",
+    "\n",
+    "4. **Marker Annotation**: Keep a record of all markers present in the images, in the same order as the layers of the image data. When an image is loaded, this record identifies which image layer corresponds to which marker.\n",
+    "\n",
+    "Following these guidelines keeps manually imported datasets well organized and ready for analysis."
    ]
   },
   {
-   "cell_type": "markdown",
-   "id": "e78346dc-3a4f-480b-8be2-345f143f4a50",
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cd360b8b-0d67-417b-abe1-51e3b7141119",
    "metadata": {},
+   "outputs": [],
    "source": [
-    "## Save the annData object"
+    "# preserve raw data\n",
+    "adata.raw = adata\n",
+    "\n",
+    "# log transform data\n",
+    "adata = sm.pp.log1p(adata)\n",
+    "\n",
+    "# add marker annotation (placeholder list: replace with your marker names, in image order)\n",
+    "adata.uns['all_markers'] = ['list', 'of', 'markers']"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "a995eb86-1fd8-41ff-a511-d56439450f3a",
-   "metadata": {},
-   "source": [
-    "Once the AnnData object is created, it becomes the central data structure for all subsequent analyses. This is highly beneficial because it encapsulates all results within the object, eliminating the need to manage multiple related files. You can conveniently share this single file with collaborators, allowing them to continue the analysis seamlessly or resume from where you left off. Furthermore, numerous single-cell analysis tools, such as Scanpy, are built upon this framework. This integration allows for the straightforward application of functions from various packages without the necessity of data reformatting to suit each tool's specific requirements."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "id": "cc19d2dc-66a3-4a71-817e-71650206e5be",
+   "id": "986db9f2-a6b5-4950-9cb5-202c391d9a82",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "# Save the results\n",
-    "adata.write('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/scimapExampleData.h5ad')"
+    "<hr>"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "f59b0d22-0f25-41f4-80fc-2797f3236c7a",
+   "id": "e78346dc-3a4f-480b-8be2-345f143f4a50",
    "metadata": {},
    "source": [
-    "\n",
-    "`sm.tl.cluster` function can be used for clustering cells within the dataset. It supports three popular clustering algorithms:\n",
-    "\n",
-    "- kmeans\n",
-    "- phenograph\n",
-    "- leiden\n",
-    " \n",
-    "Users are encouraged to select the clustering algorithm that best matches their data's nature and their analytical goals."
+    "## Save the AnnData object"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": 10,
-   "id": "75989db3-a550-4f4d-8264-d974f4a048a8",
+   "cell_type": "markdown",
+   "id": "a995eb86-1fd8-41ff-a511-d56439450f3a",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Leiden clustering\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/scanpy/preprocessing/_pca.py:229: ImplicitModificationWarning:\n",
-      "\n",
-      "Setting element `.obsm['X_pca']` of view, initializing view as actual.\n",
-      "\n"
-     ]
-    }
-   ],
    "source": [
-    "adata = sm.tl.cluster(adata, method='leiden', resolution=0.2)"
+    "Once the AnnData object is created, it becomes the central data structure for all subsequent analyses. Because all results are stored within the object, there is no need to manage multiple related files. You can share this single file with collaborators, who can then continue the analysis or resume from where you left off. In addition, many single-cell analysis tools, such as Scanpy, are built on the same framework, so functions from different packages can be applied without reformatting the data for each tool."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
-   "id": "a0606b96-85a8-470d-8259-8e3689ab2a85",
+   "execution_count": 9,
+   "id": "cc19d2dc-66a3-4a71-817e-71650206e5be",
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "leiden\n",
-       "0    4070\n",
-       "1    2847\n",
-       "2    2658\n",
-       "3    1063\n",
-       "4     482\n",
-       "5      81\n",
-       "Name: count, dtype: int64"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
-    "# view the results\n",
-    "adata.obs['leiden'].value_counts()"
+    "# Save the results\n",
+    "adata.write('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/scimapExampleData.h5ad')"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "410acf2b-37cf-45f8-b4f9-db477f127d0b",
+   "id": "de8e5905-a2b9-4b2b-a500-450ead7b741e",
    "metadata": {},
    "outputs": [],
    "source": []