From 89a523837fed63d016ae3a87862cfa0e4cf88b9c Mon Sep 17 00:00:00 2001
From: John Kerl <kerl.john.r@gmail.com>
Date: Tue, 5 Nov 2024 14:01:44 -0500
Subject: [PATCH] [python] Tutorial notebook for new shape feature

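The notebook's central idea is that `shape` is a soft limit that a resize can raise, while `maxshape` is a hard limit. As a rough illustration, here is a toy model of that behavior using only the standard library. `ToyArray` and its methods are hypothetical names invented for this sketch; they are not part of the tiledbsoma API, and the grow-only rule is an assumption of the model rather than a statement about the library.

```python
# Toy model only: illustrates soft vs. hard limits, not tiledbsoma internals.
class ToyArray:
    """Hypothetical stand-in for a 1-D sparse array with shape and maxshape."""

    def __init__(self, shape, maxshape):
        if not 0 <= shape <= maxshape:
            raise ValueError("shape must lie within [0, maxshape]")
        self.shape = shape        # soft limit: can be raised via resize()
        self.maxshape = maxshape  # hard limit: fixed at creation
        self.cells = {}           # sparse storage: coordinate -> value

    def write(self, coord, value):
        # Writes outside the current shape are rejected.
        if not 0 <= coord < self.shape:
            raise IndexError(f"coordinate {coord} is outside shape {self.shape}")
        self.cells[coord] = value

    def resize(self, newshape):
        # Assumption of this toy model: the shape may grow but not shrink.
        if newshape < self.shape:
            raise ValueError("this toy model does not allow shrinking")
        if newshape > self.maxshape:
            raise ValueError("newshape exceeds maxshape")
        self.shape = newshape


# Values taken from the notebook's obs example.
arr = ToyArray(shape=2638, maxshape=9223372036854773759)
arr.write(2637, 1.0)      # last in-bounds coordinate
try:
    arr.write(2638, 1.0)  # out of bounds until the array is resized
except IndexError as exc:
    print("rejected:", exc)
arr.resize(7200)
arr.write(2638, 1.0)      # accepted after the resize
```

In the notebook itself, the corresponding steps are reading `.shape` and `.maxshape` and then calling `.resize` on an array, or `tiledbsoma.io.resize_experiment` on a whole experiment.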
---
 .../notebooks/tutorial_soma_append_mode.ipynb |    2 +-
 .../notebooks/tutorial_soma_shape.ipynb       | 1117 +++++++++++++++++
 2 files changed, 1118 insertions(+), 1 deletion(-)
 create mode 100644 apis/python/notebooks/tutorial_soma_shape.ipynb
diff --git a/apis/python/notebooks/tutorial_soma_append_mode.ipynb b/apis/python/notebooks/tutorial_soma_append_mode.ipynb
index 3ab234ecba..622d376712 100644
--- a/apis/python/notebooks/tutorial_soma_append_mode.ipynb
+++ b/apis/python/notebooks/tutorial_soma_append_mode.ipynb
@@ -1105,7 +1105,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.9"
+   "version": "3.11.6"
   }
  },
  "nbformat": 4,
diff --git a/apis/python/notebooks/tutorial_soma_shape.ipynb b/apis/python/notebooks/tutorial_soma_shape.ipynb
new file mode 100644
index 0000000000..d10d1da2e8
--- /dev/null
+++ b/apis/python/notebooks/tutorial_soma_shape.ipynb
@@ -0,0 +1,1117 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "2cf7c05c-f723-489d-8c39-3e2841f655b0",
+   "metadata": {},
+   "source": [
+    "# Tutorial: SOMA shapes\n",
+    "\n",
+    "As of TileDB-SOMA 1.15, we're proud to support a more intuitive and extensible notion of `shape`.\n",
+    "\n",
+    "In this notebook, we'll go through how you use shapes for the dataframes and arrays within your SOMA experiments, when and how you can resize, and options for experiments created before TileDB-SOMA 1.15.\n",
+    "\n",
+    "The dataset used is from Peripheral Blood Mononuclear Cells (PBMC), which is freely available from 10X Genomics. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "167dba53-7da6-4984-bbe7-a5416e60325d",
+   "metadata": {},
+   "source": [
+    "We'll start by importing `tiledbsoma`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "90db6017-a084-43f5-8f7e-bff281e9a898",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import tiledbsoma"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ca9f7272-09e0-4eda-a569-8796a14bf776",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## The shape feature"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "41358011-b835-4c3a-a75e-79a80f4cc3a1",
+   "metadata": {},
+   "source": [
+    "As we've seen in other tutorials in this series, the SOMA data model brings across many familiar concepts from AnnData. This includes the ability to ask component dataframes and arrays what their shapes are."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "c35267c3-21a3-4938-afb6-92110326448b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "exp = tiledbsoma.open(\"data/sparse/pbmc3k\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2d934ed9-5b41-4af8-a737-23583f6e885b",
+   "metadata": {},
+   "source": [
+    "The `obs` dataframe has a shape, which coincides with the data populated inside of it.\n",
+    "\n",
+    "(It might not, if you've created the dataframe but not written any data to it yet -- at that point it's empty but it still has a shape.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "040d0629-815a-4fa3-a586-3ffd2c4ba451",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "2638"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "exp.obs.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "9967e115-6277-4203-b61b-96d1c5b04fde",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>soma_joinid</th>\n",
+       "      <th>obs_id</th>\n",
+       "      <th>n_genes</th>\n",
+       "      <th>percent_mito</th>\n",
+       "      <th>n_counts</th>\n",
+       "      <th>louvain</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0</td>\n",
+       "      <td>AAACATACAACCAC-1</td>\n",
+       "      <td>781</td>\n",
+       "      <td>0.030178</td>\n",
+       "      <td>2419.0</td>\n",
+       "      <td>CD4 T cells</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>1</td>\n",
+       "      <td>AAACATTGAGCTAC-1</td>\n",
+       "      <td>1352</td>\n",
+       "      <td>0.037936</td>\n",
+       "      <td>4903.0</td>\n",
+       "      <td>B cells</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>2</td>\n",
+       "      <td>AAACATTGATCAGC-1</td>\n",
+       "      <td>1131</td>\n",
+       "      <td>0.008897</td>\n",
+       "      <td>3147.0</td>\n",
+       "      <td>CD4 T cells</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>3</td>\n",
+       "      <td>AAACCGTGCTTCCG-1</td>\n",
+       "      <td>960</td>\n",
+       "      <td>0.017431</td>\n",
+       "      <td>2639.0</td>\n",
+       "      <td>CD14+ Monocytes</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>4</td>\n",
+       "      <td>AAACCGTGTATGCG-1</td>\n",
+       "      <td>522</td>\n",
+       "      <td>0.012245</td>\n",
+       "      <td>980.0</td>\n",
+       "      <td>NK cells</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>...</th>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "      <td>...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2633</th>\n",
+       "      <td>2633</td>\n",
+       "      <td>TTTCGAACTCTCAT-1</td>\n",
+       "      <td>1155</td>\n",
+       "      <td>0.021104</td>\n",
+       "      <td>3459.0</td>\n",
+       "      <td>CD14+ Monocytes</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2634</th>\n",
+       "      <td>2634</td>\n",
+       "      <td>TTTCTACTGAGGCA-1</td>\n",
+       "      <td>1227</td>\n",
+       "      <td>0.009294</td>\n",
+       "      <td>3443.0</td>\n",
+       "      <td>B cells</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2635</th>\n",
+       "      <td>2635</td>\n",
+       "      <td>TTTCTACTTCCTCG-1</td>\n",
+       "      <td>622</td>\n",
+       "      <td>0.021971</td>\n",
+       "      <td>1684.0</td>\n",
+       "      <td>B cells</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2636</th>\n",
+       "      <td>2636</td>\n",
+       "      <td>TTTGCATGAGAGGC-1</td>\n",
+       "      <td>454</td>\n",
+       "      <td>0.020548</td>\n",
+       "      <td>1022.0</td>\n",
+       "      <td>B cells</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2637</th>\n",
+       "      <td>2637</td>\n",
+       "      <td>TTTGCATGCCTCAC-1</td>\n",
+       "      <td>724</td>\n",
+       "      <td>0.008065</td>\n",
+       "      <td>1984.0</td>\n",
+       "      <td>CD4 T cells</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>2638 rows × 6 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "      soma_joinid            obs_id  n_genes  percent_mito  n_counts  \\\n",
+       "0               0  AAACATACAACCAC-1      781      0.030178    2419.0   \n",
+       "1               1  AAACATTGAGCTAC-1     1352      0.037936    4903.0   \n",
+       "2               2  AAACATTGATCAGC-1     1131      0.008897    3147.0   \n",
+       "3               3  AAACCGTGCTTCCG-1      960      0.017431    2639.0   \n",
+       "4               4  AAACCGTGTATGCG-1      522      0.012245     980.0   \n",
+       "...           ...               ...      ...           ...       ...   \n",
+       "2633         2633  TTTCGAACTCTCAT-1     1155      0.021104    3459.0   \n",
+       "2634         2634  TTTCTACTGAGGCA-1     1227      0.009294    3443.0   \n",
+       "2635         2635  TTTCTACTTCCTCG-1      622      0.021971    1684.0   \n",
+       "2636         2636  TTTGCATGAGAGGC-1      454      0.020548    1022.0   \n",
+       "2637         2637  TTTGCATGCCTCAC-1      724      0.008065    1984.0   \n",
+       "\n",
+       "              louvain  \n",
+       "0         CD4 T cells  \n",
+       "1             B cells  \n",
+       "2         CD4 T cells  \n",
+       "3     CD14+ Monocytes  \n",
+       "4            NK cells  \n",
+       "...               ...  \n",
+       "2633  CD14+ Monocytes  \n",
+       "2634          B cells  \n",
+       "2635          B cells  \n",
+       "2636          B cells  \n",
+       "2637      CD4 T cells  \n",
+       "\n",
+       "[2638 rows x 6 columns]"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "exp.obs.read().concat().to_pandas()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4ac5267-3d09-4c0d-aee7-538ab616e7ac",
+   "metadata": {},
+   "source": [
+    "The purpose of the shape is to serve as a _soft limit_. It means you'll get an exception if you try to read or write data with `soma_joinid` outside the range 0 through 2637 inclusive.\n",
+    "\n",
+    "If you have more data -- more cells -- to add to the experiment later, you will be able to resize `obs`, up to the `maxshape`, which is a hard limit."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "41df0f93-139d-4483-87fe-7b825b7fc550",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "9223372036854773759"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "exp.obs.maxshape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3deadaeb-5ab5-4c9d-ba31-79ec1d36aace",
+   "metadata": {},
+   "source": [
+    "We'll see more about this in the section on experiment-level resizes below, as well as in the tutorial on TileDB-SOMA's append mode."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "52dcd26b-1de2-434e-8593-57d5583e4fdc",
+   "metadata": {},
+   "source": [
+    "The `var` dataframe's shape is similar:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "3e2bc042-15c7-47ea-b72f-2a79f8f02a58",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[1838, 9223372036854773969]"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "var = exp.ms[\"RNA\"].var\n",
+    "[var.shape, var.maxshape]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "22fb5a8f-245e-4b8a-9090-376ba6209dd8",
+   "metadata": {},
+   "source": [
+    "Likewise, the N-dimensional arrays within the experiment have their shapes as well."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "c2054e58-9a35-4185-8b77-62baed4c6e96",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[(2638, 1838), (9223372036854773759, 9223372036854773759)]"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "[\n",
+    "    exp.ms[\"RNA\"].X[\"data\"].shape,\n",
+    "    exp.ms[\"RNA\"].X[\"data\"].maxshape,\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "c8949379-5e84-460f-b2b0-d2f4e279b57b",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['X_draw_graph_fr', 'X_pca', 'X_tsne', 'X_umap']"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "obsm = exp.ms[\"RNA\"].obsm\n",
+    "list(obsm.keys())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "82b16ded-298c-4d7e-8dfd-ffb4b36c37c6",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['connectivities', 'distances']"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "obsp = exp.ms[\"RNA\"].obsp\n",
+    "list(obsp.keys())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "bbe56e08-c237-48df-8b0d-393057e7e6fa",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[(2638, 50), (9223372036854773759, 9223372036854773759)]"
+      ]
+     },
+     "execution_count": 10,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "[\n",
+    "    obsm[\"X_pca\"].shape,\n",
+    "    obsm[\"X_pca\"].maxshape,\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "7577221c-85c7-4549-847d-8cbcc1b771ab",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[(2638, 2638), (9223372036854773759, 9223372036854773759)]"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "[\n",
+    "    obsp[\"distances\"].shape,\n",
+    "    obsp[\"distances\"].maxshape,\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7c0e4bb-30fd-4b24-9231-e72c79d0a1c2",
+   "metadata": {},
+   "source": [
+    "In particular, the `X` array in this experiment -- and in most experiments -- is _sparse_. That means there needn't be a value in every cell of the matrix. Nonetheless, the shape serves as a soft limit for reads and writes: you'll get an exception if you try to read or write outside of these bounds."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "aecfff79-d5ac-4361-ba56-0c5cad05206d",
+   "metadata": {},
+   "source": [
+    "As a convenience, you can see all the experiment's objects' shapes at once as follows:\n",
+    "\n",
+    "```\n",
+    "import tiledbsoma.io\n",
+    "tiledbsoma.io.show_experiment_shapes(exp.uri)\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0836f330-6cfd-4d88-9779-ce48d1e90e82",
+   "metadata": {},
+   "source": [
+    "As with AnnData, as a general rule you'll see the following:\n",
+    "\n",
+    "* An `X` array's `shape` is `nobs` x `nvar`\n",
+    "* An `obsm` array's shape is `nobs` x some number, maybe 50\n",
+    "* An `obsp` array's shape is `nobs` x `nobs`\n",
+    "* A `varm` array's shape is `nvar` x some number, maybe 50\n",
+    "* A `varp` array's shape is `nvar` x `nvar`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c33a9424-9515-4f9b-b191-f50cca39dec2",
+   "metadata": {},
+   "source": [
+    "## When and how to resize at the experiment level"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "44df2aea-8480-430f-adf9-eeff960a562f",
+   "metadata": {},
+   "source": [
+    "The primary reason you'd resize a dataframe or an array within an experiment is to append more data. For example, say you have an experiment with the results of Monday's lab run on a sample of 100,000 cells. Then maybe on Tuesday you'll want to add that day's lab run of an additional 70,000 cells to the same experiment.\n",
+    "\n",
+    "Because the shapes are soft limits, and reading or writing beyond them raises an exception, you'd need to resize the experiment so that its dataframes and arrays can accommodate the new `nobs` = 170,000.\n",
+    "\n",
+    "Please see the append-mode tutorial for how to do that using `tiledbsoma.io.register_anndatas` and `tiledbsoma.io.resize_experiment`.\n",
+    "\n",
+    "While you can resize each dataframe and array in the experiment one at a time -- see \"Advanced usage\", below in this notebook -- by far the most common case is `tiledbsoma.io.resize_experiment`, which exists to make this simple and convenient."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b50cd522-ded1-4dd8-86ec-ea7c7e8f5421",
+   "metadata": {},
+   "source": [
+    "## How to upgrade older experiments"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a397a2ff-5e9d-470f-a5e3-c0dd2fb6d731",
+   "metadata": {},
+   "source": [
+    "Experiments created by TileDB-SOMA 1.15 and higher will look as shown above. Let's take a look at an experiment from before TileDB-SOMA 1.15."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "6bce4d88-84ef-4f13-8441-473bbceb8292",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import tiledbsoma.io\n",
+    "\n",
+    "import tempfile\n",
+    "import shutil\n",
+    "\n",
+    "olduri = tempfile.mktemp()\n",
+    "shutil.copytree('data/sparse/pbmc3k-pre-1.15', olduri)\n",
+    "expold = tiledbsoma.open(olduri)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3aa7fa07-aa0d-4e8e-b739-fb60d77cd971",
+   "metadata": {},
+   "source": [
+    "This is the same PBMC3K data as above. Compare the old and new shapes:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "937e19c0-f425-46ce-8517-b5a3812a2afb",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[9223372036854773759, 9223372036854773759, False]"
+      ]
+     },
+     "execution_count": 13,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "[ expold.obs.shape, expold.obs.maxshape, expold.obs.tiledbsoma_has_upgraded_domain ]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "2c690a8d-3654-430a-ba15-eb65dfbf789b",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[(9223372036854773759, 9223372036854773759),\n",
+       " (9223372036854773759, 9223372036854773759),\n",
+       " False]"
+      ]
+     },
+     "execution_count": 14,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "[ expold.ms[\"RNA\"].X[\"data\"].shape, expold.ms[\"RNA\"].X[\"data\"].maxshape, expold.ms[\"RNA\"].X[\"data\"].tiledbsoma_has_upgraded_shape ]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "72358898-f0e3-46cd-84ad-26077a3ef8f5",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[2638, 9223372036854773759, True]"
+      ]
+     },
+     "execution_count": 15,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "[ exp.obs.shape, exp.obs.maxshape, exp.obs.tiledbsoma_has_upgraded_domain ]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "2d8b50c6-8b3b-45ad-a595-a5fd1f10d6c0",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[(2638, 1838), (9223372036854773759, 9223372036854773759), True]"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "[ exp.ms[\"RNA\"].X[\"data\"].shape, exp.ms[\"RNA\"].X[\"data\"].maxshape, exp.ms[\"RNA\"].X[\"data\"].tiledbsoma_has_upgraded_shape ]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5307b6d8-3bee-48f2-84b4-0d4346eaf50f",
+   "metadata": {},
+   "source": [
+    "Note that for the pre-1.15 experiment, the `shape` is huge -- like the `maxshape` -- and `tiledbsoma_has_upgraded_domain` (for dataframes) and `tiledbsoma_has_upgraded_shape` (for arrays) are False.\n",
+    "\n",
+    "To make the old experiment look like the new experiment, simply call:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "012d726c-37f7-48a7-895c-8a69a2df2323",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "True"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "tiledbsoma.io.upgrade_experiment_shapes(expold.uri)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "3d1387ef-1738-420e-861e-6676afba58a3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "expold = tiledbsoma.open(olduri)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "d8a98c8d-6446-4a9f-91f5-9024e18b56da",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[(2638, 1838), (9223372036854773759, 9223372036854773759), True]"
+      ]
+     },
+     "execution_count": 19,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "[ expold.ms[\"RNA\"].X[\"data\"].shape, expold.ms[\"RNA\"].X[\"data\"].maxshape, expold.ms[\"RNA\"].X[\"data\"].tiledbsoma_has_upgraded_shape ]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3b42e49a-96d5-494b-80bd-3cf0816e5b38",
+   "metadata": {},
+   "source": [
+    "Additionally, you can call `tiledbsoma.io.show_experiment_shapes(expold.uri)` before and after doing the upgrade.\n",
+    "\n",
+    "To run a pre-check, you can do\n",
+    "\n",
+    "```\n",
+    "tiledbsoma.io.upgrade_experiment_shapes(expold.uri, check_only=True)\n",
+    "```\n",
+    "\n",
+    "This won't change anything -- it'll simply tell you if the operation will be possible."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7d48ee7-7461-4370-95e1-00c12c3aa80b",
+   "metadata": {},
+   "source": [
+    "## Advanced usage: dataframes with non-standard index columns"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b2e69ef4-d9a7-4cc9-b6d0-9b55f41ae838",
+   "metadata": {},
+   "source": [
+    "In the [SOMA data model](https://github.com/single-cell-data/SOMA/blob/main/abstract_specification.md), the `SparseNDArray` and `DenseNDArray` objects always have int64 dimensions named `soma_dim_0`, `soma_dim_1`, and up, and they have a numeric `soma_data` attribute for the contents of the array."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "7d65d543-c98c-47c9-8a73-177b8e254c51",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "soma_dim_0: int64 not null\n",
+       "soma_dim_1: int64 not null\n",
+       "soma_data: float not null"
+      ]
+     },
+     "execution_count": 20,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "exp.ms[\"RNA\"].X[\"data\"].schema"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c31bc6fd-6091-4a07-bee8-df73aa2ecfb2",
+   "metadata": {},
+   "source": [
+    "For dataframes, though, while there must be a `soma_joinid` column of type int64, you can have one or more other index columns in addition -- or, `soma_joinid` can be a non-index column.\n",
+    "\n",
+    "This means that in the default, simplest, and most common case, you can think of a dataframe as having a shape just as the N-dimensional arrays do. But really, dataframes are capable of more than that.\n",
+    "\n",
+    "In these cases, `.shape` will only tell you things about the `soma_joinid` column. Here's where `domain` and `maxdomain` come in."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "453b64d0-9a6a-4b00-80f0-38259fa1ffa4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sdfuri1 = tempfile.mktemp()\n",
+    "sdfuri2 = tempfile.mktemp()\n",
+    "sdfuri3 = tempfile.mktemp()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "d93f35b4-72de-4895-aeaa-a769555de7b3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pyarrow as pa\n",
+    "\n",
+    "schema = pa.schema([\n",
+    "    (\"soma_joinid\", pa.int64()),\n",
+    "    (\"mystring\", pa.string()),\n",
+    "    (\"myint\", pa.int32()),\n",
+    "    (\"myfloat\", pa.float32()),\n",
+    "])\n",
+    "\n",
+    "data = pa.Table.from_pydict({\n",
+    "    \"soma_joinid\": [0, 1],\n",
+    "    \"mystring\": [\"hello\", \"world\"],\n",
+    "    \"myint\": [33, 44],\n",
+    "    \"myfloat\": [4.5, 5.5],\n",
+    "})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "4c87ecfd-de46-47f1-bef0-d85cd98ceaa8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with tiledbsoma.DataFrame.create(\n",
+    "    sdfuri1,\n",
+    "    schema=schema,\n",
+    "    index_column_names=[\"soma_joinid\"],\n",
+    "    # Low and high soft limits for soma_joinid:\n",
+    "    domain=[(0, 9)],\n",
+    ") as sdf1:\n",
+    "    sdf1.write(data)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "86a09444-0ea6-4359-bf96-871832bb3878",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with tiledbsoma.DataFrame.create(\n",
+    "    sdfuri2,\n",
+    "    schema=schema,\n",
+    "    index_column_names=[\"soma_joinid\", \"mystring\"],\n",
+    "    # Low and high soft limits for soma_joinid and mystring:\n",
+    "    #\n",
+    "    # Note for string index columns: you cannot set low or high values --\n",
+    "    # please say None, or (\"\", \"\").\n",
+    "    domain=[(0, 9), None],\n",
+    ") as sdf2:\n",
+    "    sdf2.write(data)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "925b57a6-17f2-4fe0-a6a1-2d2937721b42",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with tiledbsoma.DataFrame.create(\n",
+    "    sdfuri3,\n",
+    "    schema=schema,\n",
+    "    index_column_names=[\"myfloat\", \"myint\"],\n",
+    "    # Low and high soft limits for myfloat and myint:\n",
+    "    domain=[(0, 999), (-1000, 1000)],\n",
+    ") as sdf3:\n",
+    "    sdf3.write(data)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ef786e7e-ee2f-4c0f-8173-a4dec2aacd15",
+   "metadata": {},
+   "source": [
+    "Now let's look at the `domain` and `maxdomain` for these dataframes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "48542335-b585-44a2-acdd-d76a2a236c9d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "--------------------------------------------------------------------------------\n",
+      "URI: /var/folders/7l/_wsjyk5d4p3dz3kbz7wxn7t00000gn/T/tmp_mtpj_0e\n",
+      "\n",
+      "Domain low/high pairs are as specified at create.\n",
+      "Notice that .shape is nothing more than the soma_joinid column's high value, plus one.\n",
+      "If the dataframe's soma_joinid column is not an index column, .shape is None.\n",
+      "\n",
+      "domain:   ((0, 9),)\n",
+      "shape:    10\n",
+      "\n",
+      "Maxdomain is the hard limit for resize.\n",
+      "As with domain, we see ('', '') for string types.\n",
+      "As with domain, if the dataframe's soma_joinid column is not an index column, .maxshape is None.\n",
+      "\n",
+      "maxdomain: ((0, 9223372036854775796),)\n",
+      "maxshape:  9223372036854775797\n",
+      "\n",
+      "--------------------------------------------------------------------------------\n",
+      "URI: /var/folders/7l/_wsjyk5d4p3dz3kbz7wxn7t00000gn/T/tmpwvm20hn_\n",
+      "\n",
+      "Domain low/high pairs are as specified at create.\n",
+      "Notice that .shape is nothing more than the soma_joinid column's high value, plus one.\n",
+      "If the dataframe's soma_joinid column is not an index column, .shape is None.\n",
+      "\n",
+      "domain:   ((0, 9), ('', ''))\n",
+      "shape:    10\n",
+      "\n",
+      "Maxdomain is the hard limit for resize.\n",
+      "As with domain, we see ('', '') for string types.\n",
+      "As with domain, if the dataframe's soma_joinid column is not an index column, .maxshape is None.\n",
+      "\n",
+      "maxdomain: ((0, 9223372036854775796), ('', ''))\n",
+      "maxshape:  9223372036854775797\n",
+      "\n",
+      "--------------------------------------------------------------------------------\n",
+      "URI: /var/folders/7l/_wsjyk5d4p3dz3kbz7wxn7t00000gn/T/tmp_q2pjlf0\n",
+      "\n",
+      "Domain low/high pairs are as specified at create.\n",
+      "Notice that .shape is nothing more than the soma_joinid column's high value, plus one.\n",
+      "If the dataframe's soma_joinid column is not an index column, .shape is None.\n",
+      "\n",
+      "domain:   ((0.0, 999.0), (-1000, 1000))\n",
+      "shape:    None\n",
+      "\n",
+      "Maxdomain is the hard limit for resize.\n",
+      "As with domain, we see ('', '') for string types.\n",
+      "As with domain, if the dataframe's soma_joinid column is not an index column, .maxshape is None.\n",
+      "\n",
+      "maxdomain: ((-3.4028234663852886e+38, 3.4028234663852886e+38), (-2147483648, 2147481645))\n",
+      "maxshape:  None\n"
+     ]
+    }
+   ],
+   "source": [
+    "for sdfuri in [sdfuri1, sdfuri2, sdfuri3]:\n",
+    "    with tiledbsoma.DataFrame.open(sdfuri) as sdf:\n",
+    "        print()\n",
+    "        print(\"-\" * 80)\n",
+    "        print(\"URI:\", sdfuri)\n",
+    "        print()\n",
+    "        print(\"Domain low/high pairs are as specified at create.\")\n",
+    "        print(\"Notice that .shape is nothing more than the soma_joinid column's high value, plus one.\")\n",
+    "        print(\"If the dataframe's soma_joinid column is not an index column, .shape is None.\")\n",
+    "        print()\n",
+    "        print(\"domain:  \", sdf.domain)\n",
+    "        print(\"shape:   \", sdf.shape)\n",
+    "        print()\n",
+    "        print(\"Maxdomain is the hard limit for resize.\")\n",
+    "        print(\"As with domain, we see ('', '') for string types.\")\n",
+    "        print(\"As with domain, if the dataframe's soma_joinid column is not an index column, .maxshape is None.\")\n",
+    "        print()\n",
+    "        print(\"maxdomain:\", sdf.maxdomain)\n",
+    "        print(\"maxshape: \", sdf.maxshape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ed5477cd-35cb-4b4b-8a99-13cdc71149f0",
+   "metadata": {},
+   "source": [
+    "## Advanced usage: using resize at the dataframe/array level using the SOMA API"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "900600c1-b240-4dbb-826c-20ade016b9a6",
+   "metadata": {},
+   "source": [
+    "For N-dimensional arrays, which method to use depends on whether the array already has the new shape feature:\n",
+    "\n",
+    "* If the array's `.tiledbsoma_has_upgraded_shape` reports False (it was created before TileDB-SOMA 1.15 and has not been upgraded), invoke the `.tiledbsoma_upgrade_shape` method.\n",
+    "* Otherwise (it has been upgraded, or was created using TileDB-SOMA 1.15 or higher), invoke the `.resize` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 56,
+   "id": "d761f9cc-ff77-4284-b055-ec6be7d2c72c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tempuri = tempfile.mktemp()\n",
+    "shutil.copytree(\"data/sparse/pbmc3k\", tempuri)\n",
+    "\n",
+    "exp = tiledbsoma.Experiment.open(tempuri)\n",
+    "X = exp.ms[\"RNA\"].X[\"data\"]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "id": "da05292d-ae6f-4a72-80a3-0887607bf560",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "True"
+      ]
+     },
+     "execution_count": 57,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "X.tiledbsoma_has_upgraded_shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 58,
+   "id": "d781b504-f7eb-4d80-88fd-a8e18dea179d",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(2638, 1838)"
+      ]
+     },
+     "execution_count": 58,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "X.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "id": "021de71e-bf90-4ecd-a08c-eeb7973395a0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with tiledbsoma.Experiment.open(tempuri, \"w\") as exp:\n",
+    "    exp.ms[\"RNA\"].X[\"data\"].resize([7200, 1848])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "id": "70ccb11b-7fcd-463b-84b1-1db6d2a636fc",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(7200, 1848)\n"
+     ]
+    }
+   ],
+   "source": [
+    "with tiledbsoma.Experiment.open(tempuri) as exp:\n",
+    "    X = exp.ms[\"RNA\"].X[\"data\"]\n",
+    "    print(X.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ddf99890-a8d6-4e9f-b974-c03f6b9015dc",
+   "metadata": {},
+   "source": [
+    "For dataframes, the process is similar. If you want to expand only the soft limits for `soma_joinid`, you can use some simpler methods:\n",
+    "\n",
+    "* If the dataframe's `tiledbsoma_has_upgraded_domain` reports False, invoke the `.tiledbsoma_upgrade_soma_joinid_shape` method.\n",
+    "* Otherwise invoke the `.tiledbsoma_resize_soma_joinid_shape` method.\n",
+    "\n",
+    "If you have non-standard dataframes where `soma_joinid` is not the only index column, or is not an index column at all, then:\n",
+    "\n",
+    "* If the dataframe's `tiledbsoma_has_upgraded_domain` reports False, invoke the `.tiledbsoma_upgrade_domain` method.\n",
+    "* Otherwise invoke the `.change_domain` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0362cafc-50ad-467a-ad2a-9a9bdb9de4a3",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
