diff --git a/docs/index.md b/docs/index.md index 200a77b..3a54557 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,14 +1,15 @@ # scPRINT: Large Cell Model for scRNAseq data +[![codecov](https://codecov.io/gh/jkobject/scPRINT/branch/main/graph/badge.svg?token=GRnnData_token_here)](https://codecov.io/gh/jkobject/scPRINT) +[![CI](https://github.com/jkobject/scPRINT/actions/workflows/main.yml/badge.svg)](https://github.com/jkobject/scPRINT/actions/workflows/main.yml) [![PyPI version](https://badge.fury.io/py/scprint.svg)](https://badge.fury.io/py/scprint) -[![Documentation Status](https://readthedocs.org/projects/scprint/badge/?version=latest)](https://scprint.readthedocs.io/en/latest/?badge=latest) [![Downloads](https://pepy.tech/badge/scprint)](https://pepy.tech/project/scprint) [![Downloads](https://pepy.tech/badge/scprint/month)](https://pepy.tech/project/scprint) [![Downloads](https://pepy.tech/badge/scprint/week)](https://pepy.tech/project/scprint) [![GitHub issues](https://img.shields.io/github/issues/jkobject/scPRINT)](https://img.shields.io/github/issues/jkobject/scPRINT) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) -[![DOI](https://zenodo.org/badge/391909874.svg)]() +[![DOI](https://zenodo.org/badge/391909874.svg)](https://doi.org/10.1101/2024.07.29.605556) ![logo](logo.png) @@ -23,39 +24,122 @@ scPRINT can be used to perform the following analyses: - __label prediction__: predict the cell type, disease, sequencer, sex, and ethnicity of your cells - __gene network inference__: generate a gene network from any cell or cell cluster in your scRNAseq dataset -[Read the paper!](https://www.biorxiv.org/content/10.1101/2024.07.29.605556v1) if you would like to know more about scPRINT. +[Read the manuscript!](https://www.biorxiv.org/content/10.1101/2024.07.29.605556v1) if you would like to know more about scPRINT. Have a look at some of my [X-plainers](https://twitter.com/jkobject). 
![figure1](figure1.png) +## Table of Contents + +- [scPRINT: Large Cell Model for scRNAseq data](#scprint-large-cell-model-for-scrnaseq-data) + - [Table of Contents](#table-of-contents) + - [Install `scPRINT`](#install-scprint) + - [lamin.ai](#laminai) + - [install](#install) + - [pytorch and GPUs](#pytorch-and-gpus) + - [dev install](#dev-install) + - [Usage](#usage) + - [scPRINT's basic commands](#scprints-basic-commands) + - [Notes on GPU/CPU usage with triton](#notes-on-gpucpu-usage-with-triton) + - [Simple tests:](#simple-tests) + - [FAQ](#faq) + - [I want to generate gene networks from scRNAseq data:](#i-want-to-generate-gene-networks-from-scrnaseq-data) + - [I want to generate cell embeddings and cell label predictions from scRNAseq data:](#i-want-to-generate-cell-embeddings-and-cell-label-predictions-from-scrnaseq-data) + - [I want to denoise my scRNAseq dataset:](#i-want-to-denoise-my-scrnaseq-dataset) + - [I want to generate an atlas-level embedding](#i-want-to-generate-an-atlas-level-embedding) + - [I need to generate gene tokens using pLLMs](#i-need-to-generate-gene-tokens-using-pllms) + - [I want to pre-train scPRINT from scratch on my own data](#i-want-to-pre-train-scprint-from-scratch-on-my-own-data) + - [how can I find if scPRINT was trained on my data?](#how-can-i-find-if-scprint-was-trained-on-my-data) + - [can I use scPRINT on other organisms rather than human?](#can-i-use-scprint-on-other-organisms-rather-than-human) + - [how long does scPRINT takes? what kind of resources do I need? (or in alternative: can i run scPRINT locally?)](#how-long-does-scprint-takes-what-kind-of-resources-do-i-need-or-in-alternative-can-i-run-scprint-locally) + - [I have different scRNASeq batches. 
Should I integrate my data before running scPRINT?](#i-have-different-scrnaseq-batches-should-i-integrate-my-data-before-running-scprint) + - [where to find the gene embeddings?](#where-to-find-the-gene-embeddings) + - [Documentation](#documentation) + - [Model Weights](#model-weights) + - [Development](#development) + - [Work in progress (PR welcomed):](#work-in-progress-pr-welcomed) + ## Install `scPRINT` -For the moment scPRINT has been tested on MacOS and Linux (Ubuntu 20.04) with Python 3.10. +For the moment scPRINT has been tested on MacOS and Linux (Ubuntu 20.04) with Python 3.10. Installation takes about 10 minutes on average. If you want to be using flashattention2, know that it only supports triton 2.0 MLIR's version and torch==2.0.0 for now. -```python -conda create -n "[whatever]" python==3.10 +### lamin.ai + +To use scPRINT, you need to use lamin.ai. It is needed to load biological information such as genes, cell types, and organisms. + +To do so, sign in to [lamin.ai](https://lamin.ai/login) with Google or GitHub, then be sure to log in before running anything (or before starting a notebook): `lamin login --key `. Follow the instructions on [their website](https://docs.lamin.ai/guide). + +### install + +To start, run: + +```bash +conda create -n python==3.10 #scprint might work with python >3.10, but it is not tested #one of pip install scprint # OR -pip install scprint[dev] # for the dev dependencies (building etc..) AND/OR [dev,flash] -pip install scprint[flash] && pip install -e "git+https:/ -/github.com/triton-lang/triton.git@legacy-backend -#egg=triton&subdirectory=python" # to use flashattention2, you will need to install triton 2.0.0.dev20221202 specifically, working on removing this dependency # only if you have a compatible gpu (e.g. 
not available for apple GPUs for now, see https://github.com/triton-lang/triton?tab=readme-ov-file#compatibility) +pip install scprint[dev] # for the dev dependencies (building etc..) OR +pip install scprint[flash] # to use flashattention2 with triton: only if you have a compatible gpu (e.g. not available for apple GPUs for now, see https://github.com/triton-lang/triton?tab=readme-ov-file#compatibility) +#OR pip install scPRINT[dev,flash] + +lamin login --key +lamin init --storage --schema bionty +``` + +If you are new to lamin and had to run `lamin init`, you will also need to populate your ontologies. This is because scPRINT uses ontologies to define its cell types, diseases, sexes, ethnicities, etc. + +You can do it manually or with our function: + +```python +from scdataloader.utils import populate_my_ontology + +populate_my_ontology() #to populate everything (recommended) (can take 2-10 min) + +populate_my_ontology( #the minimum for scprint to run some inferences (denoising, grn inference) +organisms=["NCBITaxon:10090", "NCBITaxon:9606"], + sex=["PATO:0000384", "PATO:0000383"], + celltypes=None, + ethnicities=None, + assays=None, + tissues=None, + diseases=None, + dev_stages=None, +) ``` We make use of some additional packages we developed alongside scPRint. + Please refer to their documentation for more information: - [scDataLoader](https://github.com/jkobject/scDataLoader): a dataloader for training large cell models. - [GRnnData](https://github.com/cantinilab/GRnnData): a package to work with gene networks from single cell data. - [benGRN](https://github.com/jkobject/benGRN): a package to benchmark gene network inference methods from single cell data. -### lamin.ai +### pytorch and GPUs + +scPRINT can run on machines without GPUs, but it will be slow. It is highly recommended to use a GPU for inference. 
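Before picking a PyTorch build, it can help to check what the current environment actually sees. The snippet below is a minimal sketch, not part of scPRINT: the helper name `describe_torch_device` is hypothetical, and it only relies on standard PyTorch calls (`torch.version.cuda`, `torch.cuda.is_available`, `torch.cuda.get_device_name`).

```python
import importlib.util


def describe_torch_device() -> dict:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU.

    Hypothetical helper for illustration; it is not provided by scPRINT.
    """
    info = {
        "torch_installed": False,
        "cuda_available": False,
        "cuda_build": None,
        "device": "cpu",
    }
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    info["torch_installed"] = True
    # CUDA version the installed wheel was built against (None for CPU-only builds)
    info["cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        info["cuda_available"] = True
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_device().items():
        print(f"{key}: {value}")
```

If `cuda_available` is false on a machine that has a GPU, the installed wheel and the driver are probably mismatched, which is the situation the next paragraph addresses.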
+ +Once you have a GPU and have installed the required drivers, you might need to install a specific version of pytorch that is compatible with your drivers (e.g. nvidia 550 drivers will lead to an nvidia toolkit 11.7 or 11.8, which might mean you need to re-install a different flavor of pytorch for things to work, e.g. using the command: +`pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118`, as in my case on linux + ). + +I was able to test it with nvidia 11.7, 11.8, and 12.2. -⚠️ if you want to use the scDataloader's multi-dataset mode or if you want to preprocess datasets and other functions of the model, you will need to use lamin.ai. +### dev install -In that case, connect with google or github to [lamin.ai](https://lamin.ai/login), then be sure to connect before running anything (or before starting a notebook): `lamin login --key `. Follow the instructions on [their website](https://docs.lamin.ai/guide). +If you want to use the latest version of scPRINT and work on the code yourself, use `git clone` and `pip install -e` instead of `pip install`. + +```bash +git clone https://github.com/jkobject/scPRINT +git clone https://github.com/jkobject/scDataLoader +git clone https://github.com/cantinilab/GRnnData +git clone https://github.com/jkobject/benGRN +pip install -e scPRINT[dev] +pip install -e scDataLoader[dev] +pip install -e GRnnData[dev] +pip install -e benGRN[dev] +``` ## Usage @@ -88,7 +172,7 @@ $ scprint fit/train/predict/test/denoise/embed/gninfer --config config/[medium|l find out more about the commands by running `scprint --help` or `scprint [command] --help`. -more examples of using the command line are available in the [docs](./docs/usage.md). +more examples of using the command line are available in the [docs](usage.md). 
### Notes on GPU/CPU usage with triton @@ -102,6 +186,10 @@ model = scPrint.load_from_checkpoint( transformer="normal") ``` +### Simple tests: + +An installation of scPRINT and a simple test of the denoiser are run on each commit to the main branch with a [GitHub action](https://github.com/jkobject/scPRINT/actions) and [pytest workflow](https://github.com/jkobject/scPRINT/blob/main/.github/workflows/main.yml). It also provides an expected runtime for the installation and run of scPRINT. + We now explore the different usages of scPRINT: ## FAQ @@ -110,27 +198,27 @@ We now explore the different usages of scPRINT: -> Refer to the section . gene network inference in [this notebook](./notebooks/cancer_usecase.ipynb#). --> More examples in this notebook [notebooks/assessments/bench_omni.ipynb](../notebooks/bench_omni.ipynb). +-> More examples in this notebook [./notebooks/assessments/bench_omni.ipynb](https://github.com/jkobject/scPRINT/blob/main/notebooks/bench_omni.ipynb). ### I want to generate cell embeddings and cell label predictions from scRNAseq data: -> Refer to the embeddings and cell annotations section in [this notebook](./notebooks/cancer_usecase.ipynb#). -### I want to denoising my scRNAseq dataset: +### I want to denoise my scRNAseq dataset: -> Refer to the Denoising of B-cell section in [this notebook](./notebooks/cancer_usecase.ipynb). --> More example in our benchmark notebook [notebooks/assessments/bench_denoising.ipynb](../notebooks/bench_denoising.ipynb). +-> More examples in our benchmark notebook [./notebooks/assessments/bench_denoising.ipynb](https://github.com/jkobject/scPRINT/blob/main/notebooks/bench_denoising.ipynb). ### I want to generate an atlas-level embedding --> Refer to the notebook [figures/nice_umap.ipynb](../figures/nice_umap.ipynb). +-> Refer to the notebook [nice_umap.ipynb](https://github.com/jkobject/scPRINT/blob/main/figures/nice_umap.ipynb). 
### I need to generate gene tokens using pLLMs To run scPRINT, you can use the option to define the gene tokens using protein language model embeddings of genes. This is done by providing the path to a parquet file of the precomputed set of embeddings for each gene name to scPRINT via "precpt_gene_emb" --> To generate this file please refer to the notebook [notebooks/generate_gene_embeddings.ipynb](../notebooks/generate_gene_embeddings.ipynb). +-> To generate this file please refer to the notebook [generate_gene_embeddings](https://github.com/jkobject/scPRINT/blob/main/notebooks/generate_gene_embeddings.ipynb). ### I want to pre-train scPRINT from scratch on my own data @@ -163,7 +251,7 @@ model = scPrint.load_from_checkpoint( ) ``` -You can also recreate the gene embedding file through [this notebook](notebooks/generate_gene_embeddings.ipynb). Just call the functions, and it should recreate the file itself. +You can also recreate the gene embedding file through [this notebook](https://github.com/jkobject/scPRINT/blob/main/notebooks/generate_gene_embeddings.ipynb). Just call the functions, and it should recreate the file itself. the file itself is also available on [hugging face](https://huggingface.co/jkobject/scPRINT/tree/main) @@ -177,21 +265,23 @@ Model weights are available on [hugging face](https://huggingface.co/jkobject/sc ## Development -Read the [CONTRIBUTING.md](CONTRIBUTING.md) file. +Read the [CONTRIBUTING.md](https://github.com/jkobject/scPRINT/blob/main/CONTRIBUTING.md) file. Read the [training runs](https://wandb.ai/ml4ig/scprint_scale/reports/scPRINT-trainings--Vmlldzo4ODIxMjgx?accessToken=80metwx7b08hhourotpskdyaxiflq700xzmzymr6scvkp69agybt79l341tv68hp) document to know more about how pre-training was performed and the its behavior. +code coverage is not right as I am using the command line interface for now. >50% of the code is covered by my current unit test. 
+ Acknowledgement: [python template](https://github.com/rochacbruno/python-project-template) [laminDB](https://lamin.ai/) [lightning](https://lightning.ai/) -## Work in progress: +## Work in progress (PR welcomed): 1. remove the triton dependencies 2. add version with additional labels (tissues, age) and organisms (mouse, zebrafish) and more datasets from cellxgene 3. version with separate transformer blocks for the encoding part of the bottleneck learning and for the cell embeddings 4. improve classifier to output uncertainties and topK predictions when unsure -5. +5. setup latest lamindb version Awesome Large Cell Model created by Jeremie Kalfon. diff --git a/notebooks/additional/cells_and_genes_in_cxg.ipynb b/notebooks/additional/cells_and_genes_in_cxg.ipynb new file mode 100644 index 0000000..18d42fa --- /dev/null +++ b/notebooks/additional/cells_and_genes_in_cxg.ipynb @@ -0,0 +1,365 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import cellxgene_census\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "The \"stable\" release is currently 2024-07-01. Specify 'census_version=\"2024-07-01\"' in future calls to open_soma() to ensure data consistency.\n" + ] + } + ], + "source": [ + "# or, directly open the census (don't forget to close it!)\n", + "census = cellxgene_census.open_soma()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "var_df = cellxgene_census.get_var(census, \"homo_sapiens\")" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
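| | soma_joinid | feature_id | feature_name | feature_length | nnz | n_measured_obs |
|---|---|---|---|---|---|---|
| 0 | 0 | ENSG00000000003 | TSPAN6 | 4530 | 4530448 | 73855064 |
| 1 | 1 | ENSG00000000005 | TNMD | 1476 | 236059 | 61201828 |
| 2 | 2 | ENSG00000000419 | DPM1 | 9276 | 17576462 | 74159149 |
| 3 | 3 | ENSG00000000457 | SCYL3 | 6883 | 9117322 | 73988868 |
| 4 | 4 | ENSG00000000460 | C1orf112 | 5970 | 6287794 | 73636201 |
| ... | ... | ... | ... | ... | ... | ... |
| 60525 | 60525 | ENSG00000288718 | ENSG00000288718.1 | 1070 | 4 | 1248980 |
| 60526 | 60526 | ENSG00000288719 | ENSG00000288719.1 | 4252 | 2826 | 1248980 |
| 60527 | 60527 | ENSG00000288724 | ENSG00000288724.1 | 625 | 36 | 1248980 |
| 60528 | 60528 | ENSG00000290791 | ENSG00000290791.1 | 3612 | 1642 | 43485 |
| 60529 | 60529 | ENSG00000290146 | ENSG00000290146.1 | 1292 | 7958 | 43485 |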
\n", + "

60530 rows × 6 columns

\n", + "
" + ], + "text/plain": [ + " soma_joinid feature_id feature_name feature_length \\\n", + "0 0 ENSG00000000003 TSPAN6 4530 \n", + "1 1 ENSG00000000005 TNMD 1476 \n", + "2 2 ENSG00000000419 DPM1 9276 \n", + "3 3 ENSG00000000457 SCYL3 6883 \n", + "4 4 ENSG00000000460 C1orf112 5970 \n", + "... ... ... ... ... \n", + "60525 60525 ENSG00000288718 ENSG00000288718.1 1070 \n", + "60526 60526 ENSG00000288719 ENSG00000288719.1 4252 \n", + "60527 60527 ENSG00000288724 ENSG00000288724.1 625 \n", + "60528 60528 ENSG00000290791 ENSG00000290791.1 3612 \n", + "60529 60529 ENSG00000290146 ENSG00000290146.1 1292 \n", + "\n", + " nnz n_measured_obs \n", + "0 4530448 73855064 \n", + "1 236059 61201828 \n", + "2 17576462 74159149 \n", + "3 9117322 73988868 \n", + "4 6287794 73636201 \n", + "... ... ... \n", + "60525 4 1248980 \n", + "60526 2826 1248980 \n", + "60527 36 1248980 \n", + "60528 1642 43485 \n", + "60529 7958 43485 \n", + "\n", + "[60530 rows x 6 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "var_df" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "74322510" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "var_df.n_measured_obs.max()" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "20044" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "var_df.feature_name.str.contains(\"ENSG\").sum() # 40k cannonical genes" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "59957" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(var_df.n_measured_obs > 1_000_000).sum() # measured means the 
dataset has it but doesn't mean it really measured it" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "29079" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(var_df.nnz > 100_000).sum() # very few datasets with ncRNAseq" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "census.close()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# imports needed by this cell (not imported elsewhere in the notebook)\n", + "import numpy as np\n", + "import pandas as pd\n", + "import tiledbsoma as soma\n", + "\n", + "with cellxgene_census.open_soma() as census:\n", + " mouse = census[\"census_data\"][\"mus_musculus\"]\n", + " with mouse.axis_query(\n", + " measurement_name=\"RNA\",\n", + " obs_query=soma.AxisQuery(value_filter=\"tissue=='brain' and sex=='male' and is_primary_data==True\"),\n", + " ) as query:\n", + " var_df = query.var().concat().to_pandas().set_index(\"soma_joinid\")\n", + " n_vars = len(var_df)\n", + "\n", + " raw_n = np.zeros((n_vars,), dtype=np.int64) # accumulate number of non-zero X values\n", + " raw_sum = np.zeros((n_vars,), dtype=np.float64) # accumulate the sum of expression\n", + "\n", + " # query.X() returns an iterator of pyarrow.Table, with X data in COO format.\n", + " # You can request an indexer from the query that will map it to positional indices\n", + " indexer = query.indexer\n", + " for arrow_tbl in query.X(\"raw\").tables():\n", + " var_dim = indexer.by_var(arrow_tbl[\"soma_dim_1\"])\n", + " data = arrow_tbl[\"soma_data\"]\n", + " np.add.at(raw_n, var_dim, 1)\n", + " np.add.at(raw_sum, var_dim, data)\n", + "\n", + " with np.errstate(divide=\"ignore\", invalid=\"ignore\"):\n", + " raw_mean = raw_sum / query.n_obs\n", + " raw_mean[np.isnan(raw_mean)] = 0\n", + "\n", + " var_df = 
var_df.assign(raw_n=pd.Series(data=raw_n, index=var_df.index))\n", + " var_df = var_df.assign(raw_mean=pd.Series(data=raw_mean, index=var_df.index))\n", + "\n", + " 
display(var_df)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "scprint", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/test.ipynb b/notebooks/additional/test.ipynb similarity index 100% rename from test.ipynb rename to notebooks/additional/test.ipynb diff --git a/notebooks/additional/test_zinb1.ipynb b/notebooks/additional/test_zinb1.ipynb new file mode 100644 index 0000000..41e24b0 --- /dev/null +++ b/notebooks/additional/test_zinb1.ipynb @@ -0,0 +1,188 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import torch.nn.functional as F\n", + "import torch\n", + "from torch import nn, Tensor\n" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "def zinb(\n", + " target: Tensor,\n", + " mu: Tensor,\n", + " theta: Tensor,\n", + " pi: Tensor,\n", + " eps=1e-8,\n", + "):\n", + " \"\"\"\n", + " Computes zero-inflated negative binomial (ZINB) loss.\n", + "\n", + " This function was modified from scvi-tools.\n", + "\n", + " Args:\n", + " target (Tensor): Torch Tensor of ground truth data.\n", + " mu (Tensor): Torch Tensor of means of the negative binomial (must have positive support).\n", + " theta (Tensor): Torch Tensor of inverse dispersion parameter (must have positive support).\n", + " pi (Tensor): Torch Tensor of logits of the dropout parameter (real support).\n", + " eps (float, optional): Numerical stability constant. 
Defaults to 1e-8.\n", + "\n", + " Returns:\n", + " Tensor: ZINB loss value.\n", + " \"\"\"\n", + " # uses log(sigmoid(x)) = -softplus(-x)\n", + " softplus_pi = F.softplus(-pi)\n", + " # eps to make it positive support and taking the log\n", + " log_theta_mu_eps = torch.log(theta + mu + eps)\n", + " pi_theta_log = -pi + theta * (torch.log(theta + eps) - log_theta_mu_eps)\n", + "\n", + " case_zero = F.softplus(pi_theta_log) - softplus_pi\n", + " mul_case_zero = torch.mul((target < eps).type(torch.float32), case_zero)\n", + "\n", + " case_non_zero = (\n", + " -softplus_pi\n", + " + pi_theta_log\n", + " + target * (torch.log(mu + eps) - log_theta_mu_eps)\n", + " + torch.lgamma(target + theta)\n", + " - torch.lgamma(theta)\n", + " - torch.lgamma(target + 1)\n", + " )\n", + " mul_case_non_zero = torch.mul((target > eps).type(torch.float32), case_non_zero)\n", + "\n", + " res = mul_case_zero + mul_case_non_zero\n", + " # we want to minimize the loss but maximize the log likelihood\n", + " return -res.mean()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "def zinb_sonnet(\n", + " target: Tensor,\n", + " mu: Tensor,\n", + " theta: Tensor,\n", + " pi: Tensor,\n", + " eps=1e-8,\n", + "):\n", + " \"\"\"\n", + " Computes zero-inflated negative binomial (ZINB) loss updated to improve numerical stability with sonnet\n", + "\n", + " This function is modified to improve numerical stability (it still relies on lgamma for the non-zero case).\n", + "\n", + " Args:\n", + " target (Tensor): Torch Tensor of ground truth data.\n", + " mu (Tensor): Torch Tensor of means of the negative binomial (must have positive support).\n", + " theta (Tensor): Torch Tensor of inverse dispersion parameter (must have positive support).\n", + " pi (Tensor): Torch Tensor of logits of the dropout parameter (real support).\n", + " eps (float, optional): Numerical stability constant. 
Defaults to 1e-8.\n", + "\n", + " Returns:\n", + " Tensor: ZINB loss value.\n", + " \"\"\"\n", + " # Compute log(1 - sigmoid(pi)) more accurately using -softplus(pi)\n", + " log_neg_pi = -F.softplus(pi)\n", + " \n", + " # Compute log(theta + mu) more accurately\n", + " log_theta_mu = torch.log(theta + mu + eps)\n", + " \n", + " # Compute log(1 + mu/theta) more accurately\n", + " log_1_plus_mu_theta = F.softplus(torch.log(mu + eps) - torch.log(theta + eps))\n", + " \n", + " # Compute log likelihood for zero values (the -softplus(-pi) term matches case_zero in zinb)\n", + " ll_zero = F.softplus(theta * (torch.log(theta + eps) - log_theta_mu) - pi) - F.softplus(-pi)\n", + " \n", + " # Compute log likelihood for non-zero values\n", + " ll_non_zero = (\n", + " log_neg_pi\n", + " + theta * torch.log(theta + eps)\n", + " - (theta + target) * log_theta_mu\n", + " + target * torch.log(mu + eps)\n", + " - torch.lgamma(target + 1)\n", + " + torch.lgamma(theta + target)\n", + " - torch.lgamma(theta)\n", + " )\n", + " \n", + " # Combine zero and non-zero cases\n", + " ll = torch.where(target < eps, ll_zero, ll_non_zero)\n", + " \n", + " # Return negative mean log-likelihood\n", + " return -ll.mean()" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Original ZINB Loss: 1.609736680984497\n", + "Original ZINB Loss: 1.606806993484497\n", + "New ZINB Loss: 1.607421875\n" + ] + } + ], + "source": [ + "# Test both functions with the same input\n", + "THETA = 10000 # above this, nothing changes\n", + "\n", + "TARGET = [100,10,10,1,1,0,0,0]\n", + "MINPI = 0.01\n", + "MAXPI = 100\n", + "ERROR = [1,0.1,0.1,0,0,100,100,100]\n", + "\n", + "target = torch.Tensor(TARGET)\n", + "mu = torch.Tensor(TARGET)\n", + "theta = torch.Tensor([THETA]*len(TARGET))\n", + "pi = torch.Tensor([MINPI,MINPI,MINPI,MINPI,MINPI,MAXPI,MAXPI,MAXPI])\n", + "\n", + "# Test original zinb function\n", + "original_loss 
= zinb(target, mu, theta, pi)\n", + "print(f\"Original ZINB 
Loss: {original_loss.item()}\")\n", + "\n", + "# Test original zinb function with error\n", + "original_loss = zinb(target, mu+torch.Tensor(ERROR), theta, pi)\n", + "print(f\"Original ZINB Loss: {original_loss.item()}\")\n", + "\n", + "# Test updated zinb_sonnet function\n", + "new_loss = zinb_sonnet(target, mu, theta, pi)\n", + "print(f\"New ZINB Loss: {new_loss.item()}\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "scprint", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/additional/update_lamin_or_cellxgene.ipynb b/notebooks/additional/update_lamin_or_cellxgene.ipynb new file mode 100644 index 0000000..8f06575 --- /dev/null +++ b/notebooks/additional/update_lamin_or_cellxgene.ipynb @@ -0,0 +1,1175 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# update lamindb and bionty..." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[92m→\u001b[0m connected lamindb: jkobject/scprint\n", + "\u001b[0m" + ] + } + ], + "source": [ + "! lamin load scprint\n", + "# ! lamin migrate deploy" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# we will check it works and reset the bionty sources" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The autoreload extension is already loaded. 
To reload it, use:\n", + " %reload_ext autoreload\n" + ] + } + ], + "source": [ + "import lamindb as ln\n", + "import bionty as bt\n", + "from scdataloader.utils import populate_my_ontology\n", + "%load_ext autoreload\n", + "%autoreload 2\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "937" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(ln.Artifact.filter())" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "bt.base.reset_sources()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[93m!\u001b[0m please reload your instance to reflect the updates!\n" + ] + } + ], + "source": [ + "bt.core.sync_all_sources_to_latest()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# load them in my personal ontology" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| ontology_id | name | definition | synonyms | parents |
|---|---|---|---|---|
| CL:0000000 | cell | A Material Entity Of Anatomical Origin (Part O... | None | [] |
| CL:0000001 | primary cultured cell | A Cultured Cell That Is Freshly Isolated From ... | primary cell culture cell\|unpassaged cultured ... | [CL:0000010] |
| CL:0000002 | obsolete immortal cell line cell | Obsolete: A Cell Line Cell That Is Expected To... | permanent cell line cell\|continuous cell line ... | [] |
| CL:0000003 | obsolete native cell | Obsolete. A Cell That Is Found In A Natural Se... | None | [] |
| CL:0000004 | obsolete cell by organism | Obsolete: A Classification Of Cells By The Org... | None | [] |
| ... | ... | ... | ... | ... |
| CL:4042007 | protoplasmic astrocyte | An Astrocyte With Highly Branched Protrusions,... | None | [CL:0000029, CL:0002605, CL:0010012] |
| CL:4042008 | fibrous astrocyte | A Cell Type Located In The First Layer Of The ... | None | [CL:0000029, CL:2000029, CL:0000127] |
| CL:4042009 | interlaminar astrocyte | An Astrocyte Type That Presents Radial Protrus... | None | [CL:0000029, CL:0002605, CL:0010012] |
| CL:4042010 | pial interlaminar astrocyte | An Interlaminar Astrocyte Whose Soma Is Part O... | None | [CL:4042009] |
| CL:4042011 | subpial interlaminar astrocyte | An Interlaminar Astrocyte Type Whose Soma Is P... | None | [CL:4042009] |
\n", + "

2931 rows × 4 columns

\n", + "
" + ], + "text/plain": [ + " name \\\n", + "ontology_id \n", + "CL:0000000 cell \n", + "CL:0000001 primary cultured cell \n", + "CL:0000002 obsolete immortal cell line cell \n", + "CL:0000003 obsolete native cell \n", + "CL:0000004 obsolete cell by organism \n", + "... ... \n", + "CL:4042007 protoplasmic astrocyte \n", + "CL:4042008 fibrous astrocyte \n", + "CL:4042009 interlaminar astrocyte \n", + "CL:4042010 pial interlaminar astrocyte \n", + "CL:4042011 subpial interlaminar astrocyte \n", + "\n", + " definition \\\n", + "ontology_id \n", + "CL:0000000 A Material Entity Of Anatomical Origin (Part O... \n", + "CL:0000001 A Cultured Cell That Is Freshly Isolated From ... \n", + "CL:0000002 Obsolete: A Cell Line Cell That Is Expected To... \n", + "CL:0000003 Obsolete. A Cell That Is Found In A Natural Se... \n", + "CL:0000004 Obsolete: A Classification Of Cells By The Org... \n", + "... ... \n", + "CL:4042007 An Astrocyte With Highly Branched Protrusions,... \n", + "CL:4042008 A Cell Type Located In The First Layer Of The ... \n", + "CL:4042009 An Astrocyte Type That Presents Radial Protrus... \n", + "CL:4042010 An Interlaminar Astrocyte Whose Soma Is Part O... \n", + "CL:4042011 An Interlaminar Astrocyte Type Whose Soma Is P... \n", + "\n", + " synonyms \\\n", + "ontology_id \n", + "CL:0000000 None \n", + "CL:0000001 primary cell culture cell|unpassaged cultured ... \n", + "CL:0000002 permanent cell line cell|continuous cell line ... \n", + "CL:0000003 None \n", + "CL:0000004 None \n", + "... ... \n", + "CL:4042007 None \n", + "CL:4042008 None \n", + "CL:4042009 None \n", + "CL:4042010 None \n", + "CL:4042011 None \n", + "\n", + " parents \n", + "ontology_id \n", + "CL:0000000 [] \n", + "CL:0000001 [CL:0000010] \n", + "CL:0000002 [] \n", + "CL:0000003 [] \n", + "CL:0000004 [] \n", + "... ... 
\n", + "CL:4042007 [CL:0000029, CL:0002605, CL:0010012] \n", + "CL:4042008 [CL:0000029, CL:2000029, CL:0000127] \n", + "CL:4042009 [CL:0000029, CL:0002605, CL:0010012] \n", + "CL:4042010 [CL:4042009] \n", + "CL:4042011 [CL:4042009] \n", + "\n", + "[2931 rows x 4 columns]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names = bt.CellType.public().df()\n", + "names" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[92m→\u001b[0m returning existing Organism record with same name: 'unknown'\n" + ] + } + ], + "source": [ + "populate_my_ontology(organisms=[\"NCBITaxon:9544\", \"NCBITaxon:9483\", \"NCBITaxon:10090\", \"NCBITaxon:9606\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " uid name ontology_id abbr synonyms \\\n", + "id \n", + "1 6QgHNi0H 42-year-old human stage HsapDv:0000136 None None \n", + "2 fEzwReEA 53-year-old human stage HsapDv:0000147 None None \n", + "3 2nEVCZzo 80-year-old human stage HsapDv:0000206 None None \n", + "4 77LZiXev 71-year-old human stage HsapDv:0000165 None None \n", + "5 2iDmILg8 50-year-old human stage HsapDv:0000144 None None \n", + ".. ... ... ... ... ... \n", + "677 447iQEAH 2-4 year-old child stage HsapDv:0000270 None None \n", + "678 2XEDd26s juvenile stage (5-14 yo) HsapDv:0000271 None None \n", + "679 JlT91ezY 60-79 year-old stage HsapDv:0000272 None None \n", + "680 4RKFKLPE 1-month-old stage HsapDv:0000273 None None \n", + "681 7PwM9y8d postnatal stage HsapDv:0010000 None None \n", + "\n", + " description source_id run_id \\\n", + "id \n", + "1 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "2 Middle Aged Stage That Refers To An Adult Who ... 44 None \n", + "3 Aged Stage That Refers To An Adult Who Is Over... 44 None \n", + "4 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "5 Middle Aged Stage That Refers To An Adult Who ... 44 None \n", + ".. ... ... ... \n", + "677 Child Stage That Refers To A Child Who Is Over... 100 None \n", + "678 Pediatric Stage That Refers To A Human Who Is ... 100 None \n", + "679 Late Adult Stage That Refers To An Adult Who I... 100 None \n", + "680 Infant Stage That Refers To An Infant Who Is O... 100 None \n", + "681 Human Developmental Stage That Covers The Whol... 100 None \n", + "\n", + " created_by_id updated_at \n", + "id \n", + "1 1 2023-11-22 10:22:33.945048+00:00 \n", + "2 1 2023-11-22 10:22:33.945198+00:00 \n", + "3 1 2023-11-22 10:22:33.945342+00:00 \n", + "4 1 2023-11-22 11:10:48.439710+00:00 \n", + "5 1 2023-11-22 11:10:48.439860+00:00 \n", + ".. ... ... 
\n", + "677 1 2024-09-09 14:54:06.875029+00:00 \n", + "678 1 2024-09-09 14:54:06.875164+00:00 \n", + "679 1 2024-09-09 14:54:06.875290+00:00 \n", + "680 1 2024-09-09 14:54:06.875417+00:00 \n", + "681 1 2024-09-09 14:54:06.875545+00:00 \n", + "\n", + "[437 rows x 10 columns]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bt.DevelopmentalStage.filter().df()" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " uid name ontology_id abbr synonyms description source_id \\\n", + "id \n", + "32 5HWRj1OD unknown unknown None None None 44 \n", + "\n", + " run_id created_by_id updated_at \n", + "id \n", + "32 None 1 2024-09-09 10:53:28.565897+00:00 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = bt.DevelopmentalStage.filter().df()\n", + "df[df.name==\"unknown\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "bt.DevelopmentalStage.filter(id=355).delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# load the latest datasets / load datasets that were initially dropped" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " uid name ontology_id abbr synonyms \\\n", + "id \n", + "1 6QgHNi0H 42-year-old human stage HsapDv:0000136 None None \n", + "2 fEzwReEA 53-year-old human stage HsapDv:0000147 None None \n", + "3 2nEVCZzo 80-year-old human stage HsapDv:0000206 None None \n", + "4 77LZiXev 71-year-old human stage HsapDv:0000165 None None \n", + "5 2iDmILg8 50-year-old human stage HsapDv:0000144 None None \n", + "6 3mvZclbG 60-year-old human stage HsapDv:0000154 None None \n", + "7 6GYcyAiA 80 year-old and over human stage HsapDv:0000095 None None \n", + "8 2YPivzq5 77-year-old human stage HsapDv:0000171 None None \n", + "9 3MeD86pe 82-year-old human stage HsapDv:0000208 None None \n", + "10 5KlIAuyM 87-year-old human stage HsapDv:0000213 None None \n", + "11 3p5M36ez 72-year-old human stage HsapDv:0000166 None None \n", + "12 3i4SGvgA 45-year-old human stage HsapDv:0000139 None None \n", + "13 glnFPgeh 68-year-old human stage HsapDv:0000162 None None \n", + "14 3J3fzFpO 51-year-old human stage HsapDv:0000145 None None \n", + "15 5lwKjZ3t 43-year-old human stage HsapDv:0000137 None None \n", + "16 4BUoLKl0 41-year-old human stage HsapDv:0000135 None None \n", + "17 4pPeImFY 55-year-old human stage HsapDv:0000149 None None \n", + "18 7dWnm4CL 27-year-old human stage HsapDv:0000121 None None \n", + "19 5q64GXx1 69-year-old human stage HsapDv:0000163 None None \n", + "20 5rviPulP 74-year-old human stage HsapDv:0000168 None None \n", + "\n", + " description source_id run_id \\\n", + "id \n", + "1 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "2 Middle Aged Stage That Refers To An Adult Who ... 44 None \n", + "3 Aged Stage That Refers To An Adult Who Is Over... 44 None \n", + "4 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "5 Middle Aged Stage That Refers To An Adult Who ... 44 None \n", + "6 Middle Aged Stage That Refers To An Adult Who ... 44 None \n", + "7 Aged Stage That Refers To An Adult Who Is Over... 
44 None \n", + "8 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "9 Aged Stage That Refers To An Adult Who Is Over... 44 None \n", + "10 Aged Stage That Refers To An Adult Who Is Over... 44 None \n", + "11 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "12 Middle Aged Stage That Refers To An Adult Who ... 44 None \n", + "13 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "14 Middle Aged Stage That Refers To An Adult Who ... 44 None \n", + "15 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "16 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "17 Middle Aged Stage That Refers To An Adult Who ... 44 None \n", + "18 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "19 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "20 Adult Stage That Refers To An Adult Who Is Ove... 44 None \n", + "\n", + " created_by_id updated_at \n", + "id \n", + "1 1 2023-11-22 10:22:33.945048+00:00 \n", + "2 1 2023-11-22 10:22:33.945198+00:00 \n", + "3 1 2023-11-22 10:22:33.945342+00:00 \n", + "4 1 2023-11-22 11:10:48.439710+00:00 \n", + "5 1 2023-11-22 11:10:48.439860+00:00 \n", + "6 1 2023-11-22 11:10:48.439989+00:00 \n", + "7 1 2023-11-22 11:10:48.440112+00:00 \n", + "8 1 2023-11-22 11:10:48.440233+00:00 \n", + "9 1 2023-11-22 11:10:48.440354+00:00 \n", + "10 1 2023-11-22 11:10:48.440474+00:00 \n", + "11 1 2023-11-22 11:10:48.440595+00:00 \n", + "12 1 2023-11-22 11:11:34.327789+00:00 \n", + "13 1 2023-11-22 11:11:34.327912+00:00 \n", + "14 1 2023-11-22 11:11:34.328016+00:00 \n", + "15 1 2023-11-22 11:11:34.328114+00:00 \n", + "16 1 2023-11-22 11:11:34.328215+00:00 \n", + "17 1 2023-11-22 11:11:34.328314+00:00 \n", + "18 1 2023-11-22 11:11:34.328413+00:00 \n", + "19 1 2023-11-22 11:11:34.328514+00:00 \n", + "20 1 2023-11-22 11:11:34.328612+00:00 " + ] + }, + "execution_count": 78, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"bt.DevelopmentalStage.filter().df().head(20)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['HsapDv:0000000', 'HsapDv:0000240', 'HsapDv:0000267',\n", + " 'HsapDv:0000226', 'HsapDv:0000258', 'HsapDv:0010000',\n", + " 'HsapDv:0000001'], dtype=object)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bt.DevelopmentalStage.public().df().loc[\"HsapDv:0000144\"].parents" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "image/svg+xml": [ + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "2iDmILg8\n", + "\n", + "\n", + "\n", + "2iDmILg8\n", + "\n", + " <50-year-old human stage<BR/><FONT COLOR="GREY" POINT-SIZE="10" FACE="Monospace">uid=2iDmILg8</FONT>>\n", + "\n", + "\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "bt.DevelopmentalStage.filter(ontology_id=\"HsapDv:0000144\").one().view_parents()#.df(include=[\"parents\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "bt.DevelopmentalStage.import_from_source()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "scprint", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/notebooks/additional/updates_on_grn.ipynb b/notebooks/additional/updates_on_grn.ipynb index f6f8e54..53d5b81 100644 --- a/notebooks/additional/updates_on_grn.ipynb +++ 
b/notebooks/additional/updates_on_grn.ipynb @@ -42,7 +42,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "💡 connected lamindb: jkobject/test\n" + "\u001b[92m→\u001b[0m connected lamindb: jkobject/scprint\n" ] } ],