MorphoGenie - Unsupervised Deep Learning Framework for GENeralizable, Interpretable/Explainable Single-Cell Morphological Profiling

The intersection of advanced microscopy and machine learning is transforming the field of cell biology, enabling a more quantitative and data-driven approach. Traditional methods of morphological profiling, which rely on manual feature extraction, are often time-consuming, labor-intensive, and susceptible to human bias. Deep learning offers a promising alternative, but its effectiveness is hindered by its "black-box" operation and its dependence on extensive labeled data.

MorphoGenie addresses these challenges by introducing an unsupervised deep-learning framework for single-cell morphological profiling. This innovative tool generates high-fidelity image reconstructions, enabling disentangled representation learning and a compact, interpretable latent space. This allows for the extraction of biologically meaningful features without human annotation, overcoming the "curse of dimensionality" inherent in manual methods.

MorphoGenie stands out in three key respects:

High-fidelity Image Reconstruction: MorphoGenie uses a hybrid architecture that combines the complementary strengths of a variational autoencoder (VAE) variant and generative adversarial networks (GANs) to achieve interpretable, high-quality cell image generation.
Interpretability: MorphoGenie adopts a VAE-based method to learn a compact, interpretable, and transferable disentangled representation for single-cell morphological analysis. In contrast to prior work, we propose a novel technique for interpreting the learned representation by extracting handcrafted features from images reconstructed along latent traversals, facilitating biologically meaningful inferences, especially about the heterogeneity of cell types and lineages.
Generalizability: MorphoGenie is widely adaptable across various imaging modalities and experimental conditions, promoting cross-study comparisons and reusable morphological profiling results. The model generalizes to unseen single-cell datasets and different imaging modalities while providing explanations for its predictions. Overall, MorphoGenie could spearhead new strategies for conducting comprehensive morphological profiling and make biologically meaningful discoveries across a wide range of imaging modalities.

Requirements

- PyTorch
- TensorBoard
- pandas, matplotlib, seaborn, umap-learn, numpy, tqdm, scikit-image, scikit-learn, pillow
Tested on Windows.

Setting up training/testing environment

Create and activate the MorphoGenie environment as shown below, then install the PyTorch CUDA build that matches your OS (see the official PyTorch installation instructions).

conda create --name MorphoGenie python=3.8.10

conda activate MorphoGenie

conda install -c anaconda pandas=1.4.2 matplotlib=3.5.1 seaborn=0.11.2 numpy=1.22 tqdm=4.63.0 scikit-image umap-learn scikit-learn=1.0.2 pillow=9.2.0

conda install -c anaconda tensorboard

Setting up the test environment and installing dependencies can take around 10 minutes.
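To confirm the installation, a short script such as the one below can compare installed package versions with the pins used in the conda command. This is a minimal sketch, not part of the repository: the PINS table is transcribed from the command above, and treating a pin like numpy=1.22 as matching any 1.22.x release mirrors conda's fuzzy matching but is an assumption about what counts as compatible.

```python
from importlib.metadata import version, PackageNotFoundError

# Pins transcribed from the conda command above; None means "any version".
PINS = {
    "pandas": "1.4.2", "matplotlib": "3.5.1", "seaborn": "0.11.2",
    "numpy": "1.22", "tqdm": "4.63.0", "scikit-learn": "1.0.2", "Pillow": "9.2.0",
    "umap-learn": None, "scikit-image": None, "tensorboard": None, "torch": None,
}

def check_environment(pins=PINS):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        # A pin such as "1.22" is treated as matching any 1.22.x release.
        ok = pinned is None or installed == pinned or installed.startswith(pinned + ".")
        report[name] = (installed, ok)
    return report
```

Running `check_environment()` inside the activated MorphoGenie environment lists any package that is missing or off-pin.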

Dataset

The processed datasets for testing are available here.

Dataset	Folder Name	Imaging Modality
Lung Cancer	LC	QPI
Cell Painting Assay	CPA	Fluorescence
Epithelial-to-Mesenchymal Transition	EMT	Fluorescence
Cell Cycle	CCy	QPI

Folder structure

Label1
	Folder
		Image1.jpg
		Image2.jpg
		...
		ImageM.jpg
Label2
	Folder
		Image1.jpg
		Image2.jpg
		...
		ImageN.jpg
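Before testing, the layout can be checked by counting images per label. This sketch assumes the dataset root contains one directory per label with images somewhere below it, as in the tree above; the helper name count_images_per_label and the accepted extensions (beyond the .jpg shown above) are illustrative assumptions, not part of MorphoGenie.

```python
from pathlib import Path

# Extensions to count; only .jpg appears in the layout above, the rest are assumptions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}

def count_images_per_label(root):
    """Count image files under root/<Label>/... for each label directory."""
    counts = {}
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[label_dir.name] = sum(
            1 for path in label_dir.rglob("*")
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

A label that reports 0 images usually means the images sit at the wrong depth or use an unexpected extension.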

Image Preprocessing

Images in the test dataset must be segmented and cropped so that each image contains a single cell. Segmentation can be performed with any available tool, such as Cellpose.
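Once a segmentation tool has produced an instance mask (0 for background, one integer ID per cell), individual cells can be cropped with plain NumPy. This is a hypothetical helper, not MorphoGenie code: the padding and the choice to zero out neighboring cells are assumptions, and resizing crops to the model's input size is left out because the README does not state that size.

```python
import numpy as np

def crop_single_cells(image, mask, pad=4, zero_background=True):
    """Return {cell_id: crop} for every labeled cell in an integer instance mask.

    image: (H, W) or (H, W, C) array.
    mask:  (H, W) integer array, 0 = background, 1..N = cell IDs.
    """
    if mask.shape != image.shape[:2]:
        raise ValueError("mask and image must share height and width")
    crops = {}
    for cell_id in np.unique(mask):
        if cell_id == 0:
            continue  # skip background
        rows, cols = np.nonzero(mask == cell_id)
        # Bounding box of the cell, padded and clipped to the image borders.
        r0 = max(rows.min() - pad, 0)
        r1 = min(rows.max() + pad + 1, mask.shape[0])
        c0 = max(cols.min() - pad, 0)
        c1 = min(cols.max() + pad + 1, mask.shape[1])
        crop = image[r0:r1, c0:c1].copy()
        if zero_background:
            # Blank pixels that belong to the background or to other cells.
            crop[mask[r0:r1, c0:c1] != cell_id] = 0
        crops[int(cell_id)] = crop
    return crops
```

Each crop can then be written as a .jpg into the Label/Folder layout described above.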

Testing with pre-trained models

Testing on a dataset of 1,500 cell images takes about 1 minute.

Load the pre-trained model and select the dataset for testing. This step generates Latent.csv and Label.csv for downstream analysis such as cell data visualization, classification and interpretation tasks.

python MorphoGenie_Test.py --config cells_650.yaml --Traversal_Save=False --Train_Dataset=LC --Test_Dataset=LC

Here, Train_Dataset is the dataset the model was trained on and Test_Dataset is the dataset being evaluated. Alternatively, MorphoGenie_Test.ipynb can be used for testing.
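The generated Latent.csv and Label.csv can then feed downstream tasks. As a quick sanity check, the sketch below loads both files and scores a nearest-centroid classifier on a random split. The file layout it assumes (one comma-separated row of latent values per cell, one label per row in Label.csv in the same order, and an optional single header row) is a guess; inspect the files produced by MorphoGenie_Test.py and adjust the hypothetical load_profiles helper accordingly.

```python
import csv
import numpy as np

def load_profiles(latent_path, label_path, header=False):
    """Load latent vectors and labels; the CSV layout here is an assumption."""
    skip = 1 if header else 0
    X = np.loadtxt(latent_path, delimiter=",", skiprows=skip, ndmin=2)
    with open(label_path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    y = np.array([row[-1] for row in rows[skip:]])  # last column holds the label
    if len(X) != len(y):
        raise ValueError(f"{len(X)} latent rows but {len(y)} labels")
    return X, y

def nearest_centroid_accuracy(X, y, test_fraction=0.3, seed=0):
    """Fit class centroids on a random split and return held-out accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = max(1, int(round(test_fraction * len(y))))
    test, train = idx[:n_test], idx[n_test:]
    classes = np.unique(y[train])
    centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from each test point to each centroid.
    d = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return float((pred == y[test]).mean())
```

A nearest-centroid classifier is used only because it needs nothing beyond NumPy; a UMAP plot or a scikit-learn classifier on the same arrays is the more usual next step.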

Generalizability

To assess MorphoGenie's generalizability, a model pre-trained on a dataset from one imaging modality can be used to test unseen datasets with different image contrasts. MorphoGenie can apply its learned latent representations to perform accurate downstream analyses and predictions on these new datasets without any retraining.

python MorphoGenie_Test.py --config cells_650.yaml --Traversal_Save=False --Train_Dataset=CCy --Test_Dataset=LC

To test generalizability, load the pre-trained model and extract latent features by passing the location of a new preprocessed dataset.
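Cross-dataset runs can be scripted by building the same command-line calls shown above for each train/test pair. This sketch only uses the flags that appear in this README; whether a pre-trained model exists for every Train_Dataset value, and whether all folder names from the dataset table are accepted, are assumptions to check against MorphoGenie_Test.py. The helper names are hypothetical.

```python
import itertools
import subprocess
import sys

# Folder names from the dataset table; acceptance by the test script is assumed.
DATASETS = ["LC", "CPA", "EMT", "CCy"]

def build_test_command(train, test, config="cells_650.yaml", save_traversals=False):
    """Argument list equivalent to the MorphoGenie_Test.py calls in this README."""
    return [
        sys.executable, "MorphoGenie_Test.py",
        "--config", config,
        f"--Traversal_Save={save_traversals}",
        f"--Train_Dataset={train}",
        f"--Test_Dataset={test}",
    ]

def cross_dataset_commands(datasets=DATASETS):
    """One command per ordered (train, test) pair with train != test."""
    return [build_test_command(tr, te) for tr, te in itertools.permutations(datasets, 2)]

def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_all(cross_dataset_commands(["CCy", "LC"]))`, started from the repository root, runs the CCy-to-LC test above and its reverse.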

Interpreting Disentangled Latent Space in MorphoGenie

MorphoGenie's interpretability is enhanced through analysis of how its disentangled latent space relates to the physical characteristics of individual cells. These characteristics are identified through a hierarchical analysis that categorizes features into a structured framework, ranging from subtle textures to more distinct properties like cell size, shape, and density. Using this, MorphoGenie creates a profile called "Interpretation Heatmap" that enables meaningful and biologically relevant interpretations of the disentangled representations.

Traversal reconstructions are generated (by setting the flag Traversal_Save=True) to interpret MorphoGenie's disentangled latent space. The images are saved in the outputs folder.

python MorphoGenie_Test.py --config cells_650.yaml --Traversal_Save=True --Train_Dataset=LC --Test_Dataset=LC
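A latent traversal holds all latent coordinates of a cell fixed and sweeps one dimension across a range, decoding each point into an image. The sketch below only builds the latent codes; the sweep range of ±3 and 10 steps are common choices for a standard-normal prior, not values taken from MorphoGenie, and decoding them with the trained generator is left to the test script.

```python
import numpy as np

def latent_traversal(z, dim, limit=3.0, steps=10):
    """Copies of latent vector z with coordinate `dim` swept over [-limit, limit]."""
    z = np.asarray(z, dtype=float)
    batch = np.repeat(z[None, :], steps, axis=0)  # (steps, n_dims)
    batch[:, dim] = np.linspace(-limit, limit, steps)
    return batch

def traversal_set(z, limit=3.0, steps=10):
    """One traversal per latent dimension, shape (n_dims, steps, n_dims)."""
    return np.stack([latent_traversal(z, d, limit, steps) for d in range(len(z))])
```

Decoding every row of `traversal_set(z)` yields one image grid per cell: each row of the grid shows how a single latent dimension changes the reconstructed morphology.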

The process involves:
1. Generating 50 traversal sets
2. Computing the variance matrix
3. Averaging for heatmap generation

The resulting heatmap reveals the structure of the latent space by showing which morphological features change along each latent dimension.

The variance matrix is computed from morphological features extracted from the traversal sets. The MATLAB code for feature extraction is provided here.
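One plausible way to turn the extracted features into a heatmap, following the three steps listed above, is sketched here. It assumes the features arrive as an array of shape (n_sets, n_dims, n_steps, n_features), for example 50 traversal sets, and it standardizes each feature within a set before computing variances so that features in large units (such as area) do not dominate. Both the array layout and the standardization are assumptions; the MATLAB code referenced above defines the actual procedure.

```python
import numpy as np

def interpretation_heatmap(features, eps=1e-12):
    """Mean traversal variance per (latent dimension, feature).

    features: (n_sets, n_dims, n_steps, n_features) array of handcrafted
              features measured on traversal reconstructions.
    Returns an (n_dims, n_features) matrix.
    """
    f = np.asarray(features, dtype=float)
    n_sets, n_dims, n_steps, n_feat = f.shape
    # Standardize each feature over all images within a set (assumed step).
    flat = f.reshape(n_sets, -1, n_feat)
    mu = flat.mean(axis=1, keepdims=True)
    sd = flat.std(axis=1, keepdims=True)
    z = ((flat - mu) / (sd + eps)).reshape(f.shape)
    # Variance matrix per set: how much each feature changes along each traversal.
    variance = z.var(axis=2)  # (n_sets, n_dims, n_feat)
    # Average the per-set variance matrices into a single heatmap.
    return variance.mean(axis=0)
```

High values in row k mark the features that latent dimension k controls, which is how a dimension can be labeled, for instance, as size-related or texture-related.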

Train models with new datasets

Step 1: Train the VAE. This step requires setting up a separate environment for VAE training, following Factor-VAE.


python dvae_main.py --dataset [dataset_name] --name [dvae_run_name] --c_dim [c_dim] --beta [beta]

Here, [dataset_name] can be one of LC, CellCycle, CellPainingAssay, or EMT. Please refer to dvae_main.py for details.
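The --beta and --c_dim flags suggest a VAE trained with a β-weighted KL term over a c_dim-dimensional latent code. For orientation, the standard β-VAE objective for a diagonal Gaussian encoder is sketched below in NumPy. It is an assumption that dvae_main.py uses this form (Factor-VAE, for instance, adds a total-correlation penalty to the plain VAE objective rather than scaling the KL term), so check dvae_main.py for the loss actually optimized.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dims, per sample."""
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - logvar - 1.0, axis=-1)

def beta_vae_loss(recon_error, mu, logvar, beta):
    """Batch mean of reconstruction error + beta * KL (standard beta-VAE form)."""
    recon_error = np.asarray(recon_error, dtype=float)
    return float(np.mean(recon_error + beta * gaussian_kl(mu, logvar)))
```

With beta = 1 this reduces to the ordinary VAE objective; larger values trade reconstruction quality for a more disentangled code, which is why MorphoGenie pairs the VAE with a GAN for high-fidelity images.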

Step 2: Train ID-GAN with the information distillation loss.

python train.py --config [config_name] --dvae_name [dvae_run_name] --name [idgan_run_name]
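In the ID-GAN paper cited in the Acknowledgement, the generator receives the VAE's disentangled code c together with a nuisance noise vector, and an information distillation term encourages the frozen VAE encoder to recover c from the generated image. As we read it, that term is the encoder's Gaussian log-likelihood of c; the NumPy sketch below computes its negative as a loss. The function names and the weight parameter are hypothetical, and train.py may implement the term differently.

```python
import numpy as np

def gaussian_nll(c, mu, logvar):
    """-log N(c; mu, diag(exp(logvar))), summed over latent dims, per sample."""
    c, mu, logvar = (np.asarray(a, dtype=float) for a in (c, mu, logvar))
    return 0.5 * np.sum(np.log(2 * np.pi) + logvar + (c - mu) ** 2 / np.exp(logvar), axis=-1)

def distillation_loss(c, enc_mu, enc_logvar, weight=1.0):
    """Weighted mean NLL of code c under the frozen encoder's posterior for G(s, c).

    enc_mu, enc_logvar: encoder outputs for the generated images G(s, c).
    """
    return float(weight * np.mean(gaussian_nll(c, enc_mu, enc_logvar)))
```

The loss is smallest when the encoder maps each generated image back to the code it was generated from, which is what ties the GAN's output to the VAE's disentangled factors.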

Saving Results

Results, including checkpoints, TensorBoard logs, and images, are saved in the outputs directory.

Acknowledgement

PyTorch implementation of "High-fidelity Synthesis with Disentangled Representation" (https://arxiv.org/abs/2001.04296)

This code is built on the following code repositories:

Information Distillation GAN (ID-GAN): https://github.com/1Konny/idgan.git
Factor-VAE: https://github.com/1Konny/FactorVAE.git

Repository contents

- Datasets
- Figures
- InterpretationHeatmaps
- Results
- TraversalImages/FeaturesExtraction
- configs
- gan_training
- LICENSE
- MorphoGenie_Test.ipynb
- MorphoGenie_Test.py
- MorphoGenie_train.py
- README.md

rashmisrm/MorphoGenie
