- Installation
- Import modules
- Read in data
- Spatial domain detection using SpaGCN
- Identify SVGs
- Identify Meta Gene
The installation should take a few minutes on a normal computer. To install SpaGCN package you must make sure that your python version is over 3.5. If you don’t know the version of python you can check it by:
import platform
platform.python_version()
'3.8.8'
Note: Because SpaGCN pends on pytorch, you should make sure torch is correctly installed.
Now you can install the current release of SpaGCN by the following three ways:
pip3 install SpaGCN
#Note: you need to make sure that the pip is for python3,or you can install SpaGCN by
python3 -m pip install SpaGCN
#If you do not have permission (when you get a permission denied error), you can install SpaGCN by
pip3 install --user SpaGCN
Download the package from Github and install it locally:
git clone https://github.com/jianhuupenn/SpaGCN
cd SpaGCN/SpaGCN_package/
python3 setup.py install --user
If you do not have Python3.5 or Python3.6 installed, consider installing Anaconda. After installing Anaconda, you can create a new environment, for example, SpaGCN (you can change to any name you like).
#create an environment called SpaGCN
conda create -n SpaGCN python=3.7.9
#activate your environment
conda activate SpaGCN
git clone https://github.com/jianhuupenn/SpaGCN
cd SpaGCN/SpaGCN_package/
python3 setup.py build
python3 setup.py install
conda deactivate
import os,csv,re
import pandas as pd
import numpy as np
import scanpy as sc
import math
import SpaGCN as spg
from scipy.sparse import issparse
import random, torch
import warnings
warnings.filterwarnings("ignore")
import matplotlib.colors as clr
import matplotlib.pyplot as plt
import SpaGCN as spg
#In order to read in image data, we need to install some package. Here we recommend package "opencv"
#inatll opencv in python
#!pip3 install opencv-python
import cv2
spg.__version__
'1.2.0'
The current version of SpaGCN requres three input data.
- The gene expression matrix(n by k): expression_matrix.h5;
- Spatial coordinateds of samplespositions.txt;
- Histology image(optional): histology.tif, can be tif or png or jepg.
The gene expreesion data can be stored as an AnnData object. AnnData stores a data matrix .X together with annotations of observations .obs, variables .var and unstructured annotations .uns.
"""
#Read original data and save it to h5ad
from scanpy import read_10x_h5
adata = read_10x_h5("../tutorial/data/151673/expression_matrix.h5")
spatial=pd.read_csv("../tutorial/data/151673/positions.txt",sep=",",header=None,na_filter=False,index_col=0)
adata.obs["x1"]=spatial[1]
adata.obs["x2"]=spatial[2]
adata.obs["x3"]=spatial[3]
adata.obs["x4"]=spatial[4]
adata.obs["x5"]=spatial[5]
#Select captured samples
adata=adata[adata.obs["x1"]==1]
adata.var_names=[i.upper() for i in list(adata.var_names)]
adata.var["genename"]=adata.var.index.astype("str")
adata.write_h5ad("../tutorial/data/151673/sample_data.h5ad")
"""
#Read in gene expression and spatial location
adata=sc.read("../tutorial/data/151673/sample_data.h5ad")
#Read in hitology image
img=cv2.imread("../tutorial/data/151673/histology.tif")
-
s: This parameter determines the weight given to histology when calculating Euclidean distance between every two spots. ‘s = 1’ means that the histology pixel intensity value has the same scale variance as the (x,y) coordinates, whereas higher value of ‘s’ indicates higher scale variance, hence, higher weight to histology, when calculating the Euclidean distance.
-
b: This parameter determines the area of each spot when extracting color intensity.
-
n_clusters: Number of spatial domains wanted.
-
res: Resolution in the initial Louvain's Clustering methods. If the number of clusters is known, we can use the spg.search_res() fnction to search for suitable resolution(optional)
#Set coordinates
adata.obs["x_array"]=adata.obs["x2"]
adata.obs["y_array"]=adata.obs["x3"]
adata.obs["x_pixel"]=adata.obs["x4"]
adata.obs["y_pixel"]=adata.obs["x5"]
x_array=adata.obs["x_array"].tolist()
y_array=adata.obs["y_array"].tolist()
x_pixel=adata.obs["x_pixel"].tolist()
y_pixel=adata.obs["y_pixel"].tolist()
#Run SpaGCN
adata.obs["pred"]= spg.detect_spatial_domains_ez_mode(adata, img, x_array, y_array, x_pixel, y_pixel, n_clusters=7, histology=True, s=1, b=49, p=0.5, r_seed=100, t_seed=100, n_seed=100)
adata.obs["pred"]=adata.obs["pred"].astype('category')
#Refine domains (optional)
#shape="hexagon" for Visium data, "square" for ST data.
adata.obs["refined_pred"]=spg.spatial_domains_refinement_ez_mode(sample_id=adata.obs.index.tolist(), pred=adata.obs["pred"].tolist(), x_array=x_array, y_array=y_array, shape="hexagon")
adata.obs["refined_pred"]=adata.obs["refined_pred"].astype('category')
plot_color=["#F56867","#FEB915","#C798EE","#59BE86","#7495D3","#D1D1D1","#6D1A9C","#15821E","#3A84E6","#997273","#787878","#DB4C6C","#9E7A7A","#554236","#AF5F3C","#93796C","#F9BD3F","#DAB370","#877F6C","#268785"]
ax=spg.plot_spatial_domains_ez_mode(adata, domain_name="pred", x_name="y_pixel", y_name="x_pixel", plot_color=plot_color,size=150000/adata.shape[0], show=False, save=True,save_dir="./sample_results/pred.png")
ax=spg.plot_spatial_domains_ez_mode(adata, domain_name="refined_pred", x_name="y_pixel", y_name="x_pixel", plot_color=plot_color,size=150000/adata.shape[0], show=False, save=True,save_dir="./sample_results/refined_pred.png")
Spatial Domains Refined Spatial Domains
- target: Target domain to identify SVGs.
- min_in_group_fraction: Minium in-group expression fraction.
- min_in_out_group_ratio: Miniumn (in-group expression fraction) / (out-group expression fraction).
- min_fold_change: Miniumn (in-group expression) / (out-group expression).
- r: Radius to detect a spot's neighboring spots.target: Target domain to identify SVGs.
target=0
#Read in raw data
adata=sc.read("../tutorial/sample_results/results.h5ad")
raw=sc.read("../tutorial/data/151673/sample_data.h5ad")
#Preprocessing
raw.var_names_make_unique()
raw.obs["pred"]=adata.obs["pred"].astype('category')
raw.obs["x_array"]=raw.obs["x2"]
raw.obs["y_array"]=raw.obs["x3"]
raw.obs["x_pixel"]=raw.obs["x4"]
raw.obs["y_pixel"]=raw.obs["x5"]
raw.X=(raw.X.A if issparse(raw.X) else raw.X)
raw.raw=raw
sc.pp.log1p(raw)
#Set filtering criterials
min_in_group_fraction=0.8
min_in_out_group_ratio=1
min_fold_change=1.5
filtered_info=spg.detect_SVGs_ez_mode(raw, target=0, x_name="x_array", y_name="y_array", domain_name="pred", min_in_group_fraction=min_in_group_fraction, min_in_out_group_ratio=min_in_out_group_ratio, min_fold_change=min_fold_change)
filtered_info
genes | in_group_fraction | out_group_fraction | in_out_group_ratio | in_group_mean_exp | out_group_mean_exp | fold_change | pvals_adj | target_dmain | neighbors | |
---|---|---|---|---|---|---|---|---|---|---|
0 | CAMK2N1 | 1.000000 | 0.944964 | 1.058242 | 2.333675 | 1.578288 | 2.128434 | 1.656040e-11 | 0 | [3, 2] |
2 | ENC1 | 0.998638 | 0.941848 | 1.060295 | 2.457791 | 1.696083 | 2.141931 | 1.552131e-03 | 0 | [3, 2] |
4 | GPM6A | 0.997275 | 0.922118 | 1.081505 | 2.224006 | 1.561187 | 1.940255 | 8.602227e-03 | 0 | [3, 2] |
6 | ARPP19 | 0.982289 | 0.853583 | 1.150784 | 1.889256 | 1.272106 | 1.853637 | 4.823349e-02 | 0 | [3, 2] |
1 | HPCAL1 | 0.851499 | 0.465213 | 1.830342 | 1.141321 | 0.406338 | 2.085448 | 9.706465e-05 | 0 | [3, 2] |
color_self = clr.LinearSegmentedColormap.from_list('pink_green', ['#3AB370',"#EAE7CC","#FD1593"], N=256)
spg.plot_SVGs_ez_mode(adata=raw, gene_list=filtered_info["genes"].tolist(), x_name="y_pixel", y_name="x_pixel", plot_color=color_self, size=50, show=False, save=True, save_dir="./sample_results/")
CAMK2N1 ENC1 GPM6A ARPP19 HPCAL1
- target: Target domain to detect meta gene.
- start_gene: Gene to satrt with. Candidate start genes can be found by lowering the filtering criterials.
target=2
raw.obs["meta"]=spg.detect_meta_genes_ez_mode(raw, target, x_name="x_array", y_name="y_array", domain_name="pred", start_gene="GFAP", use_raw=False)
spg.plot_meta_genes_ez_mode(raw, x_name="y_pixel", y_name="x_pixel", meta_name="meta", plot_color=color_self, size=50, show=False, save=True, save_dir="./")