
Forebrain

jsxlei edited this page Sep 24, 2019 · 31 revisions

Tutorial on Forebrain dataset

Installation

install from GitHub

git clone https://github.com/jsxlei/SCALE.git
cd SCALE
python setup.py install

Get started with the downloaded Forebrain scATAC-seq data [Download]

Run SCALE (command line)

SCALE.py -d Forebrain/data.txt -k 8

Here the input directory (input_dir) is Forebrain.
All results are saved in the default output directory, output/.
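
Before loading the results, it can help to confirm that the run produced the files this tutorial reads. The following is a minimal sketch: the file names are the ones mentioned on this page, the helper `missing_outputs` is hypothetical (not part of SCALE), and the exact set of files written may differ between SCALE versions.

```python
from pathlib import Path

# File names taken from this tutorial; the exact set written by SCALE
# may differ by version (assumption).
EXPECTED_OUTPUTS = [
    "cluster_assignments.txt",
    "feature.txt",
    "imputed_data.txt",
    "model.pt",
    "tsne.txt",
    "tsne.pdf",
]


def missing_outputs(output_dir="output", expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in expected if not (out / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("output")
    if missing:
        print("Missing output files:", ", ".join(missing))
    else:
        print("All expected output files are present.")
```

If anything is reported missing, rerun the command above and check its log before continuing.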

Results (python environment)

Load required packages

import pandas as pd
import numpy as np
from sklearn.metrics import confusion_matrix
from matplotlib import pyplot as plt
import seaborn as sns

from scale.plot import plot_embedding, plot_heatmap

visualization

The t-SNE embedding is saved in tsne.txt and plotted in tsne.pdf, labeled by cluster assignment.

clustering results

clustering results are saved in cluster_assignments.txt

y = pd.read_csv('output/cluster_assignments.txt', sep='\t', index_col=0, header=None)
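
A quick sanity check on the assignments is to count the cells in each cluster and, when reference cell-type labels are available, cross-tabulate them against the clusters. The sketch below uses small synthetic data so it runs on its own; the `reference` labels and cell names are hypothetical and are not part of the SCALE output.

```python
import pandas as pd

# Synthetic stand-in for the cluster assignments (cell name -> cluster id)
cells = [f"cell{i}" for i in range(6)]
clusters = pd.Series([0, 0, 1, 1, 1, 2], index=cells, name="cluster")

# Hypothetical reference labels; not produced by SCALE
reference = pd.Series(["EX", "EX", "IN", "IN", "EX", "MG"],
                      index=cells, name="reference")

# Number of cells per cluster
sizes = clusters.value_counts().sort_index()

# Contingency table: rows are reference labels, columns are clusters
table = pd.crosstab(reference, clusters)

print(sizes)
print(table)
```

With real data, `clusters` would be the label column of `y`, and `reference` would come from whatever annotation accompanies the dataset.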

feature

The latent features are saved in feature.txt and can be plotted as a heatmap:

feature = pd.read_csv('output/feature.txt', sep='\t', index_col=0, header=None)
plot_heatmap(feature.T, y, 
             figsize=(8, 3), cmap='RdBu_r', vmax=8, vmin=-8, center=0,
             ylabel='Feature dimension', yticklabels=np.arange(10)+1, 
             cax_title='Feature value', legend_font=6, ncol=1,
             bbox_to_anchor=(1.1, 1.1), position=(0.92, 0.15, .08, .04))

interpret features

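# get_decoder_weight and peak_selection are SCALE helpers; importing them from
# scale.utils is an assumption, adjust to your installed version
from scale.utils import get_decoder_weight, peak_selection

input_dir = 'Forebrain'
raw = pd.read_csv(input_dir+'/data.txt', sep='\t', index_col=0) # load raw count matrix
imputed = pd.read_csv('output/imputed_data.txt', sep='\t', index_col=0) # only its peak index is used here

weight = get_decoder_weight('output/model.pt')
weight_index = imputed.index
peaks_of_feature = peak_selection(weight, weight_index)

# one heatmap of raw counts per latent feature, restricted to that feature's peaks
for i, peak_index in enumerate(peaks_of_feature):
    peak_data = raw.loc[peak_index]
    plot_heatmap(peak_data, y,
                 cmap='Reds',
                 figsize=(10, 4),
                 cax_title='Peak value',
                 ylabel='{} peaks of feature {}'.format(len(peak_index), i+1),
                 vmax=1, vmin=0, legend_font=8,
                 row_cluster=False,
                 show_legend=True,
                 show_cax=True,
                 bbox_to_anchor=(0.4, 1.32),
                 ncol=4)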

We used GREAT to predict functions of cis-regulatory regions.

imputation

imputed data are saved in imputed_data.txt

imputed = pd.read_csv('output/imputed_data.txt', sep='\t', index_col=0)

The imputed data improved motif identification by chromVAR.
The left figure shows the deviation scores of significant motifs (adjusted p-value of variability < 0.05).
The right figure shows the t-SNE plot based on the motif heatmap.

cluster-specific peaks

We provide an entropy-based method to calculate the cluster specificity of each peak across clusters.
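
To illustrate the general idea of an entropy-based specificity score, here is a minimal sketch. It is not SCALE's implementation: the function `entropy_specificity` and the normalization (one minus Shannon entropy divided by its maximum) are assumptions chosen for illustration. Each peak's mean signal per cluster is turned into a distribution over clusters; a peak open in a single cluster has low entropy and therefore high specificity.

```python
import numpy as np


def entropy_specificity(mat, labels):
    """Score how cluster-specific each peak (row) is, in [0, 1].

    mat    : (n_peaks, n_cells) array of non-negative accessibility values
    labels : (n_cells,) cluster label per cell; at least two clusters assumed
    Returns (score, top_cluster), both of length n_peaks.
    """
    mat = np.asarray(mat, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)

    # Mean signal of each peak within each cluster: (n_peaks, n_clusters)
    means = np.stack([mat[:, labels == c].mean(axis=1) for c in clusters], axis=1)

    # Normalize each row to a distribution over clusters;
    # all-zero peaks fall back to the uniform distribution
    totals = means.sum(axis=1, keepdims=True)
    uniform = np.full_like(means, 1.0 / len(clusters))
    p = np.divide(means, totals, out=uniform, where=totals > 0)

    # Shannon entropy in bits, treating 0 * log2(0) as 0
    logp = np.log2(p, out=np.zeros_like(p), where=p > 0)
    entropy = -(p * logp).sum(axis=1)

    # 1 = signal confined to one cluster, 0 = uniform across clusters
    score = 1.0 - entropy / np.log2(len(clusters))
    return score, clusters[p.argmax(axis=1)]


# Toy example: 2 peaks, 4 cells, 2 clusters
mat = np.array([
    [1.0, 1.0, 0.0, 0.0],  # open only in cluster 0
    [0.5, 0.5, 0.5, 0.5],  # equally open in both clusters
])
labels = np.array([0, 0, 1, 1])
score, top_cluster = entropy_specificity(mat, labels)
print(score)
```

Ranking peaks by such a score within each cluster and keeping the top few is one way to obtain cluster-specific peak lists comparable in spirit to the ones used below.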

from scale.specifity import cluster_specific, mat_specificity_score
from scale.utils import binarization

binary = binarization(imputed, raw)
score_mat = mat_specificity_score(imputed, y)
peak_index, peak_labels = cluster_specific(score_mat, np.unique(y), top=200)

for data in [raw, imputed, binary]:
    plot_heatmap(data.iloc[peak_index], y=y, row_labels=peak_labels, ncol=3, cmap='Reds', 
                 vmax=1, row_cluster=False, legend_font=6, cax_title='Peak Value',
                 figsize=(8, 10), bbox_to_anchor=(0.4, 1.2), position=(0.8, 0.76, 0.1, 0.015))
