-
Notifications
You must be signed in to change notification settings - Fork 0
/
slingshot.Rmd
272 lines (223 loc) · 9.16 KB
/
slingshot.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
echo = TRUE,
eval = FALSE
)
```
# PathPinpointR
<!-- badges: start -->
<!-- badges: end -->
PathPinpointR identifies the position of a sample upon a trajectory.
##### *Assumptions:*
- Sample is found upon the chosen trajectory.
- Sample is from a distinct part of the trajectory. A sample with cells that are evenly distributed across the trajectory will have a predicted location of the centre of the trajectory.
# Example Workflow
This vignette will take you through the basics running PPR.
The data used here is generated in a similar fasion to the [Slingshot vignette](https://bioconductor.org/packages/devel/bioc/vignettes/slingshot/inst/doc/vignette.html).
## Installation
#### Check and install required packages
Run the following code to load all packages neccecary for PPR & this vignette.
```{r}
required_packages <- c("SingleCellExperiment", "Biobase", "fastglm", "ggplot2",
"monocle", "plyr", "RColorBrewer", "ggrepel", "ggridges",
"gridExtra", "devtools", "mixtools", "Seurat",
"parallel", "RColorBrewer", "mclust")
## for package "fastglm", "ggplot2", "plyr", "RColorBrewer",
# "ggrepel", "ggridges", "gridExtra", "mixtools"
new_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
## for package "SingleCellExperiment", "Biobase", "slingshot".
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
new_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) BiocManager::install(new_packages)
devtools::install_github("SGDDNB/GeneSwitches")
```
#### install PathPinpointR
You can install the development version of PathPinpointR using:
```{r}
devtools::install_github("moi-taiga/PathPinpointR")
```
### Load the required packages
```{r}
library(PathPinpointR)
library(Seurat)
library(ggplot2)
library(SingleCellExperiment)
library(slingshot)
library(RColorBrewer)
library(GeneSwitches)
library(mclust)
```
## Generate the data
The data is generated in a similar fasion to the [Slingshot vignette](https://bioconductor.org/packages/devel/bioc/vignettes/slingshot/inst/doc/vignette.html).
A subset of the reference data is used as a proxy for the sample data.
```{r}
load_all()
### Genertate the reference data
reference_sce <- get_synthetic_data()
# Generate the sample data by subsetting the reference data
# in one sample inlcude c20-c45, in the other c90-c128.
samples_sce <- list(reference_sce[,c(20:45)], reference_sce[,c(90:128)])
```
#### View the PCA plot from the reference data
Plot the reference data, colored by cluster.
```{r}
plot(reducedDims(reference_sce)$PCA,
col = brewer.pal(9,"Set1")[colData(reference_sce)$GMM],
pch=16, asp = 1)
# add a legend
legend("topright",
legend = levels(colData(reference_sce)$clust_names),
fill = brewer.pal(9,"Set1"))
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/SLING-reference-PCA-plot.png")
```
The plot shows the development of the cells.
## Run slingshot
Run slingshot on the reference data to produce pseudotime for each cell.
```{r}
# Slingshot
reference_sce <- slingshot(reference_sce,
clusterLabels = 'GMM',
reducedDim = 'PCA')
#Rename the Pseudotime column to work with GeneSwitches
colData(reference_sce)$Pseudotime <- reference_sce$slingPseudotime_1
```
#### Plot the slingshot trajectory.
```{r}
# Generate colors
colors <- colorRampPalette(brewer.pal(11, "Spectral")[-6])(100)
plotcol <- colors[cut(reference_sce$slingPseudotime_1, breaks = 100)]
# Plot the data
plot(reducedDims(reference_sce)$PCA, col = plotcol, pch=16, asp = 1)
lines(SlingshotDataSet(reference_sce), lwd = 2, col = "black")
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/SLING-slingshot.png")
```
The plot shows the trajectory of the reference data,
with cells colored by pseudotime.
## Binarize the reference Expression Data
Using the package [GeneSwitches](https://github.com/SGDDNB/GeneSwitches),
binarize the gene expression data of the reference data.
```{r, eval=FALSE}
# binarize the expression data of the reference
reference_sce <- binarize_exp(reference_sce,
fix_cutoff = TRUE,
binarize_cutoff = 0.4,
ncores = 1)
# Find the switching point of each gene in the reference data
reference_sce <- find_switch_logistic_fastglm(reference_sce,
downsample = TRUE,
show_warning = FALSE)
```
Note: both binatize_exp() and find_switch_logistic_fastglm()
are time consuming processes and may take tens of minutes to run.
## Visualise the switching genes
generate a list of switching genes, and visualise them on a pseudo timeline.
```{r}
switching_genes <- filter_switchgenes(reference_sce,
allgenes = TRUE,
r2cutoff = 0)
# Plot the timeline using plot_timeline_ggplot
plot_timeline_ggplot(switching_genes,
timedata = colData(reference_sce)$Pseudotime,
txtsize = 3)
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/SLING-timeline-plot1.png")
```
The distribution of the switching genes along the trajectory is shown. \
Due to the nature of this syntheic data the switching gnenes are not evenly distributed.\
Note: The number of switching genes significantly affects the accuracy of PPR.\
too many will reduce the accuracy by including uninformative genes/noise. \
too few will reduce the accuracy by excluding informative genes. \
## Select a number of switching genes
Using the PPR function precision() an optimum number of switching genes can be found.
```{r}
precision(reference_sce, n_sg_range = seq(0, 800, 100))
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/SLING-precision1_plot.png")
```
Narrow down the search to find the optimum number of switching genes.
```{r}
precision(reference_sce, n_sg_range = seq(700, 822, 2))
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/SLING-precision2_plot.png")
```
## Produce a filtered matrix of switching genes
The using precision(), 772 is found to be the optimum for this data. \
When using true biological data, the optimum number is likely to be lower. \
```{r}
switching_genes <- filter_switchgenes(reference_sce,
allgenes = TRUE,
r2cutoff = 0,
topnum = 772)
```
## Visualise the filtered switching genes
```{r}
# Plot the timeline using plot_timeline_ggplot
plot_timeline_ggplot(switching_genes,
timedata = colData(reference_sce)$Pseudotime,
txtsize = 3)
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/SLING-timeline-plot2.png")
```
## Binarize the sample data
Binarize the gene expression data of the samples.
```{r}
# First reduce the sample data to only include the switching genes.
samples_sce <- lapply(samples_sce, reduce_counts_matrix, switching_genes)
# binarize the expression data of the samples
samples_binarized <- lapply(samples_sce,
binarize_exp,
fix_cutoff = TRUE,
binarize_cutoff = 1,
ncores = 1)
```
## Predict Position
Produce an estimate for the position of each cell in each sample.
The prediction is stored as a PPR_OBJECT.
```{r}
reference_ppr <- predict_position(reference_sce, switching_genes)
# Iterate through each Seurat object in the predicting their positons,
# on the reference trajectory, using PathPinpointR.
samples_ppr <- lapply(samples_binarized, predict_position, switching_genes)
```
## Measure accuracy
We can calculate the accuracy of PPR in the given trajectory by comparing the
predicted position of the reference cells to their pseudotimes defined by slingshot.
```{r}
accuracy_test(reference_ppr, reference_sce, plot = TRUE)
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/SLING-accuracy_plot.png")
```
## Plotting the predicted position of each sample:
plot the predicted position of each sample on the reference trajectory.
```{r}
# show the predicted position of the first sample
# include the position of cells in the reference data, by a given label.
ppr_plot() +
reference_idents(reference_sce, "clust_names") +
sample_prediction(samples_ppr[[1]], label = "Sample 1", col = "red")
# show the predicted position of the second sample
sample_prediction(samples_ppr[[2]], label = "Sample 2", col = "blue")
# show the points at which selected genes switch.
switching_times(c("G1172", "G1346", "G901"), switching_genes)
```
```{r eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/Sling-PPR.png")
```