<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<title>Data processing with scp</title>
<meta charset="utf-8" />
<meta name="author" content="Laurent Gatto and Christophe Vanderaa" />
<script src="scp_processing_files/header-attrs-2.10/header-attrs.js"></script>
<script src="scp_processing_files/xaringanExtra-webcam-0.0.1/webcam.js"></script>
<script id="xaringanExtra-webcam-options" type="application/json">{"width":"200","height":"200","margin":"1em"}</script>
<link href="scp_processing_files/tile-view-0.2.6/tile-view.css" rel="stylesheet" />
<script src="scp_processing_files/tile-view-0.2.6/tile-view.js"></script>
<script src="scp_processing_files/xaringanExtra_fit-screen-0.2.6/fit-screen.js"></script>
<link href="scp_processing_files/xaringanExtra-extra-styles-0.2.6/xaringanExtra-extra-styles.css" rel="stylesheet" />
<link href="scp_processing_files/panelset-0.2.6/panelset.css" rel="stylesheet" />
<script src="scp_processing_files/panelset-0.2.6/panelset.js"></script>
<link rel="stylesheet" href="xaringan-themer.css" type="text/css" />
<link rel="stylesheet" href="custom.css" type="text/css" />
</head>
<body>
<textarea id="source">
class: center, middle, inverse, title-slide
# Data processing with scp
### Laurent Gatto and Christophe Vanderaa
### 2021/08/10
---
class: middle
name: cc-by
### Get the slides at https://bit.ly/202108SCP
These slides are available under a **Creative Commons
[CC-BY license](http://creativecommons.org/licenses/by/4.0/)**. You are
free to share (copy and redistribute the material in any medium or
format) and adapt (remix, transform, and build upon the material) for
any purpose, even commercially
<img height="20px" alt="CC-BY" src="https://raw.githubusercontent.com/UCLouvain-CBIO/scp-teaching/main/img/cc1.jpg" />.
???
In this presentation, I will discuss data processing and show you how
to perform two important steps: quality control at feature level and
quality control at sample level. I will also briefly demonstrate how
to perform dimension reduction as an example of downstream analysis.
The slides are available at the given link and are shared under a CC-BY
license.
---
class: middle, inverse, center
# How to process single-cell proteomics data?
???
A burning question you might have once you have acquired your single-cell
proteomics data is: how should I process single-cell proteomics data?
---
class: middle
## How to process single-cell proteomics data?
Overview of the workflow suggested in the SCoPE2 seminal paper [1].
<img src="./figs/scp_processing_workflow.svg" width="50%" style="display: block; margin: auto;" />
A full reproduction of the workflow using `scp` and `QFeatures` is
available from our
[preprint](http://dx.doi.org/10.1101/2021.04.12.439408) [2] and
[replication vignette](https://uclouvain-cbio.github.io/SCP.replication/articles/SCoPE2.html).
<p style="color:grey;font-size:0.75em;">
[1] Specht, Harrison, Edward Emmott, Aleksandra A. Petelski, R. Gray
Huffman, David H. Perlman, Marco Serra, Peter Kharchenko, Antonius
Koller, and Nikolai Slavov. 2021. “Single-Cell Proteomic and
Transcriptomic Analysis of Macrophage Heterogeneity Using SCoPE2.”
Genome Biology 22 (1): 50.
<br>
[2] Vanderaa, Christophe, and Laurent Gatto. 2021. “Utilizing Scp for
the Analysis and Replication of Single-Cell Proteomics Data.” bioRxiv.
</p>
???
Answering this question is not trivial. While further research is
required to provide principled guidelines about single-cell proteomics
data processing, we can already rely on existing processing workflows.
For instance, I show here the workflow from the SCoPE2 seminal paper
published by Nikolai's lab.
There are many different steps. Steps shown in yellow are steps commonly
applied to bulk proteomics and are therefore available from the `QFeatures`
package. However, some steps are specific to single cells and the functions
to perform them are available in the `scp` package.
Due to time constraints, I won't cover all steps, but you can find a full
reproduction of this workflow in our replication vignette and a discussion
in our preprint.
---
class: middle, center, inverse
# How to perform PSM quality control?
???
Let's start with one of the steps: let me show you how to perform
PSM quality control.
---
class:
## PSM quality control
.panelset[
.panel[.panel-name[Description]
<br> PSM quality control removes low-quality PSMs to **improve the
reliability** of downstream analysis results.
<br>
**Example**: sample to carrier ratio
The sample to carrier ratio is the signal from samples divided by the
signal of the carrier (tens to hundreds of cell equivalents).
PSMs with high ratios indicate issues during acquisition and/or
quantification and need to be removed.
]
.panel[.panel-name[Plot]
Distribution of the sample to carrier ratio computed for all PSMs.
<img src="scp_processing_files/figure-html/unnamed-chunk-2-1.png" width="80%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]
Code for computing the sample to carrier ratio for each PSM:
```r
scope2Data <- computeSCR(scope2Data, i = 1:3, colvar = "SampleType",
samplePattern = "Macrophage|Monocyte|Blank",
carrierPattern = "Carrier", rowDataName = "SCR")
```
Minimal code for plotting:
```r
rd <- rbindRowData(scope2Data, i = 1:3)
ggplot(data.frame(rd)) +
aes(x = SCR) +
geom_histogram() +
scale_x_log10()
```
Code for filtering:
```r
scope2Data <- filterFeatures(scope2Data, ~ SCR < 0.1)
```
]
]
???
### Description
The objective of PSM quality control is to identify and remove
low-quality PSMs in order to improve the reliability of the downstream
analysis results.
Let me use an example of quality control: the sample to carrier
ratio, which is the signal from samples divided by the signal from
carriers. Carriers can contain tens to hundreds of cell equivalents. The logic
is: when a PSM exhibits a high ratio, meaning the signal in samples
becomes close to the signal in carriers or even exceeds it, this
indicates that an issue has occurred during acquisition and/or
quantification and we need to remove that PSM.
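To make the idea concrete, here is a minimal base R sketch on made-up intensities. It is not the `scp` implementation: the matrix `quant`, its column names and its values are hypothetical, and I assume the ratio is the mean sample signal divided by the carrier signal, with the blank counted among the samples as in the slide code.

```r
# Toy PSM intensity matrix: 3 PSMs (rows) x 4 channels (columns).
# All names and values are made up for illustration.
quant <- matrix(c(1000, 12,   9,  10,
                  2000, 18,  22,  20,
                   500, 90, 110, 100),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("PSM", 1:3),
                                c("Carrier", "Cell1", "Cell2", "Blank")))

is_carrier <- colnames(quant) == "Carrier"

# Sample to carrier ratio: mean sample signal divided by the carrier signal
scr <- rowMeans(quant[, !is_carrier, drop = FALSE]) / quant[, is_carrier]
scr

# Keep only the PSMs with a ratio below 10%
keep <- scr < 0.1
quant_filtered <- quant[keep, , drop = FALSE]
rownames(quant_filtered)
```

In this toy example the third PSM has sample signals that are far too close to the carrier signal, so it is the one that gets removed.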
### Plot
Let's see an example distribution of the sample to carrier ratios.
Most ratios are close to 1% as expected by the experimental design,
but you may have noticed there is a tail with much higher ratios.
Those indicate an issue and will be removed.
### Code
Let's quickly see how to do that. More details about the code are
provided in the replication vignette. First, we need to compute the sample to carrier ratio
using `computeSCR()`. It takes a `QFeatures` object and we specify how the
samples and the carriers are defined. The results are stored as
feature annotation, here called `SCR`.
Then, the second code chunk plots the distribution. We extract the feature
annotations and provide them to the `ggplot()` function for plotting.
This will create the plot I just showed.
The last command shows how to apply the filter. The `filterFeatures()`
function takes the `QFeatures` object and a filtering statement. In this
case, we keep only features that have a ratio lower than 10%.
---
class: middle, center, inverse
# How to perform single-cell quality control?
???
Similarly to what we just saw, I will now demonstrate quality
control of single-cell samples.
---
class:
## Single-cell quality control
.panelset[
.panel[.panel-name[Description]
<br> Single-cell quality control removes failed cells to **improve the
reliability** of downstream analysis results.
<br>
**Example**: median coefficient of variation
The coefficient of variation measures the robustness of quantification
for a protein in a sample. Taking the median across a single cell
provides an estimate of the robustness of quantification within that
single cell.
Single cells with a high median coefficient of variation indicate
issues during acquisition and need to be removed.
]
.panel[.panel-name[Plot]
Distribution of the median coefficient of variation computed for all
single-cell and blank samples.
<img src="scp_processing_files/figure-html/unnamed-chunk-6-1.png" width="70%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]
Code for computing the median coefficient of variation:
```r
scope2Data <- medianCVperCell(scope2Data, i = 1:3, groupBy = "Leading.razor.protein",
colDataName = "medianCV", nobs = 6, norm = "SCoPE2")
```
Code for plotting:
```r
cd <- colData(scope2Data)
ggplot(data.frame(cd)) +
aes(x = medianCV, fill = SampleType) +
geom_histogram() +
scale_x_log10() +
geom_vline(xintercept = 0.5, lty = "dashed")
```
Code for filtering:
```r
scope2Data <- subsetByColData(scope2Data, scope2Data$medianCV < 0.5)
```
]
]
???
### Description
The objective of single-cell quality control is to identify and remove
low-quality cells in order to improve the reliability of the downstream
analysis results, just like for feature quality control.
Again, let me use an example: The coefficient of variation measures
the robustness of quantification for a protein in a sample. Taking the
median across a single cell provides an estimate of the robustness of
quantification within that single cell. Single cells with a high
median coefficient of variation indicate issues during acquisition and
need to be removed.
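Here is a small base R sketch of that computation on made-up numbers. It is only an illustration, not what `medianCVperCell()` does internally: I assume the coefficient of variation of a protein in a cell is the standard deviation divided by the mean of the features assigned to that protein, and the matrix `quant` and the mapping `protein` are hypothetical.

```r
# Toy data: 6 features (rows) quantified in 2 cells (columns), made-up values
quant <- matrix(c(10, 10,
                  11, 20,
                   9,  5,
                  20, 30,
                  22, 10,
                  18, 50),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("CellA", "CellB")))

# Hypothetical mapping of each feature to its protein
protein <- c("P1", "P1", "P1", "P2", "P2", "P2")

# Coefficient of variation: standard deviation relative to the mean
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# CV for each protein (rows) within each cell (columns)
cv_per_protein <- apply(quant, 2, function(cell) tapply(cell, protein, cv))

# Median CV across proteins, one value per cell
median_cv <- apply(cv_per_protein, 2, median)
median_cv
```

The features of each protein agree well in `CellA` and poorly in `CellB`, so `CellB` ends up with a much higher median coefficient of variation.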
### Plot
We can plot the distribution of the median coefficient of variation
computed for all single cells. Including blank samples, which should
theoretically contain mostly noise, helps to define a threshold
above which single cells are considered too noisy. Here you can see we set
the threshold at 0.5 to exclude all blank samples, but this also
excludes a few low-quality single cells.
### Code
Here I show you the code to perform the quality control, again split
in three chunks. The first chunk computes the median coefficient of
variation using the `medianCVperCell()` function on a `QFeatures` object.
Various arguments control how the coefficients are computed.
The computed coefficients are directly stored in the `QFeatures`
object as sample annotation.
The second chunk shows how to create the plot I just showed. We retrieve
the sample annotation and provide it to the `ggplot` function to
create the histogram.
The last chunk applies the filtering. `subsetByColData()` means that
we take a subset of the samples based on the sample annotation. Here, we
select all samples that have a median coefficient of variation lower
than 0.5, the threshold we defined based on the blanks.
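The selection logic itself can be sketched on a plain data frame. This is a hedged illustration, not a call to `subsetByColData()`: the table `cd` and its values are made up, and I assume that a sample whose median coefficient of variation could not be computed is stored as `NA` and should be dropped as well.

```r
# Hypothetical sample annotation table; all values are made up
cd <- data.frame(
  SampleType = c("Macrophage", "Monocyte", "Macrophage",
                 "Blank", "Blank", "Monocyte"),
  medianCV   = c(0.25, 0.31, 0.62, 0.55, 0.71, NA)
)

threshold <- 0.5

# Comparing NA to a number gives NA, so missing values are excluded explicitly
keep <- !is.na(cd$medianCV) & cd$medianCV < threshold
cd_filtered <- cd[keep, ]

# Number of samples of each type that pass the filter
table(cd_filtered$SampleType)
```

Checking the table after filtering is a quick way to confirm that no blank sample passes the chosen threshold.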
And so this is how to perform two of the many steps of the workflow. Again,
a comprehensive demonstration is provided in the replication vignette.
---
class: middle, inverse, center
# What's next after data processing?
???
You may ask yourself: what can I do after data processing?
---
class:
## What's next after data processing?
.panelset[
.panel[.panel-name[Description]
Downstream analysis is where the fun begins!
Common analyses: dimension reduction, cluster analysis, cluster
annotation, differential protein abundance analysis, ...
Many methods are readily applicable to the data thanks to the
**Bioconductor**
[`SingleCellExperiment`](https://www.bioconductor.org/packages/release/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html)
container [1].
**Example**: dimension reduction using the `scater` R package.
<br>
<p style="color:grey;font-size:0.75em;">
[1] Amezquita, Robert A., Aaron T. L. Lun, Etienne Becht, Vince J. Carey, Lindsay N. Carpp, Ludwig Geistlinger, Federico Martini, et al. 2019. “Orchestrating Single-Cell Analysis with Bioconductor.” Nature Methods, December, 1–9.
</p>
]
.panel[.panel-name[Plot]
t-SNE plot of the protein data.
<img src="scp_processing_files/figure-html/unnamed-chunk-10-1.png" width="80%" style="display: block; margin: auto;" />
]
.panel[.panel-name[Code]
Code for getting the protein data with sample annotations:
```r
prots <- getWithColData(scope2Data, "proteins")
```
Code for computing the tSNE (after imputation by zero):
```r
prots <- impute(prots, method = "zero")
prots <- runTSNE(prots, exprs_values = 1, perplexity = 5, name = "TSNE")
```
Code for plotting:
```r
plotTSNE(prots, colour_by = "Set")
```
]
]
???
### Description
Once your data is processed, you are ready to perform downstream
analyses and that's where the fun begins! Common downstream analyses are
for instance dimension reduction, cluster analysis, cluster annotation,
differential protein abundance analysis and many more.
Most of the methods to perform single-cell analyses are actually
already available. We can directly apply those methods
to our data because our data is contained in `SingleCellExperiment`
objects within the `QFeatures` object.
As an example, let me demonstrate how to apply dimension reduction
using the `scater` package.
### Plot
Here is a tSNE plot generated from example protein data obtained by
aggregating the PSMs to proteins without further processing. You can
see on this plot that the main source of variability is the acquisition
batch, and this is often not desirable. So, thorough data processing is critical for principled data
analysis in order to generate robust biological knowledge. If we do not correctly
model the data, we may miss interesting information or focus
on confounding effects.
### Code
Again, here is a bit of code to perform dimension reduction. First, we extract
the data of interest from the `QFeatures` object. In this case, we
get the protein data using the `getWithColData()` function, which
also transfers all the available sample annotations that we use for
colouring the plot.
Next, we compute the tSNE after a little imputation of missing data by
zero values. `runTSNE()` is a function from the `scater` package.
Finally, we generate the tSNE plot and colour it by the acquisition
set. Again, `plotTSNE()` is a function from `scater`.
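If you want to see what imputation by zero does without installing anything, here is a base R sketch on a made-up matrix. Note that I swap tSNE for PCA with `prcomp()` here, only because it ships with base R; it is a stand-in for the dimension reduction step and not what the slide code runs. The matrix `prot` and its values are hypothetical.

```r
set.seed(1)

# Toy protein matrix: 5 proteins (rows) x 6 cells (columns), made-up values
prot <- matrix(rnorm(30), nrow = 5,
               dimnames = list(paste0("P", 1:5), paste0("Cell", 1:6)))
prot[c(2, 9, 17)] <- NA  # introduce a few missing values

# Imputation by zero: replace every missing value with 0
prot_imputed <- prot
prot_imputed[is.na(prot_imputed)] <- 0

# PCA as a stand-in for tSNE; cells must be in rows, hence the transpose
pca <- prcomp(t(prot_imputed))

# Coordinates of each cell on the first two components
head(pca$x[, 1:2])
```

The imputation step is needed because `prcomp()`, like many dimension reduction methods, does not accept missing values.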
I hope this presentation helped you understand how to perform
data processing with our software, in particular quality control, and what
you can do with your data after processing.
---
class: middle, inverse, center
# Exercise
???
Let's now test your understanding with a small exercise.
---
class: middle
#### Given the distribution of the median coefficient of variation per cell, what command would you use to remove low-quality cells?
<img src="./figs/scp_processing_exercise.svg" width="50%" style="display: block; margin: auto;" />
```r
1. subsetByColData(scope2Data, scope2Data$medianCV < 0.25)
2. subsetByColData(scope2Data, scope2Data$medianCV > 0.25)
3. subsetByColData(scope2Data, scope2Data$medianCV < 0.32)
4. subsetByColData(scope2Data, scope2Data$medianCV > 0.32)
5. subsetByColData(scope2Data, scope2Data$medianCV < 0.45)
6. subsetByColData(scope2Data, scope2Data$medianCV > 0.45)
```
Connect to **http://www.wooclap.com/YYRQRM**
???
Given the distribution of the median coefficient of variation per cell,
what command would you use to remove low-quality cells?
Command 1, 2, 3, 4, 5 or 6? You can pause the video to think about it...
The solution is command 3. Setting the threshold at 0.32 best separates
the blank samples from the single cells, and we want to
keep samples with a low median coefficient of variation.
---
class: middle
### Further information
* Check out our vignette where we fully replicated the SCoPE2 workflow:
https://uclouvain-cbio.github.io/SCP.replication/articles/SCoPE2.html
* Check out our preprint where we discuss the current challenges in
single-cell proteomics data analysis:
http://dx.doi.org/10.1101/2021.04.12.439408
* Try it yourself in this online workshop:
https://lgatto.github.io/QFeaturesScpWorkshop2021/
### Funding
Fonds de la Recherche Scientifique (FNRS), Belgium
???
I hope you found the correct solution. If you're looking for more
detailed information, you can have a look at our replication vignette
and our preprint. If you're interested in getting your hands on our software,
check out our online workshop! Thank you very much for watching and
see you in another video.
</textarea>
<style data-target="print-only">@media screen {.remark-slide-container{display:block;}.remark-slide-scaler{box-shadow:none;}}</style>
<script src="https://remarkjs.com/downloads/remark-latest.min.js"></script>
<script>var slideshow = remark.create({
"highlightStyle": "github",
"highlightLines": true,
"ratio": "16:9",
"countIncrementalSlides": true
});
if (window.HTMLWidgets) slideshow.on('afterShowSlide', function (slide) {
window.dispatchEvent(new Event('resize'));
});
(function(d) {
var s = d.createElement("style"), r = d.querySelector(".remark-slide-scaler");
if (!r) return;
s.type = "text/css"; s.innerHTML = "@page {size: " + r.style.width + " " + r.style.height +"; }";
d.head.appendChild(s);
})(document);
(function(d) {
var el = d.getElementsByClassName("remark-slides-area");
if (!el) return;
var slide, slides = slideshow.getSlides(), els = el[0].children;
for (var i = 1; i < slides.length; i++) {
slide = slides[i];
if (slide.properties.continued === "true" || slide.properties.count === "false") {
els[i - 1].className += ' has-continuation';
}
}
var s = d.createElement("style");
s.type = "text/css"; s.innerHTML = "@media print { .has-continuation { display: none; } }";
d.head.appendChild(s);
})(document);
// delete the temporary CSS (for displaying all slides initially) when the user
// starts to view slides
(function() {
var deleted = false;
slideshow.on('beforeShowSlide', function(slide) {
if (deleted) return;
var sheets = document.styleSheets, node;
for (var i = 0; i < sheets.length; i++) {
node = sheets[i].ownerNode;
if (node.dataset["target"] !== "print-only") continue;
node.parentNode.removeChild(node);
}
deleted = true;
});
})();
(function() {
"use strict"
// Replace <script> tags in slides area to make them executable
var scripts = document.querySelectorAll(
'.remark-slides-area .remark-slide-container script'
);
if (!scripts.length) return;
for (var i = 0; i < scripts.length; i++) {
var s = document.createElement('script');
var code = document.createTextNode(scripts[i].textContent);
s.appendChild(code);
var scriptAttrs = scripts[i].attributes;
for (var j = 0; j < scriptAttrs.length; j++) {
s.setAttribute(scriptAttrs[j].name, scriptAttrs[j].value);
}
scripts[i].parentElement.replaceChild(s, scripts[i]);
}
})();
(function() {
var links = document.getElementsByTagName('a');
for (var i = 0; i < links.length; i++) {
if (/^(https?:)?\/\//.test(links[i].getAttribute('href'))) {
links[i].target = '_blank';
}
}
})();
// adds .remark-code-has-line-highlighted class to <pre> parent elements
// of code chunks containing highlighted lines with class .remark-code-line-highlighted
(function(d) {
const hlines = d.querySelectorAll('.remark-code-line-highlighted');
const preParents = [];
const findPreParent = function(line, p = 0) {
if (p > 1) return null; // traverse up no further than grandparent
const el = line.parentElement;
return el.tagName === "PRE" ? el : findPreParent(el, ++p);
};
for (let line of hlines) {
let pre = findPreParent(line);
if (pre && !preParents.includes(pre)) preParents.push(pre);
}
preParents.forEach(p => p.classList.add("remark-code-has-line-highlighted"));
})(document);</script>
<script>
slideshow._releaseMath = function(el) {
var i, text, code, codes = el.getElementsByTagName('code');
for (i = 0; i < codes.length;) {
code = codes[i];
if (code.parentNode.tagName !== 'PRE' && code.childElementCount === 0) {
text = code.textContent;
if (/^\\\((.|\s)+\\\)$/.test(text) || /^\\\[(.|\s)+\\\]$/.test(text) ||
/^\$\$(.|\s)+\$\$$/.test(text) ||
/^\\begin\{([^}]+)\}(.|\s)+\\end\{[^}]+\}$/.test(text)) {
code.outerHTML = code.innerHTML; // remove <code></code>
continue;
}
}
i++;
}
};
slideshow._releaseMath(document);
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement('script');
script.type = 'text/javascript';
script.src = 'https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML';
if (location.protocol !== 'file:' && /^https?:/.test(script.src))
script.src = script.src.replace(/^https?:/, '');
document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
</body>
</html>