---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
```{r, child = "vignettes/_version.Rmd"}
```
# scdrake
[![NEWS: updates](https://img.shields.io/badge/NEWS-updates-informational)](NEWS.md)
[![Documentation and vignettes (stable version)](https://img.shields.io/badge/Documentation%20&%20vignettes-bioinfocz.github.io/scdrake-informational)](https://bioinfocz.github.io/scdrake)
[![Documentation and vignettes (devel version)](https://img.shields.io/badge/Documentation%20&%20vignettes-bioinfocz.github.io/scdrake-informational)](https://bioinfocz.github.io/scdrake/dev)
[![Overview and outputs](https://img.shields.io/badge/Overview%20&%20outputs-vignette("pipeline_overview")-informational)](https://bioinfocz.github.io/scdrake/articles/pipeline_overview.html)
[![Pipeline diagram](https://img.shields.io/badge/Pipeline%20diagram-Show-informational)](https://github.com/bioinfocz/scdrake/blob/main/diagrams/README.md)
![License](https://img.shields.io/github/license/bioinfocz/scdrake)
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![Docker Image CI](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml/badge.svg?branch=main)](https://github.com/bioinfocz/scdrake/actions/workflows/docker-ci.yml)
`{scdrake}` is a scalable and reproducible pipeline for secondary analysis of droplet-based single-cell RNA-seq data (scRNA-seq) and spot-based spatial transcriptomics data (SRT).
`{scdrake}` is an R package built on top of the `{drake}` package, a [Make](https://www.gnu.org/software/make)-like pipeline
toolkit for [R language](https://www.r-project.org).
The main features of the `{scdrake}` pipeline are:
- Import of scRNA-seq data:
[10x Genomics Cell Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger)
output, delimited table, or `SingleCellExperiment` object.
- Import of SRT data:
[10x Genomics Space Ranger](https://www.10xgenomics.com/support/software/space-ranger/latest/getting-started/what-is-space-ranger)
output, delimited table, or `SingleCellExperiment` object, together with a tissue positions file in the Space Ranger format.
- Quality control and filtering of cells/spots and genes, removal of empty droplets.
- Highly variable genes detection, cell cycle scoring, normalization, clustering, and dimensionality reduction.
- Spatially variable genes detection (for SRT data).
- Cell type annotation using reference sets or user-provided marker genes.
- Integration of multiple datasets.
- Computation of cluster markers and differentially expressed genes between clusters (denoted as "contrasts").
- Rich graphical and HTML outputs based on customizable RMarkdown documents.
- You can find links to example outputs [here](https://bioinfocz.github.io/scdrake/articles/pipeline_overview.html).
- Thanks to `{drake}`, the pipeline is highly efficient, scalable and reproducible, and also extendable.
- Want to change a parameter? No problem! Only the parts of the pipeline that changed will rerun,
while up-to-date ones will be skipped.
- Want to reuse the intermediate results for your own analyses? No problem!
The pipeline has smartly defined checkpoints which can be loaded from a `{drake}` cache.
- Want to extend the pipeline? No problem! The pipeline definition is just an R object which can be arbitrarily extended.
Who is `{scdrake}` intended for? It is primarily aimed at tech-savvy users (bioinformaticians)
who pass the results (reports, images) on to non-technical colleagues (biologists).
At the same time, bioinformaticians can quickly react to biologists’ needs by changing the parameters of the pipeline,
which then efficiently skips already finished parts. This dialogue between the biologist and the bioinformatician is
indispensable during scRNA-seq data analysis. `{scdrake}` ensures that this communication is performed in an effective
and reproducible manner.
The pipeline structure along with [diagrams](https://github.com/bioinfocz/scdrake/blob/main/diagrams/README.md)
and links to outputs is described in `vignette("pipeline_overview")`
([link](https://bioinfocz.github.io/scdrake/articles/pipeline_overview.html)).
If you use `{scdrake}` in your research, please consider citing:
> Kubovciak J, Kolar M, Novotny J (2023).
“Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis.” *Bioinformatics Advances*, **3**(1).
[doi:10.1093/bioadv/vbad089](https://doi.org/10.1093/bioadv/vbad089).
Huge thanks go to the authors of the
[Orchestrating Single-Cell Analysis with Bioconductor](https://bioconductor.org/books/3.15/OSCA) book, on whose
methods and recommendations `{scdrake}` is largely based.
***
# Installation instructions
## Using a Docker image (recommended)
A Docker image based on the [official Bioconductor image](https://bioconductor.org/help/docker/)
(version `r BIOC_VERSION`) is available. This is the most convenient and reproducible way to use
`{scdrake}`, as all the dependencies are already installed and their versions are fixed.
In addition, the parent Bioconductor image comes bundled with RStudio Server.
The complete guide to the usage of `{scdrake}`'s Docker image can be found in the
[Docker vignette](https://bioinfocz.github.io/scdrake/articles/scdrake_docker.html).
**We strongly recommend reading it even if you are an experienced Docker user.**
Below you will find only the basic commands to download the image and to run a detached container with RStudio in Docker, or
to run `{scdrake}` in Singularity.
You can also run the image in [SingularityCE](https://docs.sylabs.io/guides/latest/user-guide/quick_start.html) (without RStudio);
see the Singularity section in the Docker vignette above.
If the image is already present in the local Docker storage, you can use `singularity pull docker-daemon:<image>`.
You can pull the Docker image with the latest stable `{scdrake}` version using
```{r docker_stable, include = FALSE}
out <- scdrake::wrap_code(c(
glue::glue("docker pull {DOCKER_IMAGE_STABLE}"),
glue::glue("singularity pull docker:{DOCKER_IMAGE_STABLE}")
))
```
`r paste(knitr::knit(text = out), collapse = "\n")`
or list available versions in [our Docker Hub repository](https://hub.docker.com/r/jirinovo/scdrake/tags).
For the latest development version use
```{r docker_arm64, include = FALSE}
out <- scdrake::wrap_code(c(
glue::glue("docker pull {DOCKER_IMAGE_LATEST}"),
glue::glue("singularity pull docker:{DOCKER_IMAGE_LATEST}")
))
```
`r paste(knitr::knit(text = out), collapse = "\n")`
**Note for Mac users with M1/M2 chipsets**: `arm64` images are available up to and including version 1.5.0:
```bash
docker pull jirinovo/scdrake:1.5.0-bioc3.15-arm64
```
### Running the container
The instructions below cover the most common host setups: Linux running Docker Engine, and Windows or macOS running Docker Desktop.
First make a shared directory that will be mounted to the container:
```bash
mkdir ~/scdrake_projects
cd ~/scdrake_projects
```
Then run the image, which will expose RStudio Server on port 8787 of your host:
```{r docker_run_rstudio, include = FALSE}
out_docker_run_rstudio <- scdrake::format_shell_command(c(
"docker run -d",
"-v $(pwd):/home/rstudio/scdrake_projects",
"-p 8787:8787",
"-e USERID=$(id -u)",
"-e GROUPID=$(id -g)",
"-e PASSWORD=1234",
DOCKER_IMAGE_STABLE
))
```
`r knitr::knit(text = out_docker_run_rstudio)`
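
Once the container is up, RStudio Server should be reachable at `http://localhost:8787`. Images derived from the Bioconductor image normally accept the user name `rstudio` together with the password set through `PASSWORD`. The following is a minimal sketch, not part of `{scdrake}`, of two helpers for checking on and cleaning up such a container. The function names are hypothetical; only the standard `docker ps`, `docker stop`, and `docker rm` commands are used.

```bash
# List running containers that publish port 8787 (the port mapped above).
rstudio_containers() {
  docker ps --filter "publish=8787" --format '{{.ID}} {{.Image}} {{.Ports}}'
}

# Stop and remove a container by its ID (take the ID from the listing above).
# Files in the mounted scdrake_projects directory stay on the host.
stop_container() {
  docker stop "$1" && docker rm "$1"
}
```

Call `rstudio_containers` to find the container ID, then `stop_container <ID>` when you are done.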
For Singularity, also make shared directories and execute the container ("run and forget"):
```bash
mkdir -p ~/scdrake_singularity
cd ~/scdrake_singularity
mkdir -p home/${USER} scdrake_projects
singularity exec \
-e \
--no-home \
--bind "home/${USER}/:/home/${USER},scdrake_projects/:/home/${USER}/scdrake_projects" \
--pwd "/home/${USER}/scdrake_projects" \
path/to/scdrake_image.sif \
scdrake <args> <command>
```
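
If you call the container often, the command above can be wrapped in a shell function. This is only a sketch under stated assumptions: the function name `scdrake_sif` and the environment variables `SCDRAKE_BASE` and `SCDRAKE_SIF` are hypothetical and not provided by `{scdrake}`, and the bind paths are made absolute so that the function can be called from any directory.

```bash
# Hypothetical wrapper around the singularity command above.
scdrake_sif() {
  # Directory holding home/${USER} and scdrake_projects (created above).
  local base="${SCDRAKE_BASE:-${HOME}/scdrake_singularity}"
  # Placeholder path to the pulled image; set SCDRAKE_SIF to the real location.
  local image="${SCDRAKE_SIF:-path/to/scdrake_image.sif}"
  singularity exec \
    -e \
    --no-home \
    --bind "${base}/home/${USER}/:/home/${USER},${base}/scdrake_projects/:/home/${USER}/scdrake_projects" \
    --pwd "/home/${USER}/scdrake_projects" \
    "${image}" \
    scdrake "$@"
}
```

All arguments given to `scdrake_sif` are passed through unchanged to the `scdrake` command inside the container.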
## Installing `{scdrake}` manually (not recommended)
<details>
<summary>Click for details</summary>
### Install the required system packages
- For Linux, follow the commands for your distribution [here](required_libs_linux.md).
- For MacOS: `$ brew install libxml2 imagemagick@6 harfbuzz fribidi libgit2 geos pandoc`
### Install R >= 4.2
See <https://cloud.r-project.org/>
From now on, all commands are for R.
### Install `{renv}`
[`{renv}`](https://rstudio.github.io/renv/) is an R package for management of local R libraries. It is intended to be used
on a per-project basis, i.e. each project should use its own library of R packages.
```r
install.packages("renv")
```
### Initialize a new `{renv}` library
Switch to the directory where you will analyze data and initialize a new `{renv}` library:
```r
renv::consent(TRUE)
renv::init()
```
Now exit and restart R. You should see a message that the `{renv}` library has been activated.
### Install BiocManager
```r
renv::install("BiocManager")
```
### Install Bioconductor 3.15
```r
BiocManager::install(version = "3.15")
```
### Restore `{scdrake}` dependencies from lockfile
`{renv}` can also export the currently installed versions of R packages (and other things) into a lockfile.
Such a lockfile is available for `{scdrake}` and you can use it to install all dependencies by
```{r renv_stable, include = FALSE}
out <- c(
"\n",
"```r",
"## -- This is a lockfile for the latest stable version of scdrake.",
glue::glue('download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/{LATEST_STABLE_VERSION}/renv.lock", destfile = "renv.lock")'),
"## -- You can increase the number of CPU cores to speed up the installation.",
"options(Ncpus = 2)",
'renv::restore(lockfile = "renv.lock", repos = BiocManager::repositories())',
"```",
"\n"
)
```
`r paste(knitr::knit(text = out), collapse = "\n")`
To get the lockfile for the latest development version, use
```r
download.file("https://raw.githubusercontent.com/bioinfocz/scdrake/main/renv.lock", destfile = "renv.lock")
```
### Install the `{scdrake}` package
Now we can finally install the `{scdrake}` package, but using a non-standard approach - without its dependencies
(which are already installed from the lockfile).
```{r install_stable, include = FALSE}
out <- c(
"\n",
"```r",
"remotes::install_github(",
glue::glue(' "bioinfocz/scdrake@{LATEST_STABLE_VERSION}",', .trim = FALSE),
" dependencies = FALSE, upgrade = FALSE,",
" keep_source = TRUE, build_vignettes = TRUE,",
" repos = BiocManager::repositories()",
")",
"```",
"\n"
)
```
`r paste(knitr::knit(text = out), collapse = "\n")`
For the latest development version use `"bioinfocz/scdrake"`.
### Install the command line interface (CLI)
Optionally, you can install `{scdrake}`'s CLI scripts with
```r
scdrake::install_cli()
```
The CLI should now be accessible as the `scdrake` command. By default, the CLI is installed into `~/.local/bin`,
which is usually present in the `PATH` environment variable. In case it isn't, just add to your
`~/.bashrc`: `export PATH="${HOME}/.local/bin:${PATH}"`
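
A quick way to check whether `~/.local/bin` is already in `PATH` is sketched below. The helper name `path_contains` is hypothetical and not part of the `{scdrake}` CLI; the snippet only prints a hint and does not modify any file.

```bash
# Return 0 if directory $1 is listed in the colon-separated list $2 (defaults to $PATH).
path_contains() {
  local dir="$1" list="${2:-$PATH}"
  case ":${list}:" in
    *":${dir}:"*) return 0 ;;
    *) return 1 ;;
  esac
}

cli_dir="${HOME}/.local/bin"
if path_contains "${cli_dir}"; then
  echo "${cli_dir} is already in PATH"
else
  echo "${cli_dir} is not in PATH; add this line to ~/.bashrc:"
  # Single quotes keep the variables unexpanded in the printed line.
  echo 'export PATH="${HOME}/.local/bin:${PATH}"'
fi
```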
**Whenever you use the CLI, make sure your current working directory is inside an `{renv}` project.**
You can read the reasons below.
<details>
<summary>Show details</summary>
You might notice that a per-project `{renv}` library and an installed CLI are "disconnected" and if you install
`{scdrake}` and its CLI within multiple projects (`{renv}` libraries), then the CLI scripts in `~/.local/bin` will
be overwritten each time. But when you run the `scdrake` command inside an `{renv}` project, the `renv` directory is
automatically detected and the `{renv}` library is activated by `renv::load()`, so the proper, locally installed
`{scdrake}` package is then used.
Also, there is a built-in guard: the version of the CLI must match the version of the bundled CLI scripts inside the
installed `{scdrake}` package. Anyway, we think changes in the CLI won't be very frequent, so this shouldn't be a
problem most of the time.
</details>
> TIP: To save time and space, you can symlink the `renv/library` directory to multiple `{scdrake}` projects.
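
The symlinking mentioned in the tip could look like the sketch below. The helper name `link_renv_library` and the project paths in the usage line are hypothetical; the function assumes the source project already has a populated `renv/library` and refuses to overwrite an existing library in the target project.

```bash
# Hypothetical helper: make <target_project>/renv/library a symlink to the
# library of <source_project>. Pass absolute paths so the link resolves correctly.
link_renv_library() {
  local source_project="$1" target_project="$2"
  if [ -e "${target_project}/renv/library" ] || [ -L "${target_project}/renv/library" ]; then
    echo "${target_project}/renv/library already exists; not touching it" >&2
    return 1
  fi
  mkdir -p "${target_project}/renv"
  ln -s "${source_project}/renv/library" "${target_project}/renv/library"
}
```

For example, `link_renv_library "${HOME}/projects/first" "${HOME}/projects/second"` would let the second project reuse the packages installed for the first one.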
</details>
***
## Quickstart
First, run the `scdrake` image in Docker or Singularity; see the
[Docker vignette](https://bioinfocz.github.io/scdrake/articles/scdrake_docker.html).
Then you can go through the [Get Started vignette](https://bioinfocz.github.io/scdrake/articles/scdrake.html).
***
## Vignettes and other readings
See <https://bioinfocz.github.io/scdrake> for a documentation website of the latest stable version
(`r LATEST_STABLE_VERSION`) where links to vignettes below become real :-)
See <https://bioinfocz.github.io/scdrake/dev> for a documentation website of the current development version.
```{r, child = "vignettes/_vignette_signpost.Rmd"}
```
We encourage all users to read the [basics](https://books.ropensci.org/drake) of the `{drake}` package.
While it is not necessary to know all `{drake}` internals to successfully run the `{scdrake}` pipeline,
knowing them is a plus. You can read the minimum basics in `vignette("drake_basics")`.
Prior knowledge of Bioconductor and its classes
(especially the [SingleCellExperiment](https://bioconductor.org/packages/3.15/bioc/html/SingleCellExperiment.html))
is also a considerable advantage.
***
## Citation
Below is the citation output from using `citation("scdrake")` in R. Please
run this yourself to check for any updates on how to cite __scdrake__.
```{r "citation", eval = FALSE}
print(citation("scdrake"), bibtex = TRUE)
```
```
To cite package ‘scdrake’ in publications use:
Jiri Novotny and Jan Kubovciak (2021). scdrake: A Pipeline For 10x Chromium Single-Cell RNA-seq Data Analysis.
https://github.com/bioinfocz/scdrake, https://bioinfocz.github.io/scdrake.
A BibTeX entry for LaTeX users is
@Manual{,
title = {scdrake: A Pipeline For 10x Chromium Single-Cell RNA-seq Data Analysis},
author = {Jiri Novotny and Jan Kubovciak},
year = {2021},
note = {https://github.com/bioinfocz/scdrake, https://bioinfocz.github.io/scdrake},
}
```
Please note that `{scdrake}` was only made possible thanks to many other R and bioinformatics software authors,
who are cited in the vignettes and/or the paper(s) describing this package.
## Help and support
In case of any problems or suggestions, please open a new [issue](https://github.com/bioinfocz/scdrake/issues).
We will be happy to answer your questions, integrate new ideas, or resolve any problems :blush:
You can also use [GitHub Discussions](https://github.com/bioinfocz/scdrake/discussions), mainly
for topics **not** related to development (bugs, feature requests, etc.), for example if you need general help.
## Contribution
If you want to contribute to `{scdrake}`, please read the [contribution guide](.github/CONTRIBUTING.md).
All pull requests are welcome! :slightly_smiling_face:
## Code of Conduct
Please note that the `{scdrake}` project is released with a
[Contributor Code of Conduct](https://bioinfocz.github.io/scdrake/CODE_OF_CONDUCT.html).
By contributing to this project, you agree to abide by its terms.
## Acknowledgements
### Funding
This work was supported by [ELIXIR CZ](https://www.elixir-czech.cz) research infrastructure project
(MEYS Grant No: LM2018131 and LM2023055) including access to computing and storage facilities.
### Software and methods used by `{scdrake}`
Many things are used by `{scdrake}`, but these are really worth mentioning:
- The [Bioconductor](https://www.bioconductor.org) ecosystem.
- The [*Orchestrating Single-Cell Analysis with Bioconductor*](https://bioconductor.org/books/3.15/OSCA) book.
- The [scran](https://bioconductor.org/packages/3.15/bioc/html/scran.html),
[scater](https://bioconductor.org/packages/3.15/bioc/html/scater.html), and other great packages from
[Aaron Lun](https://orcid.org/0000-0002-3564-4813) et al.
- The [drake](https://github.com/ropensci/drake) package.
- The [rmarkdown](https://github.com/rstudio/rmarkdown) package, and other ones from the
[tidyverse](https://www.tidyverse.org) ecosystem.
### Development tools
- Continuous code testing is possible thanks to [GitHub Actions](https://github.com/features/actions)
through `{usethis}`, `{remotes}`, and `{rcmdcheck}`.
Customized to use [Bioconductor's docker containers](https://www.bioconductor.org/help/docker).
- The [documentation website](https://bioinfocz.github.io/scdrake) is generated by `{pkgdown}`.
- The code is styled automatically thanks to `{styler}`.
- The documentation is formatted thanks to `{devtools}` and `{roxygen2}`.
This package was developed using `{biocthis}`.