From 5cdb366dd10d1c648ea002dd60ccf3ff2c248494 Mon Sep 17 00:00:00 2001
From: nilseling
Date: Mon, 27 Nov 2023 16:32:08 +0100
Subject: [PATCH 1/5] Added developers documentation

---
 .gitignore     |   1 +
 DEVELOPMENT.md | 129 +++++++++++++++++++++++++++++++++++++++++++++++++
 README.md      |  25 +++++++++-
 index.Rmd      |  24 ++++++++-
 4 files changed, 176 insertions(+), 3 deletions(-)
 create mode 100644 DEVELOPMENT.md

diff --git a/.gitignore b/.gitignore
index 63437e7d..50bdd926 100644
--- a/.gitignore
+++ b/.gitignore
@@ -18,3 +18,4 @@ outputs/*
 !publication/README.md
 !publication/protocol.md
 !CHANGELOG.md
+!DEVELOPMENT.md
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
new file mode 100644
index 00000000..883cd985
--- /dev/null
+++ b/DEVELOPMENT.md
@@ -0,0 +1,129 @@
+# Useful information when developing this book
+
+This document is to guide future developers to maintain and extend the IMC
+data analysis book.
+
+## General setup
+
+* The IMC data analysis book is written in [bookdown](https://bookdown.org/).
+* Each section is stored in its own `.Rmd` file with `index.Rmd` building the landing page
+* References are stored in `book.bib`
+* At the end of each `.Rmd` file a number of unit tests are executed. These
+unit tests are always executed but their results are not shown in the book.
+
+### Continuous integration/continuous deployment
+
+* CI/CD is executed based on the workflow [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/blob/main/.github/workflows/build.yml).
+* On the first of each month based on the [Dockerfile](https://github.com/BodenmillerGroup/IMCDataAnalysis/blob/main/Dockerfile) a new Docker image is built. We are doing this so that the workflow is always tested against the newest software versions.
+* The Docker image is pushed to the Github Container Registry [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/pkgs/container/imcdataanalysis).
+* The Docker image is date tagged and `latest` always refers to the newest build.
+* Once the Docker image is built, the IMC data analysis book is executed within the
+newest Docker image. This will also run all unit tests.
+
+**Of note:** Sometimes the calculation of the UMAP produces slightly different
+results. If that happens the workflow run can be re-executed by clicking the `Re-run jobs` button of the workflow run.
+This test could also be excluded in the long run.
+
+* When pushing to `main` (either directly or via a PR), the CI/CD workflow is
+executed.
+* If the Dockerfile changed (e.g., if you want to add a new package), a new Docker image is built and the workflow is executed within the new Docker image.
+* If the Dockerfile did not change, the workflow is executed within the most recent Docker image.
+
+## Updating the book
+
+This section describes how to update the book. You want to do this to add new content
+but also to fix bugs or adjust unit tests.
+
+### Work on the devel branch
+
+It is recommended to work on the `devel` branch of the Github repository to add
+new changes.
+
+### Work within the newest Docker container
+
+It is also recommended to always work within a Docker container based on the newest
+Docker image available:
+
+1. After installing [Docker](https://docs.docker.com/get-docker/) you can first pull the container via:
+
+```
+docker pull ghcr.io/bodenmillergroup/imcdataanalysis:yyyy-mm-dd
+```
+
+and then run the container:
+
+```
+docker run -v /path/to/IMCDataAnalysis:/home/rstudio/IMCDataAnalysis \
+    -e PASSWORD=bioc -p 8787:8787 \
+    ghcr.io/bodenmillergroup/imcdataanalysis:yyyy-mm-dd
+```
+
+2. An RStudio server session can be accessed via a browser at `localhost:8787` using `Username: rstudio` and `Password: bioc`.
+3. Navigate to `IMCDataAnalysis` and open the `IMCDataAnalysis.Rproj` file.
+4. Code in the individual files can now be executed or the whole workflow can be built by entering `bookdown::render_book()`.
+
+### Adding new packages
+
+If you need to add new packages to the workflow, make sure to add them to the
+[software requirements](https://bodenmillergroup.github.io/IMCDataAnalysis/prerequisites.html#software-requirements)
+section and to the Dockerfile.
+
+### Opening a pull request
+
+Now you can change the content of the book.
+Once you have added all changes, push the changes to `devel` and open a pull request
+to `main`. Wait until all checks have passed; then you can merge the PR.
+
+### Add changes to CHANGELOG.md
+
+Please track the changes that you are making in the [CHANGELOG.md](CHANGELOG.md) file.
+
+### Trigger a new release
+
+Once you have added the changes to the CHANGELOG, merged the pull request and
+the workflow has been executed on CI/CD, you can trigger a new release.
+
+* Go to [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/releases) and click on `Draft a new release` at the top of the page.
+* Under `Choose a tag` create a new tag and give details on the release.
+* With each release the corresponding [Zenodo repository](https://zenodo.org/records/10209942) is updated.
+
+## Updating the data
+
+For new `steinbock` releases and specifically if the Mesmer version changes, the
+example data should be updated. The example data are stored on Central NAS
+and are hosted on Zenodo.
+
+### Re-analyse the example data
+
+* You can find the raw data on [zenodo](https://zenodo.org/records/7575859).
+* On Central NAS under projects/IMCWorkflow/zenodo create a new folder called `steinbock_0.x.y` where x denotes the new major version and y the new minor version.
+* Copy the `steinbock.sh` script from the folder of the previous version to the folder of the newest version.
+* Change the steinbock version number in the `steinbock.sh` script and execute it.
+* It should generate all relevant files and zip all folders.
+
+### Upload data to zenodo
+
+* On [zenodo](https://zenodo.org/records/7624451), click on `New version` and replace all files with the newer version. No need to upload the raw data to zenodo as they are hosted in a different repository.
+
+### Adjust the book
+
+* Work in the most recent Docker container and on the devel branch.
+* Manually go through each section, update the links in the [Prerequisites](https://bodenmillergroup.github.io/IMCDataAnalysis/prerequisites.html#download-data) section
+* Make sure to check and adjust the unit tests at the end of each file
+* Make sure that the text (e.g. clustering) still matches the results
+
+*Important:* as we are training a random forest classifier on manually gated cells, these gated cells won't match the newest version of the data if the Mesmer version changed. For this, we have the `code/transfer_labels.R` script that automatically re-gates cells in the new SPE object.
+
+* Go through all sections until `Cell phenotyping`
+* Based on the old `gated_cells` and the new SPE object, execute the `code/transfer_labels.R` script
+* Zip the new `gated_cells` and upload them to a new version on [zenodo](https://zenodo.org/records/8095133)
+* Adjust the link to the new gated cells in the [Prerequisites](https://bodenmillergroup.github.io/IMCDataAnalysis/prerequisites.html#download-data) section
+* Make sure that the new classification results closely match the previous results
+
+* Continue going through the book
+
+### Add changes to CHANGELOG.md
+
+Finally, add all the recent changes to the CHANGELOG, create and merge a PR and create a new release (see above).
+
+
diff --git a/README.md b/README.md
index 691f196c..65de0031 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,6 @@ R workflow highlighting analyses approaches for multiplexed imaging data.
 
 ## Scope
 
-
 This workflow explains the use of common R/Bioconductor packages to pre-process and analyse single-cell data obtained from segmented multichannel images.
 While we use imaging mass cytometry (IMC) data as an example, the concepts presented here can be applied to images obtained by other technologies (e.g. CODEX, MIBI, mIF, CyCIF, etc.).
 The workflow can be largely divided into the following parts:
@@ -23,6 +22,13 @@ The workflow can be largely divided into the following parts:
 6. Image visualization
 7. Spatial analyses
 
+## Update freeze
+
+This workflow has been actively developed until December 2023. At that time
+we used the most recent (`v.0.16.0`) version of `steinbock` to process the
+example data. If you are having issues when using newer versions of `steinbock`
+please open an issue [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues).
+
 ## Usage
 
 To reproduce the analysis displayed at [https://bodenmillergroup.github.io/IMCDataAnalysis/](https://bodenmillergroup.github.io/IMCDataAnalysis/) clone the repository via:
@@ -58,6 +64,20 @@ docker pull ghcr.io/bodenmillergroup/imcdataanalysis:
 3. Navigate to `IMCDataAnalysis` and open the `IMCDataAnalysis.Rproj` file.
 4. Code in the individual files can now be executed or the whole workflow can be build by entering `bookdown::render_book()`.
 
+## Feedback
+
+We provide the workflow as an open-source resource. It does not mean that
+this workflow is tested on all possible datasets or biological questions and
+there exist multiple ways of analysing data. It is therefore recommended to
+check the results and question their biological interpretation.
+
+If you notice an issue or missing information, please report an issue
+[here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues). We also
+welcome contributions in form of pull requests or feature requests in form of
+issues. Have a look at the source code at:
+
+[https://github.com/BodenmillerGroup/IMCDataAnalysis](https://github.com/BodenmillerGroup/IMCDataAnalysis)
+
 ## Contributing guidelines
 
 For feature requests and bug reports, please raise an issue [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues).
@@ -68,10 +88,11 @@ To add new libraries to the container please add them to the [Dockerfile](Docker
 
 ## Maintainer
 
-[Nils Eling](https://github.com/nilseling)
+[Daniel Schulz](https://github.com/SchulzDan)
 
 ## Contributors
 
+[Nils Eling](https://github.com/nilseling)
 [Vito Zanotelli](https://github.com/votti)
 [Daniel Schulz](https://github.com/SchulzDan)
 [Jonas Windhager](https://github.com/jwindhager)
diff --git a/index.Rmd b/index.Rmd
index dbeda90d..8dbba7b0 100644
--- a/index.Rmd
+++ b/index.Rmd
@@ -45,10 +45,19 @@ spatial analysis and the user will need to become familiar with the general
 framework to efficiently analyse data obtained from multiplexed imaging
 technologies.
 
+## Update freeze
+
+This workflow has been actively developed until December 2023. At that time
+we used the most recent (`v.0.16.0`) version of `steinbock` to process the
+example data. If you are having issues when using newer versions of `steinbock`
+please open an issue [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues).
+
 ## Feedback and contributing
 
 We provide the workflow as an open-source resource. It does not mean that
-this workflow is tested on all possible datasets or biological questions.
+this workflow is tested on all possible datasets or biological questions and
+there exist multiple ways of analysing data. It is therefore recommended to
+check the results and question their biological interpretation.
 
 If you notice an issue or missing information, please report an issue
 [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues). We also
@@ -57,6 +66,19 @@ issues. Have a look at the source code at:
 
 [https://github.com/BodenmillerGroup/IMCDataAnalysis](https://github.com/BodenmillerGroup/IMCDataAnalysis)
 
+## Maintainer
+
+[Daniel Schulz](https://github.com/SchulzDan)
+
+## Contributors
+
+[Nils Eling](https://github.com/nilseling)
+[Vito Zanotelli](https://github.com/votti)
+[Daniel Schulz](https://github.com/SchulzDan)
+[Jonas Windhager](https://github.com/jwindhager)
+[Michelle Daniel](https://github.com/michdaniel)
+[Lasse Meyer](https://github.com/lassedochreden)
+
 ## Citation
 
 The workflow has been published in

From bad7b97aa36b061bb965006fa4e880c3cad50cb5 Mon Sep 17 00:00:00 2001
From: nilseling
Date: Mon, 27 Nov 2023 17:08:16 +0100
Subject: [PATCH 2/5] Added more ways to visualize CN composition

---
 11-spatial_analysis.Rmd | 64 +++++++++++++++++++++++++++++------------
 DEVELOPMENT.md          |  5 ++++
 2 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/11-spatial_analysis.Rmd b/11-spatial_analysis.Rmd
index 8c7397ab..abecf9d8 100644
--- a/11-spatial_analysis.Rmd
+++ b/11-spatial_analysis.Rmd
@@ -334,20 +334,60 @@ plotSpatial(spe,
     scale_color_brewer(palette = "Set3")
 ```
 
-The next code chunk visualizes the cell type compositions of the
-detected cellular neighborhoods (CN).
+There are now different visualizations to examine the cell type composition
+of the detected cellular neighborhoods (CN). First we can look at the total
+number of cells per cell type and CN.
 
 ```{r}
-for_plot <- prop.table(table(spe$cn_celltypes, spe$celltype),
-                       margin = 1)
+for_plot <- table(spe$cn_celltypes, spe$celltype)
+
+pheatmap(for_plot,
+         color = viridis(100), display_numbers = TRUE,
+         number_color = "white", number_format = "%.0f")
+```
+
+Next, we can observe per cell type the fraction of CN that they are distributed
+across.
+
+```{r}
+for_plot <- prop.table(table(spe$cn_celltypes, spe$celltype), margin = 2)
+
+pheatmap(for_plot,
+         color = viridis(100), display_numbers = TRUE,
+         number_color = "white", number_format = "%.2f")
+```
+
+Similarly, we can visualize the fraction of each CN made up of each cell type.
+
+```{r}
+for_plot <- prop.table(table(spe$cn_celltypes, spe$celltype), margin = 1)
+
+pheatmap(for_plot,
+         color = viridis(100), display_numbers = TRUE,
+         number_color = "white", number_format = "%.2f")
+```
+
+This visualization can also be scaled by column to account for the relative
+cell type abundance.
 
+```{r}
 pheatmap(for_plot,
          color = colorRampPalette(c("dark blue", "white", "dark red"))(100),
          scale = "column")
 ```
 
-CN 1 and CN 6 are mainly composed of tumor cells with CN 6 forming the
-tumor/stroma border. CN 3 is mainly composed of B and BnT cells
+Lastly, we can visualize the enrichment of cell types within cellular neighborhoods
+using the `regionMap` function of the `lisaClust` package.
+
+```{r}
+library(lisaClust)
+regionMap(spe,
+          cellType = "celltype",
+          region = "cn_celltypes")
+```
+
+CN 1 and CN 6 are mainly enriched for tumor cells with CN 6 forming the
+tumor/stroma border. CN 3 is mainly enriched for B and BnT cells
 indicating TLS. CN 5 is composed of aggregated plasma cells and most T
 cells.
 
@@ -408,15 +448,12 @@ derive numeric vectors for each cell which can then again be clustered using
 kmeans. All steps are supported by the `lisaClust` function which can be
 applied to a `SingleCellExperiment` and `SpatialExperiment` object.
 
-
 In the following example, we calculate the LISA curves within a 10µm, 20µm and
 50µm neighborhood around each cell. Increasing these radii will lead to broader
 and smoother spatial clusters. However, a number of parameter settings should be
 tested to estimate the robustness of the results.
 
 ```{r lisaClust, fig.height=12, fig.width=12, message=FALSE}
-library(lisaClust)
-
 set.seed(220705)
 spe <- lisaClust(spe,
                  k = 6,
@@ -448,15 +485,6 @@ In this case, CN 1 and 4 contain tumor cells but no CN is forming the
 tumor/stroma interface. CN 3 represents TLS. CN 2 indicates T cell subtypes and
 plasma cells are aggregated to CN 5.
 
-As an alternative way of visualizing the enrichment of cell types within the
-detected CNs, the `lisaClust` package provides the `regionMap` function.
-
-```{r}
-regionMap(spe,
-          cellType = "celltype",
-          region = "region")
-```
-
 ## Spatial context analysis
 
 Downstream of CN assignments, we will analyze the spatial context (SC)
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
index 883cd985..398ba5f0 100644
--- a/DEVELOPMENT.md
+++ b/DEVELOPMENT.md
@@ -122,6 +122,11 @@ and are hosted on Zenodo.
 
 * Continue going through the book
 
+### Execute the book
+
+* When you are done working through the book, within the Docker container open the RProject file and execute `bookdown::render_book()` to make sure that it can be executed from beginning to end.
+* Under `data/CellTypeValidation` have a look at the PNGs to check if celltypes were correctly detected.
+
 ### Add changes to CHANGELOG.md
 
 Finally, add all the recent changes to the CHANGELOG, create and merge a PR and create a new release (see above).
From 2e6b5c82b51011ed387cf27b2d82401904d54d2d Mon Sep 17 00:00:00 2001
From: nilseling
Date: Mon, 27 Nov 2023 17:09:48 +0100
Subject: [PATCH 3/5] Updated CHANGELO

---
 CHANGELOG.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6199ec22..03b5c4e9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,4 +4,9 @@
 
 **Version 1.0.1** [2023-10-19]
 
-- Added seed before `predict` call after training a classifier
\ No newline at end of file
+- Added seed before `predict` call after training a classifier
+
+**Version 1.0.2** [2023-11-27]
+
+- Added developers documentation
+- Added more ways to visualize cell type composition per CN
\ No newline at end of file

From eb429708be7712d268e37758df5433186079fde2 Mon Sep 17 00:00:00 2001
From: nilseling
Date: Tue, 28 Nov 2023 11:01:11 +0100
Subject: [PATCH 4/5] Added more functions to visualize CNs on images

---
 11-spatial_analysis.Rmd | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/11-spatial_analysis.Rmd b/11-spatial_analysis.Rmd
index abecf9d8..033f5456 100644
--- a/11-spatial_analysis.Rmd
+++ b/11-spatial_analysis.Rmd
@@ -339,7 +339,7 @@ of the detected cellular neighborhoods (CN). First we can look at the total
 number of cells per cell type and CN.
 
 ```{r}
-for_plot <- table(spe$cn_celltypes, spe$celltype)
+for_plot <- table(as.character(spe$cn_celltypes), spe$celltype)
 
 pheatmap(for_plot,
          color = viridis(100), display_numbers = TRUE,
@@ -350,7 +350,7 @@ Next, we can observe per cell type the fraction of CN that they are distributed
 across.
 
 ```{r}
-for_plot <- prop.table(table(spe$cn_celltypes, spe$celltype), margin = 2)
+for_plot <- prop.table(table(as.character(spe$cn_celltypes), spe$celltype), margin = 2)
 
 pheatmap(for_plot,
          color = viridis(100), display_numbers = TRUE,
@@ -360,7 +360,7 @@ pheatmap(for_plot,
 Similarly, we can visualize the fraction of each CN made up of each cell type.
 
 ```{r}
-for_plot <- prop.table(table(spe$cn_celltypes, spe$celltype), margin = 1)
+for_plot <- prop.table(table(as.character(spe$cn_celltypes), spe$celltype), margin = 1)
 
 pheatmap(for_plot,
          color = viridis(100), display_numbers = TRUE,
@@ -386,6 +386,29 @@ regionMap(spe,
           region = "cn_celltypes")
 ```
 
+It is also recommended to visualize some images to confirm the interpretation of
+cellular neighborhoods. For this we can either use the `lisaClust::hatchingPlot` or
+the `imcRtools::plotSpatial` functions:
+
+```{r}
+# hatchingPlot
+cur_spe <- spe[,spe$sample_id == "Patient1_003"]
+cur_sce <- as(cur_spe, "SingleCellExperiment")
+cur_sce$x <- spatialCoords(cur_spe)[,1]
+cur_sce$y <- spatialCoords(cur_spe)[,2]
+cur_sce$region <- as.character(cur_sce$cn_celltypes)
+
+hatchingPlot(cur_sce, region = "cn_celltypes", cellType = "celltype") +
+    scale_color_manual(values = metadata(spe)$color_vectors$celltype)
+```
+
+```{r, fig.height=8, fig.width=10}
+# plotSpatial
+plotSpatial(spe[,spe$sample_id == "Patient1_003"],
+            img_id = "cn_celltypes", node_color_by = "celltype", node_size_fix = 0.7) +
+    scale_color_manual(values = metadata(spe)$color_vectors$celltype)
+```
+
 CN 1 and CN 6 are mainly enriched for tumor cells with CN 6 forming the
 tumor/stroma border. CN 3 is mainly enriched for B and BnT cells
 indicating TLS. CN 5 is composed of aggregated plasma cells and most T

From bb9b77a0650812639a8de0ea07705fec0dd72323 Mon Sep 17 00:00:00 2001
From: nilseling
Date: Tue, 28 Nov 2023 12:12:29 +0100
Subject: [PATCH 5/5] Use character instead of factor

---
 11-spatial_analysis.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/11-spatial_analysis.Rmd b/11-spatial_analysis.Rmd
index 033f5456..44a42d51 100644
--- a/11-spatial_analysis.Rmd
+++ b/11-spatial_analysis.Rmd
@@ -398,7 +398,7 @@ cur_sce$x <- spatialCoords(cur_spe)[,1]
 cur_sce$y <- spatialCoords(cur_spe)[,2]
 cur_sce$region <- as.character(cur_sce$cn_celltypes)
 
-hatchingPlot(cur_sce, region = "cn_celltypes", cellType = "celltype") +
+hatchingPlot(cur_sce, region = "region", cellType = "celltype") +
     scale_color_manual(values = metadata(spe)$color_vectors$celltype)
 ```