Merge pull request #562 from AlexsLemonade/jashapiro/set-up-advanced-…

…setup Add structure for advanced single cell module
AlexsLemonade · Oct 21, 2022 · 2784535
2 parents 082bf48 + faed962
commit 2784535
Showing 13 changed files with 63 additions and 20 deletions.
diff --git a/.github/workflows/spell-check.yml b/.github/workflows/spell-check.yml
@@ -14,24 +14,24 @@ jobs:
   spell-check:
     runs-on: ubuntu-latest
     container:
-      image: rocker/tidyverse:4.0.3
+      image: rocker/tidyverse:4.1.2
 
     # Steps represent a sequence of tasks that will be executed as part of the job
     steps:
       - uses: actions/checkout@v3
 
       - name: Install packages
-        run: Rscript --vanilla -e "install.packages('spelling', repos = c(CRAN = 'https://cloud.r-project.org'))"
+        run: Rscript --vanilla -e "install.packages(c('spelling'), repos = c(CRAN = '$CRAN'))"
 
       - name: Run spell check
         id: spell_check_run
         run: |
           results=$(Rscript --vanilla "scripts/spell-check.R")
-          echo "::set-output name=sp_chk_results::$results"
+          echo "sp_chk_results=$results" >> $GITHUB_OUTPUT
           cat spell_check_errors.tsv
 
       - name: Archive spelling errors
-        uses: actions/upload-artifact@v2
+        uses: actions/upload-artifact@v3
         with:
           name: spell-check-results
           path: spell_check_errors.tsv
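
The change above moves the step output from the `::set-output` workflow command to appending `name=value` to the file named by `$GITHUB_OUTPUT`. That form only suits single-line values; GitHub documents a delimiter form for multi-line ones. The following is a minimal sketch of both forms, not the workflow's actual code: the `sp_chk_details` output name, the sample values, and the delimiter string are assumptions made for illustration, and the temp-file fallback exists only so the snippet runs outside of Actions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Outside GitHub Actions, GITHUB_OUTPUT is unset; fall back to a temp file
# so the sketch can be run locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Single-line value: append name=value (the form used in the diff above).
# Quoting the file path guards against spaces in the runner's temp directory.
results="3"  # placeholder value standing in for the spell check script output
echo "sp_chk_results=${results}" >> "${GITHUB_OUTPUT}"

# Multi-line value: name<<DELIMITER, the value, then the delimiter on its own line.
# "sp_chk_details" and the delimiter are hypothetical names for this sketch.
details=$'first line\nsecond line'
delimiter="EOF_SPELL_CHECK"
{
  echo "sp_chk_details<<${delimiter}"
  echo "${details}"
  echo "${delimiter}"
} >> "${GITHUB_OUTPUT}"
```

The delimiter must not appear on a line by itself inside the value, so a string unlikely to occur in the output is the safer choice.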

diff --git a/.gitignore b/.gitignore
@@ -1,9 +1,3 @@
-# exercise notebook htmls
-RNA-seq/07-bulk_rnaseq_exercise.nb.html
-intro-to-R-tidyverse/04a-intro_to_R_exercise.nb.html
-intro-to-R-tidyverse/04b-intro_to_tidyverse_exercise-part-1.nb.html
-intro-to-R-tidyverse/04c-intro_to_tidyverse_exercise-part-2.nb.html
-
 #DS_Stores
 .DS_Store
 
@@ -20,3 +14,7 @@ spell_check_errors.tsv
 
 # Snakemake files
 .snakemake
+
+# ignore live and exercise nb.html files
+*-live.nb.html
+exercise_*.nb.html
diff --git a/RNA-seq/.gitignore b/RNA-seq/.gitignore
@@ -10,5 +10,3 @@ results/
 # development helper notebook for generating the NB cell line tximport object
 nb_cell_line_tximport*
 
-# live versions notebook HTML
-*-live.nb.html
diff --git a/components/dictionary.txt b/components/dictionary.txt
@@ -318,6 +318,7 @@ symlinked
 symlinks
 Tabula
 Tamayo
+TBD
 tetrachloride
 tgMap
 Theis

diff --git a/intro-to-R-tidyverse/.gitignore b/intro-to-R-tidyverse/.gitignore
@@ -2,6 +2,3 @@
 plots/
 results/
 
-
-# live versions notebook HTML
-*-live.nb.html
diff --git a/machine-learning/.gitignore b/machine-learning/.gitignore
@@ -6,3 +6,4 @@ data/
 plots/
 results/
 models/
+
diff --git a/scRNA-seq-advanced/.gitignore b/scRNA-seq-advanced/.gitignore
@@ -0,0 +1,4 @@
+# ignore notebook results files/folders
+analysis
+
+
diff --git a/scRNA-seq-advanced/README.md b/scRNA-seq-advanced/README.md
@@ -0,0 +1,15 @@
+# Advanced scRNA-seq Training Module
+
+This Childhood Cancer Data Lab-designed module covers more advanced topics in the analysis of single-cell RNA-seq data.
+
+The module builds on material in the [scRNA-seq module](https://github.com/AlexsLemonade/training-modules/tree/master/scRNA-seq), and analyses are designed to be performed within a [Docker container](https://github.com/AlexsLemonade/training-modules/tree/master/docker-install) or on the Data Lab RStudio server.
+It covers cell-type identification, integration of multiple single-cell RNA-seq libraries, and differential expression analyses, among other related topics.
+
+The notebooks that comprise this module are:
+
+* TBD
+
+
+Additional exercise notebooks:
+
+* TBD
diff --git a/scRNA-seq-advanced/data/.gitkeep b/scRNA-seq-advanced/data/.gitkeep
diff --git a/scRNA-seq-advanced/scRNA-seq-advanced.Rproj b/scRNA-seq-advanced/scRNA-seq-advanced.Rproj
@@ -0,0 +1,16 @@
+Version: 1.0
+
+RestoreWorkspace: No
+SaveWorkspace: No
+AlwaysSaveHistory: No
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: Sweave
+LaTeX: pdfLaTeX
+
+AutoAppendNewline: Yes
+StripTrailingWhitespace: Yes
diff --git a/scRNA-seq-advanced/setup/README.md b/scRNA-seq-advanced/setup/README.md
@@ -0,0 +1,17 @@
+# Advanced Single Cell RNA-seq training data setup
+
+This document describes how the training data is prepared for the advanced single cell RNA-seq training module on the RStudio Server.
+
+As new training data is added, notebooks, scripts, and/or workflows should be added to this directory to describe the steps required to recreate the required training input files. 
+These files should include source locations for downloading raw data as appropriate, and all processing steps required to prepare data for use.
+
+## File locations
+
+On the RStudio server the main location for the files needed for this training module will be `/shared/data/training-modules/scRNA-seq-advanced/`.
+The files are then organized by dataset.
+
+After setup, the symlinks should be established from `/shared/data/training-modules/scRNA-seq-advanced/` to `training-modules/scRNA-seq-advanced/data` as appropriate.
+These links will usually be created using the script at `training-modules/scripts/link-data.sh`, so directories and files should be added to the `link_locs` array in that script.
+If no data will be written to a data directory, linking to that directory will be sufficient, but in some cases links to individual files may be required.
+
+
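
The linking step described in the setup README can be sketched roughly as follows. This is not the contents of `scripts/link-data.sh`, which this diff does not show; only the `link_locs` array name comes from the README. The dataset and file names are invented, and a temporary directory stands in for `/shared/data/training-modules/scRNA-seq-advanced/` and `training-modules/scRNA-seq-advanced/data` so the sketch can run anywhere.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for the real source and destination locations.
demo_root="$(mktemp -d)"
src_dir="${demo_root}/shared/data/training-modules/scRNA-seq-advanced"
dest_dir="${demo_root}/training-modules/scRNA-seq-advanced/data"

# Fake shared data: one dataset directory and one individual file (invented names).
mkdir -p "${src_dir}/example-dataset" "${src_dir}/reference"
touch "${src_dir}/reference/example-annotation.tsv"

# Entries to link, relative to the source and destination directories.
# A directory entry is enough when nothing will be written into it;
# a file entry links a single file inside a real (writable) directory.
link_locs=(
  "example-dataset"
  "reference/example-annotation.tsv"
)

for loc in "${link_locs[@]}"; do
  # Make sure the parent of the link exists as a real directory.
  mkdir -p "$(dirname "${dest_dir}/${loc}")"
  # -s symbolic, -f replace an existing link, -n do not follow an existing link
  ln -sfn "${src_dir}/${loc}" "${dest_dir}/${loc}"
done
```

Linking `reference/example-annotation.tsv` individually leaves `data/reference` as an ordinary directory, so notebooks could write other files there without touching the shared location.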
diff --git a/scRNA-seq/.gitignore b/scRNA-seq/.gitignore
@@ -30,7 +30,3 @@ data/tabula-muris/alevin-quant
 data/tabula-muris/filtered
 data/tabula-muris/normalized/*
 !data/tabula-muris/normalized/.gitkeep
-
-# ignore live and exercise nb.html files
-*-live.nb.html
-*_exercise.nb.html
diff --git a/scRNA-seq/04-dimension_reduction_scRNA.Rmd b/scRNA-seq/04-dimension_reduction_scRNA.Rmd
@@ -345,7 +345,7 @@ plotReducedDim(normalized_sce, "PCA", ncomponents = c(3,4))
 
 **UMAP** (Uniform Manifold Approximation and Projection) is a machine learning technique designed to provide more detail in highly dimensional data than a typical principal components analysis. 
 While PCA assumes that the variation we care about has a particular distribution (normal, broadly speaking), UMAP allows more complicated distributions that it learns from the data. 
-The underlying mathematics are beyond me, but if you are more ambitious than I, you can look at the paper by [McInnes, Healy, & Melville (2018)](https://arxiv.org/abs/1802.03426). 
+The underlying mathematics are beyond me, but if you are more ambitious than I, you can look at the paper by [McInnes, Healy, & Melville (2018)](https://arxiv.org/abs/1802.03426).
 The main advantage of this change in underlying assumptions is that UMAP can do a better job separating clusters, especially when some of those clusters may be more similar to each other than others.  
 
 Another dimensionality reduction technique that you may have heard of is **t-SNE** (t-distributed Stochastic Neighbor Embedding), which has similar properties to UMAP, and often produces similar results.