Commit

Merge pull request #562 from AlexsLemonade/jashapiro/set-up-advanced-…
…setup

Add structure for advanced single cell module
jashapiro authored Oct 21, 2022
2 parents 082bf48 + faed962 commit 2784535
Showing 13 changed files with 63 additions and 20 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/spell-check.yml
@@ -14,24 +14,24 @@ jobs:
spell-check:
runs-on: ubuntu-latest
container:
-image: rocker/tidyverse:4.0.3
+image: rocker/tidyverse:4.1.2

# Steps represent a sequence of tasks that will be executed as part of the job
steps:
- uses: actions/checkout@v3

- name: Install packages
-run: Rscript --vanilla -e "install.packages('spelling', repos = c(CRAN = 'https://cloud.r-project.org'))"
+run: Rscript --vanilla -e "install.packages(c('spelling'), repos = c(CRAN = '$CRAN'))"

- name: Run spell check
id: spell_check_run
run: |
results=$(Rscript --vanilla "scripts/spell-check.R")
-echo "::set-output name=sp_chk_results::$results"
+echo "sp_chk_results=$results" >> $GITHUB_OUTPUT
cat spell_check_errors.tsv
- name: Archive spelling errors
-uses: actions/upload-artifact@v2
+uses: actions/upload-artifact@v3
with:
name: spell-check-results
path: spell_check_errors.tsv
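
The switch from `::set-output` to `$GITHUB_OUTPUT` in the spell check step can be tried outside of Actions. This is a minimal sketch, assuming a single-line result; the temporary file and the value `3` are stand-ins for what the runner and `scripts/spell-check.R` would provide.

```bash
#!/usr/bin/env bash
set -euo pipefail

# On a runner, GitHub Actions sets GITHUB_OUTPUT to a file path.
# Here a temporary file stands in for it (assumption for local testing).
GITHUB_OUTPUT="$(mktemp)"

# Stand-in for: results=$(Rscript --vanilla "scripts/spell-check.R")
results=3

# Each output is one name=value line appended to the file.
echo "sp_chk_results=$results" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

A later step would read the value as `${{ steps.spell_check_run.outputs.sp_chk_results }}`. Multi-line values need the delimiter form rather than a plain `name=value` line.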
10 changes: 4 additions & 6 deletions .gitignore
@@ -1,9 +1,3 @@
-# exercise notebook htmls
-RNA-seq/07-bulk_rnaseq_exercise.nb.html
-intro-to-R-tidyverse/04a-intro_to_R_exercise.nb.html
-intro-to-R-tidyverse/04b-intro_to_tidyverse_exercise-part-1.nb.html
-intro-to-R-tidyverse/04c-intro_to_tidyverse_exercise-part-2.nb.html
-
#DS_Stores
.DS_Store

@@ -20,3 +14,7 @@ spell_check_errors.tsv

# Snakemake files
.snakemake
+
+# ignore live and exercise nb.html files
+*-live.nb.html
+exercise_*.nb.html
2 changes: 0 additions & 2 deletions RNA-seq/.gitignore
@@ -10,5 +10,3 @@ results/
# development helper notebook for generating the NB cell line tximport object
nb_cell_line_tximport*

-# live versions notebook HTML
-*-live.nb.html
1 change: 1 addition & 0 deletions components/dictionary.txt
@@ -318,6 +318,7 @@ symlinked
symlinks
Tabula
Tamayo
+TBD
tetrachloride
tgMap
Theis
3 changes: 0 additions & 3 deletions intro-to-R-tidyverse/.gitignore
@@ -2,6 +2,3 @@
plots/
results/

-
-# live versions notebook HTML
-*-live.nb.html
1 change: 1 addition & 0 deletions machine-learning/.gitignore
@@ -6,3 +6,4 @@ data/
plots/
results/
models/

4 changes: 4 additions & 0 deletions scRNA-seq-advanced/.gitignore
@@ -0,0 +1,4 @@
# ignore notebook results files/folders
analysis


15 changes: 15 additions & 0 deletions scRNA-seq-advanced/README.md
@@ -0,0 +1,15 @@
# Advanced scRNA-seq Training Module

This Childhood Cancer Data Lab-designed module covers more advanced topics in the analysis of single-cell RNA-seq data.

The module builds on material in the [scRNA-seq module](https://github.com/AlexsLemonade/training-modules/tree/master/scRNA-seq), and analyses are designed to be performed within a [Docker container](https://github.com/AlexsLemonade/training-modules/tree/master/docker-install) or on the Data Lab RStudio server.
It covers cell-type identification, integration of multiple single-cell RNA-seq libraries, and differential expression analyses, among other related topics.

The notebooks that comprise this module are:

* TBD


Additional exercise notebooks:

* TBD
Empty file.
16 changes: 16 additions & 0 deletions scRNA-seq-advanced/scRNA-seq-advanced.Rproj
@@ -0,0 +1,16 @@
Version: 1.0

RestoreWorkspace: No
SaveWorkspace: No
AlwaysSaveHistory: No

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes
17 changes: 17 additions & 0 deletions scRNA-seq-advanced/setup/README.md
@@ -0,0 +1,17 @@
# Advanced Single Cell RNA-seq training data setup

This document describes how the training data is prepared for the advanced single cell RNA-seq training data on the RStudio Server.

As new training data is added, notebooks, scripts, and/or workflows should be added to this directory to describe the steps required to recreate the required training input files.
These files should include source locations for downloading raw data as appropriate, and all processing steps required to prepare data for use.

## File locations

On the RStudio server the main location for the files needed for this training module will be `/shared/data/training-modules/scRNA-seq-advanced/`.
The files are then organized by dataset.

After setup, the symlinks should be established from `/shared/data/training-modules/scRNA-seq-advanced/` to `training-modules/scRNA-seq-advanced/data` as appropriate.
These links will usually be created using the script at `training-modules/scripts/link-data.sh`, so directories and files should be added to the `link_locs` array in that script.
If no data will be written to a data directory, linking to that directory will be sufficient, but in some cases links to individual files may be required.
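
The linking step this README describes could be sketched as below. This is not the contents of `scripts/link-data.sh`, which this commit does not show; the `link_dataset` helper, the `example-dataset` entry, and the temporary demo directory are all hypothetical and exist only so the sketch can run without touching `/shared`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# link_dataset SRC_ROOT DEST_ROOT NAME  (hypothetical helper)
# Creates DEST_ROOT/NAME as a symlink pointing at SRC_ROOT/NAME.
link_dataset() {
  local src_root="$1" dest_root="$2" name="$3"
  mkdir -p "$(dirname "${dest_root}/${name}")"
  # -s: symbolic link, -f: replace an existing link,
  # -n: do not follow an existing link to a directory
  ln -sfn "${src_root}/${name}" "${dest_root}/${name}"
}

# Throwaway directory standing in for the shared and module locations.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/shared/example-dataset"
echo "placeholder" > "${demo_root}/shared/example-dataset/counts.tsv"

# Placeholder entries; the real link_locs array lives in link-data.sh.
link_locs=("example-dataset")

for loc in "${link_locs[@]}"; do
  link_dataset "${demo_root}/shared" "${demo_root}/module/data" "${loc}"
done

ls -l "${demo_root}/module/data"
```

Linking the whole dataset directory, as here, matches the case where nothing is written into it. When a notebook writes into a data directory, the same helper could be called once per file instead.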


4 changes: 0 additions & 4 deletions scRNA-seq/.gitignore
@@ -30,7 +30,3 @@ data/tabula-muris/alevin-quant
data/tabula-muris/filtered
data/tabula-muris/normalized/*
!data/tabula-muris/normalized/.gitkeep
-
-# ignore live and exercise nb.html files
-*-live.nb.html
-*_exercise.nb.html
2 changes: 1 addition & 1 deletion scRNA-seq/04-dimension_reduction_scRNA.Rmd
@@ -345,7 +345,7 @@ plotReducedDim(normalized_sce, "PCA", ncomponents = c(3,4))

**UMAP** (Uniform Manifold Approximation and Projection) is a machine learning technique designed to provide more detail in highly dimensional data than a typical principal components analysis.
While PCA assumes that the variation we care about has a particular distribution (normal, broadly speaking), UMAP allows more complicated distributions that it learns from the data.
-The underlying mathematics are beyond me, but if you are more ambitious than I, you can look at the paper by [McInnes, Healy, & Melville (2018)](https://arxiv.org/abs/1802.03426).
+The underlying mathematics are beyond me, but if you are more ambitious than I, you can look at the paper by [McInnes, Healy, & Melville (2018)](https://arxiv.org/abs/1802.03426).
The main advantage of this change in underlying assumptions is that UMAP can do a better job separating clusters, especially when some of those clusters may be more similar to each other than others.

Another dimensionality reduction technique that you may have heard of is **t-SNE** (t-distributed Stochastic Neighbor Embedding), which has similar properties to UMAP, and often produces similar results.