Seurat R/Bioconductor scRNA-seq analysis of non-small-cell lung cancer (NSCLC) cells compared to normal lung cells

felixm3/scRNA-seq

Analysis of scRNA-seq data comparing non-small-cell lung cancer (NSCLC) cells to normal cells

If the PDF of the Jupyter notebook above shows the message "Unable to render code block", refresh the page a couple of times.

I wrote these R scripts to analyze a large single-cell RNA sequencing (scRNA-seq) dataset (~900,000 cells) with various Bioconductor bioinformatics packages.

Overall Functionality:

The scripts accomplish the following tasks:

  1. Load necessary R packages for analysis (Seurat, DESeq2, pheatmap, ggplot2, etc.).
  2. Read scRNA-seq data from an h5 file and convert it into a Seurat object.
  3. Conduct quality control (QC) analysis, filtering out low-quality cells based on gene counts, mitochondrial content, etc.
  4. Generate QC plots to visualize the distribution of various QC metrics.
  5. Perform normalization, variable gene selection, scaling, dimension reduction (UMAP), clustering, and visualization using Seurat's workflows.
  6. Use Azimuth for cell annotation based on the Human Lung Cell Atlas.
  7. Conduct pseudobulk analysis, differential expression analysis (DESeq2), and visualization of significant genes.
  8. Print session information, clear the workspace, and manage parallelization using the future package.
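
Steps 3 and 5 can be sketched roughly as follows. This is a minimal illustration, not the code from the scripts: the function names `qc_filter` and `standard_workflow` are hypothetical, and the thresholds, number of dimensions, and clustering resolution are placeholder assumptions that should be chosen from the QC plots for the actual data. The Seurat calls are namespaced so the block can be sourced without attaching the package.

```r
# Sketch only: all thresholds below are illustrative assumptions,
# not the values used in this repository.
qc_filter <- function(seu, min_features = 200, max_features = 6000,
                      max_percent_mt = 10) {
  # Percentage of counts from mitochondrial genes (human symbols start with "MT-")
  seu[["percent.mt"]] <- Seurat::PercentageFeatureSet(seu, pattern = "^MT-")
  # Select cell barcodes that pass every threshold
  keep <- colnames(seu)[
    seu$nFeature_RNA > min_features &
      seu$nFeature_RNA < max_features &
      seu$percent.mt < max_percent_mt
  ]
  subset(seu, cells = keep)
}

# Standard Seurat steps: normalize, select variable genes, scale,
# reduce dimensions, cluster, and embed with UMAP.
standard_workflow <- function(seu, dims = 1:30, resolution = 0.5) {
  seu <- Seurat::NormalizeData(seu)
  seu <- Seurat::FindVariableFeatures(seu)
  seu <- Seurat::ScaleData(seu)
  seu <- Seurat::RunPCA(seu)
  seu <- Seurat::FindNeighbors(seu, dims = dims)
  seu <- Seurat::FindClusters(seu, resolution = resolution)
  Seurat::RunUMAP(seu, dims = dims)
}

# Usage, given an existing Seurat object `seu`:
# seu <- qc_filter(seu)
# seu <- standard_workflow(seu)
# Seurat::DimPlot(seu, reduction = "umap")
```

Selecting cells by barcode rather than passing threshold variables into the `subset` expression keeps the filter usable from inside a function.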

Input Files:

  • Input: Single-cell RNA sequencing data in an h5 file (16plex_900k_32_NSCLC_multiplex_count_filtered_feature_bc_matrix.h5).
  • Additional CSV files for sample information and annotation.
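
Loading these inputs might look like the sketch below. It assumes the plain `Read10X_h5()` route from Seurat; the scripts may instead load the matrix through BPCells, which is listed among the required packages. The function name `load_nsclc`, the project label, and the `metadata_csv` argument are hypothetical, since the CSV file names are not given here.

```r
# Sketch only: load a 10x h5 count matrix into a Seurat object.
# `load_nsclc` and `metadata_csv` are hypothetical names for illustration.
load_nsclc <- function(h5_path, metadata_csv = NULL, project = "NSCLC") {
  stopifnot(file.exists(h5_path))
  counts <- Seurat::Read10X_h5(h5_path)
  # Multi-modal h5 files come back as a list; keep the gene expression matrix
  if (is.list(counts)) {
    counts <- counts[["Gene Expression"]]
  }
  seu <- Seurat::CreateSeuratObject(counts = counts, project = project)
  # Sample information is read separately; how it joins to cells depends
  # on the CSV layout, which is not specified here
  sample_info <- if (is.null(metadata_csv)) NULL else read.csv(metadata_csv)
  list(seurat = seu, sample_info = sample_info)
}

# Usage:
# inputs <- load_nsclc(
#   "16plex_900k_32_NSCLC_multiplex_count_filtered_feature_bc_matrix.h5"
# )
```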

Required Packages and Tools:

  • R Packages: Seurat, SeuratDisk, Azimuth, DESeq2, pheatmap, ggplot2, EnhancedVolcano, BPCells, future, dplyr.
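
Before running the scripts, it can help to check which of these packages are installed and to configure the future package. The helper names `missing_packages` and `setup_parallel`, the worker count, and the memory limit below are assumptions for illustration and should be adjusted to the machine.

```r
# Packages named in this README
required <- c("Seurat", "SeuratDisk", "Azimuth", "DESeq2", "pheatmap",
              "ggplot2", "EnhancedVolcano", "BPCells", "future", "dplyr")

# Return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1),
                      quietly = TRUE, USE.NAMES = FALSE)
  pkgs[!installed]
}

# Sketch only: worker count and memory limit are placeholder assumptions
setup_parallel <- function(workers = 4, max_gb = 8) {
  # Raise the limit on the size of objects exported to workers
  options(future.globals.maxSize = max_gb * 1024^3)
  future::plan("multisession", workers = workers)
}

# Usage:
# missing_packages(required)   # character(0) when everything is installed
# setup_parallel()
# sessionInfo()                # record package versions at the end of a run
```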

Outputs:

  • Seurat objects at different analysis stages.
  • QC plots (violin plots, heatmaps).
  • Intermediate data files for normalization, clustering, and differential expression analysis.
  • Visualizations (UMAP plots, volcano plots) highlighting various aspects of the data.
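
The differential expression outputs come from a pseudobulk step followed by DESeq2. The sketch below illustrates the idea under stated assumptions: `pseudobulk_counts` and `run_deseq` are hypothetical helpers, the `condition` column is an assumed name for the tumor versus normal label, and the dense conversion is only practical for small matrices. For a dataset of this size, Seurat's `AggregateExpression()` is the more likely route.

```r
# Sum raw counts over the cells of each sample (genes x cells -> genes x samples).
# Sketch only: converts to a dense matrix, so suitable for small inputs.
pseudobulk_counts <- function(counts, sample_ids) {
  stopifnot(ncol(counts) == length(sample_ids))
  t(rowsum(t(as.matrix(counts)), group = sample_ids))
}

# Run DESeq2 on the pseudobulk matrix.
# `sample_info` needs one row per pseudobulk sample, in column order,
# with a `condition` column (an assumed name, e.g. tumor vs normal).
run_deseq <- function(pb_counts, sample_info) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = pb_counts,
                                        colData = sample_info,
                                        design = ~ condition)
  dds <- DESeq2::DESeq(dds)
  as.data.frame(DESeq2::results(dds))
}

# Toy example: 3 genes, 4 cells, 2 samples
toy <- matrix(1:12, nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), paste0("cell", 1:4)))
pb <- pseudobulk_counts(toy, c("s1", "s1", "s2", "s2"))
# pb has columns "s1" and "s2"; s1 sums cell1 + cell2, s2 sums cell3 + cell4
```

Aggregating to one column per sample before testing treats samples, not individual cells, as the replicates, which is the point of a pseudobulk analysis.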

The dataset is from 10x Genomics.
