scATAC-Seq Analysis Pipeline

This repository provides a Snakemake pipeline for analyzing single-cell ATAC-seq (scATAC-seq) data using various tools, including cellranger-atac, Socrates, Genrich, and bedtools. The pipeline processes raw scATAC fastq files, performs barcode cleaning, demultiplexes cells, and identifies accessible chromatin regions (ACRs).

Installation

Prerequisites

Install Snakemake
Required software tools: cellranger-atac, sinto, samtools, R, popscle, bedtools, and ncbi-blast
Ensure access to SLURM for resource allocation on a high-performance computing (HPC) cluster.

Set up Conda environments

Create the Conda environments defined in config/SocratesEnv.yaml and config/DemuxletEnv.yaml:

    conda env create -f config/SocratesEnv.yaml
    conda env create -f config/DemuxletEnv.yaml

Configuration

Edit paths in config/config.yaml

for cellranger

CELLRANGER_PATH: path to the cellranger-atac software
CELLRANGER_ref: path to the reference created with cellranger
scATACraw: path to the scATAC-seq FASTQ files
NAME: sample name of the reference for cellranger-atac; also the output directory prefix ({NAME}/outs/)
Nuclear: expression matching the scaffold prefix of the nuclear genome (used by sinto)
Plastid: expression matching the scaffold prefix of the plastid genome
GFF: gene annotation in GFF format
CHRFILE: file of scaffold lengths

for demultiplexing

MACSpath: path to macs2 software
VCFdemux: path to VCF for demuxlet
SAMPLENAMES: file with atac-seq sample names
PICARD: path to picard software

for ACR calling

GENRICH: path to genrich software
SAMPLEMETA: "Metadata/SampleNames.txt"
WGS: "/path/to/WGS_alignments"
REFERENCE: "/path/to/reference_genome.fa"
PLASTID: "/path/to/plastid_blast_db"
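Because every step depends on these paths, it can help to check the configuration before launching a run. The following is a minimal sketch, not part of the repository: the names `REQUIRED_KEYS` and `missing_config_keys` are hypothetical, and the function assumes the YAML file has already been loaded into a plain dictionary (for example with PyYAML's `yaml.safe_load`). The key names are the ones listed above.

```python
# Keys named in the Configuration section, grouped as documented there.
REQUIRED_KEYS = {
    "cellranger": ["CELLRANGER_PATH", "CELLRANGER_ref", "scATACraw", "NAME",
                   "Nuclear", "Plastid", "GFF", "CHRFILE"],
    "demultiplexing": ["MACSpath", "VCFdemux", "SAMPLENAMES", "PICARD"],
    "ACR calling": ["GENRICH", "SAMPLEMETA", "WGS", "REFERENCE", "PLASTID"],
}


def missing_config_keys(config):
    """Return {group: [keys]} for keys that are absent or empty in `config`."""
    missing = {}
    for group, keys in REQUIRED_KEYS.items():
        absent = [key for key in keys if not config.get(key)]
        if absent:
            missing[group] = absent
    return missing


# Example: a partially filled configuration (values are placeholders).
partial = {"GENRICH": "/path/to/Genrich", "SAMPLEMETA": "Metadata/SampleNames.txt"}
for group, keys in missing_config_keys(partial).items():
    print(f"{group}: missing {', '.join(keys)}")
```

An empty result means every documented key has a value; it does not verify that the paths exist.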

Pipeline Steps

Cell Ranger ATAC Processing
Description: Runs cellranger-atac to process raw scATAC fastq files and generate BAM and fragment files.
Input: Raw fastq files, reference genome.
Output: {NAME}/outs/possorted_bam.bam, {NAME}/outs/fragments.tsv

Cell Barcode Cleaning
Description: Formats and filters cell barcodes using the sinto tool.
Input: BAM and fragment files from Cell Ranger ATAC output.
Output: Filtered fragments and contig lists.

Cell Identification (Socrates)
Description: Uses Socrates to filter cell barcodes based on genomic regions.
Output: Filtered barcodes and sparse matrices for further analysis.

Demultiplexing Cells
Description: Uses demuxlet with SNP data to demultiplex cells.
Input: BAM file, VCF file of SNPs, sample names.
Output: Best-match demultiplexing results, clean barcode list.

Filtering Barcodes
Description: Filters barcodes using picard to remove duplicates and produce a final BAM.
Output: Filtered and indexed BAM files.

Downsample Nuclei
Description: Randomly downsamples barcodes to a specified count (e.g., 552 per sample).
Input: List of clean barcodes.
Output: Downsampled barcode files.

ACR Calling
Description: Uses Genrich to call accessible chromatin regions (ACRs) from the downsampled BAM files.
Output: Peak files and ACR log files.

ACR Processing
Description: Filters out regions aligning to plastid genomes and calculates Jaccard similarity.
Output: Processed and concatenated ACR files for frequency analysis.

ACR Analysis and Visualization
Description: Analyzes and visualizes ACR positions relative to genes.
Output: Merged and classified ACR files, visualization plots.

ACR Classification
Description: Classifies ACRs based on proximity to genes and exons.
Output: ACRs classified by gene location, overlap with exons, and nearest gene.
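The Downsample Nuclei step can be illustrated with a short sketch. This is not the pipeline's own code: `downsample_barcodes` is a hypothetical helper, the fixed seed is an assumption added for reproducibility, and returning all barcodes when a sample has fewer than the target count is likewise an assumption about the desired behavior.

```python
import random


def downsample_barcodes(barcodes, n, seed=0):
    """Return a random subset of `n` unique barcodes, in sorted order.

    Assumption: if fewer than `n` unique barcodes exist, all are returned.
    """
    unique = sorted(set(barcodes))
    if len(unique) <= n:
        return unique
    rng = random.Random(seed)  # local RNG so the draw is reproducible
    return sorted(rng.sample(unique, n))


# Example with made-up barcodes: keep 552 nuclei out of 1000.
clean = [f"BC{i:04d}-1" for i in range(1000)]
subset = downsample_barcodes(clean, 552)
print(len(subset))
```

Equalizing the number of nuclei per sample before peak calling keeps ACR counts comparable across samples with different cell yields.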
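For the Jaccard similarity mentioned under ACR Processing, one common definition is the number of base pairs shared by two ACR sets divided by the number of base pairs covered by either. The sketch below assumes that base-pair definition and half-open (start, end) coordinates; the repository's Jaccard_similarity.R may define it differently (for example by peak counts), and the function names here are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals):
    """Total number of base pairs covered by a list of intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


def jaccard_bp(acrs_a, acrs_b):
    """Base-pair Jaccard index between two ACR sets.

    Each argument maps a scaffold name to a list of (start, end) intervals.
    """
    total_a = total_b = union = 0
    for scaffold in set(acrs_a) | set(acrs_b):
        a = list(acrs_a.get(scaffold, []))
        b = list(acrs_b.get(scaffold, []))
        total_a += covered_bp(a)
        total_b += covered_bp(b)
        union += covered_bp(a + b)
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    intersection = total_a + total_b - union
    return intersection / union if union else 0.0


# Made-up example: 50 bp shared out of 300 bp covered by either set.
sample_a = {"chr1": [(0, 100), (200, 300)]}
sample_b = {"chr1": [(50, 150)], "chr2": [(0, 50)]}
print(jaccard_bp(sample_a, sample_b))
```

A value of 1.0 means the two samples cover exactly the same bases; 0.0 means no overlap.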

Outputs

The pipeline produces a variety of output files, including:

Filtered BAM files: located in DEMUX/BAMscATAC/
Downsampled ACR files: stored in DownsampledBams/Peaks/
Merged ACRs: final classified ACR files for downstream analysis
Visualization files: plots summarizing ACRs
ACRs classified by genomic context: located in ACRs_classified/

Repository: IsaacDiaz026/NonCodingEvolution
