This project investigates the role of ZNF335 in regulating cellular cholesterol metabolism and statin response. Liver samples from CRISPRi/CRISPRa-modified mice were analyzed using RNA sequencing (RNA-seq) to identify differentially expressed genes and key pathways involved in lipid metabolism, focusing on the effects of Zfp335 and its potential impact on high cholesterol and statin efficacy.
- Analyze the role of Zfp335 in cholesterol metabolism using CRISPRi/CRISPRa-modified mice.
- Identify differentially expressed genes involved in cholesterol and lipid metabolism.
- Conduct functional enrichment analysis to explore regulatory networks related to Zfp335.
- Assess Zfp335’s potential impact on statin response.
You can install the required dependencies in two ways: using a bash script or a Docker container.
Download and run the install_tools.sh script:
bash install_tools.sh
Alternatively, build and run a Docker container with all dependencies:
- Build the Docker Image:
docker build -t rnaseq_tools .
- Run the Docker Container:
docker run -it --rm -v /path/to/your/data:/data rnaseq_tools
The function of this step is to assess the quality of raw sequencing reads to identify any potential issues with the data before further analysis.
-
Input: Raw FASTQ files are used as input to check for quality.
-
Output: FastQC quality reports are generated, detailing per-base quality scores and other key metrics.
-
Script: scriptFastQC
This step removes low-quality bases and adapter sequences from the raw reads, ensuring high-quality data for downstream analysis.
-
Input: Raw FASTQ files are provided as input to be trimmed and filtered for quality.
-
Output: The output consists of trimmed FASTQ files, with adapters and low-quality sequences removed.
-
Script: scriptTrimmomatic
This step generates a genome index to prepare the reference genome for efficient read alignment in later steps.
-
Input: The input includes the mouse genome FASTA file and GTF annotation file.
-
Output: STAR genome index files are created, which are used for aligning RNA-seq reads.
-
Script: scriptSTARIndex
This step aligns the RNA-seq reads to the mouse genome to generate preliminary BAM files and identify splice junctions.
-
Input: Trimmed FASTQ files are used as input to perform the first pass of read alignment.
-
Output: The output is BAM files containing aligned reads, along with identified splice junctions.
-
Script: scriptSTARPass1
This step refines the alignment process by using the splice junctions identified in the first pass to improve the accuracy of the alignments.
-
Input: The input includes BAM files from the first pass, along with splice junction data.
-
Output: Final aligned BAM files are produced, with improved read alignment around splice sites.
-
Script: scriptSTARPass2
This step evaluates the mapping quality of the RNA-seq data by calculating percentages of mapped, unmapped, and multi-mapped reads.
-
Input: STAR log files from the alignment process are provided as input to analyze mappability statistics.
-
Output: Mappability statistics are generated, along with a pie chart showing the proportion of mapped, unmapped, and multi-mapped reads.
-
Script: alignment_analysis.py
This step involves processing BAM files by indexing them, marking duplicate reads, and merging multiple BAM files into combined datasets.
-
Input: The input includes the BAM files generated from the second pass alignment.
-
Output: Indexed, duplicate-marked, and merged BAM files are produced, along with duplicate metrics plots for quality assessment.
-
Scripts: scriptIndexBam, scriptMarkDuplicates, scriptMergeBam, duplicate_metrics_plot.py
This step calculates and visualizes the read coverage across the genome to assess the evenness and depth of sequencing coverage.
-
Input: BAM files from the alignment process are used as input to calculate read coverage.
-
Output: A read coverage plot is generated, and raw counts of coverage across the genome are output for further analysis.
-
Scripts: scriptPlotCoverage, calculate_coverage.py
This step quantifies the number of reads mapped to each gene, providing a gene-level count matrix for differential expression analysis.
-
Input: BAM files are provided as input to count the number of reads mapped to genes.
-
Output: A gene count matrix is produced, detailing the number of reads aligned to each gene.
-
Script: scriptFeatureCounts
This step identifies genes that are differentially expressed between experimental conditions, such as wild-type vs. CRISPR-modified samples.
-
Input: The input includes the gene count matrix and sample information.
-
Output: Differentially expressed genes are identified, and visualizations such as heatmaps and PCA plots are generated to explore the data.
-
Scripts: scriptDESeq2, scriptDESeq2.R
This step performs functional enrichment analysis to identify biological pathways and functions that are significantly associated with the differentially expressed genes.
-
Input: A list of differentially expressed genes is used as input for functional enrichment.
-
Output: The output includes a list of enriched biological functions or pathways based on the gene list.
-
Scripts: scriptEnrichR, scriptEnrichR.R
Once you have Nextflow installed and configured to use Docker, you can execute the entire workflow with:
nextflow run main.nf -with-docker
This command will run all steps of the pipeline automatically, leveraging the Docker container for a consistent environment across all processes.
You can also manually submit individual scripts to an SGE-managed cluster. For example, to submit the FastQC step, use the following:
qsub scriptFastQC.sh
Repeat this for each step in the pipeline as required by your cluster environment.
By following the steps outlined in this README, you will be able to replicate the RNA-seq analysis workflow to study the role of ZNF335 in cholesterol metabolism and statin response. The workflow includes quality control, alignment, differential expression analysis, and functional enrichment.