Overrepresentation of Reads Mapping to Intronic Regions #2229

Matthieu-Duot · 2024-10-23T14:50:01Z

Hello,

I'm working with paired-end data from several projects of different origins, which I want to reanalyze with my own pipeline based on STAR (version 2.7.10a). Mapping looks fine, with about 80% uniquely mapped reads on average for most of my samples. However, in the ReadsPerGene.out.tab file, a majority of my reads (~2/3) are classified as N_noFeature. Using rnaseqc, I found that these reads map to intronic regions.
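For reference, the N_noFeature share can be computed directly from ReadsPerGene.out.tab. The sketch below is a minimal illustration, not part of the pipeline. It assumes the usual four-column layout of that file (label or gene ID, then counts for unstranded, forward-stranded and reverse-stranded libraries) with the four summary rows N_unmapped, N_multimapping, N_noFeature and N_ambiguous. The function name `no_feature_fractions` and the toy counts are made up for the example.

```python
# Minimal sketch: fraction of uniquely mapped reads labelled N_noFeature,
# computed separately for each count column of a ReadsPerGene.out.tab file.
# The toy counts below are invented for illustration only.

COLUMNS = ("unstranded", "forward", "reverse")


def no_feature_fractions(lines):
    """Return {column name: N_noFeature / uniquely mapped reads}."""
    no_feature = [0, 0, 0]
    unique_total = [0, 0, 0]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        label = fields[0]
        # Unmapped and multi-mapping reads are not uniquely mapped,
        # so they are left out of the denominator.
        if label in ("N_unmapped", "N_multimapping"):
            continue
        counts = [int(value) for value in fields[1:]]
        for i, count in enumerate(counts):
            unique_total[i] += count
            if label == "N_noFeature":
                no_feature[i] += count
    return {
        name: (no_feature[i] / unique_total[i] if unique_total[i] else 0.0)
        for i, name in enumerate(COLUMNS)
    }


# Hypothetical toy input in the same tab-separated layout.
toy_lines = [
    "N_unmapped\t100\t100\t100",
    "N_multimapping\t50\t50\t50",
    "N_noFeature\t600\t900\t650",
    "N_ambiguous\t20\t5\t10",
    "GENE_A\t280\t60\t250",
    "GENE_B\t100\t35\t90",
]

toy_result = no_feature_fractions(toy_lines)
for column, fraction in toy_result.items():
    print(f"{column}: {fraction:.3f}")
```

On a real run, the same function can be fed an open file handle, e.g. `no_feature_fractions(open(path))`, where `path` points to the ReadsPerGene.out.tab file written under the `--outFileNamePrefix` used in the command. Comparing the three columns shows whether the N_noFeature share depends on the strandedness assumed for counting.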

Here are some relevant statistics:

Exonic Rate: 0.358933
Intronic Rate: 0.478841
Intergenic Rate: 0.0854653
Intragenic Rate: 0.837774
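A quick consistency check on these four rates, as a minimal sketch: it assumes that the intragenic rate is simply the sum of the exonic and intronic rates, which the numbers above support since they add up exactly. The helper name `check_rates` is made up for the example.

```python
# Minimal sketch: sanity-check the four rates reported above.
# Assumption: intragenic = exonic + intronic.

rates = {
    "exonic": 0.358933,
    "intronic": 0.478841,
    "intergenic": 0.0854653,
    "intragenic": 0.837774,
}


def check_rates(r, tol=1e-6):
    """Return (intragenic_ok, remainder, intronic_share_of_intragenic)."""
    intragenic_ok = abs(r["exonic"] + r["intronic"] - r["intragenic"]) < tol
    # Fraction not covered by the exonic, intronic or intergenic categories.
    remainder = 1.0 - (r["exonic"] + r["intronic"] + r["intergenic"])
    intronic_share = r["intronic"] / r["intragenic"]
    return intragenic_ok, remainder, intronic_share


intragenic_ok, remainder, intronic_share = check_rates(rates)
print(f"intragenic = exonic + intronic: {intragenic_ok}")
print(f"not in any of the three categories: {remainder:.4f}")
print(f"intronic share of intragenic reads: {intronic_share:.4f}")
```

With these values, about 57% of the reads falling within genes are intronic, and roughly 7.7% of reads fall outside the three categories listed.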

Since the intergenic rate is low (~8.5%), I don't think the samples are contaminated with genomic DNA. I suspect an annotation problem instead: I first used a genome index built from Ensembl files, then built another index from Gencode files, but the result was the same.

Here is the STAR command I used:

Singularity> STAR --runThreadN 8 \
--genomeDir Genome_Dir/Genome_index \
--readFilesIn Test_GenDir/subset_R1_231024.fastq.gz Test_GenDir/subset_R2_231024.fastq.gz \
--readFilesCommand zcat \
--outFilterType BySJout \
--outFilterMultimapNmax 20 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--sjdbGTFfile Genome_Dir/gtf/Homo_sapiens.GRCh38.112.gtf \
--outReadsUnmapped None \
--outFileNamePrefix Test_newSTAR/InitialGenDir/InitGD \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped None \
--outBAMsortingThreadN 4 \
--quantMode GeneCounts \
--outSAMattributes NH HI NM MD AS nM jM jI XS

Do you have any idea why this might be happening?

Thanks for your help.
