Overrepresentation of Reads Mapping to Intronic Regions #2229

Open
Matthieu-Duot opened this issue Oct 23, 2024 · 0 comments

Hello,

I'm working with paired-end data from several projects of different origins, and I'm reanalyzing them with my own pipeline built on STAR (version 2.7.10a). Mapping looks fine, with an average of 80% uniquely mapped reads for most of my samples. However, in the ReadsPerGene.out.tab file, a majority of my reads (~2/3) are classified as N_noFeature. Using RNA-SeQC (rnaseqc), I found that these reads map to intronic regions.

Here are some relevant statistics:

Exonic Rate: 0.358933
Intronic Rate: 0.478841
Intergenic Rate: 0.0854653
Intragenic Rate: 0.837774
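
As a quick sanity check on these numbers, here is a minimal Python sketch. It only uses the four rates reported above, confirms that the intragenic rate equals exonic plus intronic, and computes what share of the intragenic reads is intronic. The variable names are mine and are not taken from the rnaseqc output.

```python
# Rates copied from the rnaseqc statistics above.
exonic = 0.358933
intronic = 0.478841
intergenic = 0.0854653
intragenic = 0.837774

# Intragenic should be the sum of the exonic and intronic rates.
intragenic_check = exonic + intronic
print(f"exonic + intronic = {intragenic_check:.6f} (reported: {intragenic:.6f})")

# Share of intragenic reads that fall in introns rather than exons.
intronic_share = intronic / intragenic
print(f"intronic share of intragenic reads = {intronic_share:.3f}")
```

So more than half of the reads assigned to genes land in introns, which matches the high N_noFeature count.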

Since the intergenic rate seems reasonable, I don't think the samples were contaminated with DNA. I suspect the issue is related to the annotation. I used a genome index generated from Ensembl files and also tried another index built from Gencode files, but the result was the same.

Here is the STAR command I used:

Singularity> STAR --runThreadN 8 \
--genomeDir Genome_Dir/Genome_index \
--readFilesIn Test_GenDir/subset_R1_231024.fastq.gz Test_GenDir/subset_R2_231024.fastq.gz \
--readFilesCommand zcat \
--outFilterType BySJout \
--outFilterMultimapNmax 20 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000 \
--sjdbGTFfile Genome_Dir/gtf/Homo_sapiens.GRCh38.112.gtf \
--outReadsUnmapped None \
--outFileNamePrefix Test_newSTAR/InitialGenDir/InitGD \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped None \
--outBAMsortingThreadN 4 \
--quantMode GeneCounts \
--outSAMattributes NH HI NM MD AS nM jM jI XS
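
For reference, the N_noFeature fraction mentioned above can be computed from ReadsPerGene.out.tab roughly as in the sketch below. It assumes the standard four-column layout written by `--quantMode GeneCounts` (gene ID, unstranded counts, counts for reads on the read 1 strand, counts for reads on the read 2 strand), with the four `N_` summary rows at the top. The function name and the toy counts are hypothetical and are not my real data; the denominator (no feature + ambiguous + assigned to a gene) is my own choice of definition.

```python
import csv
import io

COLUMNS = ("unstranded", "read1_strand", "read2_strand")
SUMMARY_ROWS = {"N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous"}


def no_feature_fractions(handle):
    """Return the N_noFeature fraction for each of the three count columns.

    The fraction is N_noFeature / (N_noFeature + N_ambiguous + reads assigned
    to genes), computed separately per column.
    """
    totals = {name: {"no_feature": 0, "ambiguous": 0, "assigned": 0} for name in COLUMNS}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        label = row[0]
        counts = [int(value) for value in row[1:4]]
        for name, count in zip(COLUMNS, counts):
            if label == "N_noFeature":
                totals[name]["no_feature"] += count
            elif label == "N_ambiguous":
                totals[name]["ambiguous"] += count
            elif label not in SUMMARY_ROWS:
                totals[name]["assigned"] += count  # a gene row
    fractions = {}
    for name, t in totals.items():
        denominator = t["no_feature"] + t["ambiguous"] + t["assigned"]
        fractions[name] = t["no_feature"] / denominator if denominator else 0.0
    return fractions


# Toy input with made-up counts, only to show the expected file layout.
example = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t600\t900\t620\n"
    "N_ambiguous\t20\t5\t15\n"
    "GENE_A\t200\t10\t195\n"
    "GENE_B\t180\t85\t170\n"
)
fractions = no_feature_fractions(io.StringIO(example))
for name in COLUMNS:
    print(f"{name}: {fractions[name]:.2f}")
```

On a real file, the same function can be called with `open("ReadsPerGene.out.tab")` in place of the `io.StringIO` object. Comparing the three columns also shows whether the fraction depends on which strandedness column is used.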

Do you have any idea why this might be happening?

Thanks for your help.
