
Why is the junction start in gencode.comprehensive.splice.junctions.txt inside the intron while the junction end is 2 bp into the exon? #1

Open
naumenko-sa opened this issue Jun 7, 2018 · 3 comments

Comments

@naumenko-sa
Owner

Look at any junction, e.g.
15 42698142 42700410 CAPN3 ENST00000569136.1 protein_coding
15:42698142 is the first base of the intron
15:42700410 is the second base of the exon

How does this affect the annotations, given that we use flank=1?
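
To make the convention described above explicit, here is a minimal sketch that parses one line of the file and derives the intron boundaries. It assumes the six whitespace-separated columns seen in the example line; `Junction`, `parse_junction_line` and `intron_bounds` are hypothetical names, not part of the repository's scripts.

```python
from typing import NamedTuple, Tuple


class Junction(NamedTuple):
    chrom: str
    start: int  # first base of the intron (per the example above)
    end: int    # second base of the downstream exon (per the example above)
    gene: str
    transcript: str
    biotype: str


def parse_junction_line(line: str) -> Junction:
    # Assumes exactly six whitespace-separated columns, as in the example line.
    chrom, start, end, gene, transcript, biotype = line.split()
    return Junction(chrom, int(start), int(end), gene, transcript, biotype)


def intron_bounds(j: Junction) -> Tuple[int, int]:
    # The end column sits two bases past the last intron base:
    # last intron base -> first exon base -> second exon base.
    return j.start, j.end - 2


j = parse_junction_line(
    "15 42698142 42700410 CAPN3 ENST00000569136.1 protein_coding"
)
print(intron_bounds(j))  # (42698142, 42700408)
```

So under this reading the end column is offset by two relative to the last intronic base, which is more than flank=1 can absorb.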

@ekofman

ekofman commented Sep 5, 2018

Yes, I noticed this as well. Because of it, some splice junctions are reported as "novel" even though they are annotated and differ only by an off-by-one or off-by-two discrepancy. @naumenko-sa Did you try making your own version of gencode.comprehensive.splice.junctions.txt to address this issue?

@naumenko-sa
Owner Author

Hi @ekofman. We have the same issue: some junctions appear as novel (not annotated by GENCODE). I think the easiest way to remove some of these false positives is to use flank=2 in AddJunctionsToDatabase1.py.
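
A rough sketch of why a wider flank hides the discrepancy. This is not the actual matching logic in AddJunctionsToDatabase1.py; `matches_annotation` is a hypothetical helper, and the observed junction is assumed to end on the last intron base (two less than the annotated end).

```python
from typing import Tuple

Coords = Tuple[int, int]  # (junction start, junction end)


def matches_annotation(observed: Coords, annotated: Coords, flank: int = 1) -> bool:
    # A junction counts as annotated if both ends fall within `flank` bases
    # of the annotated coordinates.
    obs_start, obs_end = observed
    ann_start, ann_end = annotated
    return abs(obs_start - ann_start) <= flank and abs(obs_end - ann_end) <= flank


annotated = (42698142, 42700410)  # CAPN3 line from the junctions file
observed = (42698142, 42700408)   # assumed: same intron, end on last intron base

print(matches_annotation(observed, annotated, flank=1))  # False, reported as novel
print(matches_annotation(observed, annotated, flank=2))  # True
```

A wider flank also accepts junctions that really are shifted by two bases, so it trades some false positives for possible false negatives.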

@ekofman

ekofman commented Sep 17, 2018

Yes @naumenko-sa, I was able to fix the problem by being careful about the indexing (for CIGAR strings the start index is inclusive but the end index is exclusive) and by subtracting two from the junction end.
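
One way this indexing could look in code, as a sketch only. It assumes the read position is 0-based (as BAM and pysam's `reference_start` report it) and that the junctions file is 1-based; `junctions_from_cigar` and `to_half_open` are hypothetical helpers, and the read below is made up to line up with the CAPN3 example.

```python
import re
from typing import List, Tuple

# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(ref_start: int, cigar: str) -> List[Tuple[int, int]]:
    """Return introns as 0-based half-open intervals [start, end)."""
    pos = ref_start
    introns = []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "N":  # skipped region, i.e. the intron
            introns.append((pos, pos + n))
        if op in REF_CONSUMING:
            pos += n
    return introns


def to_half_open(file_start: int, file_end: int) -> Tuple[int, int]:
    # file_start: 1-based first intron base -> subtract 1 for 0-based start.
    # file_end: 1-based second exon base -> subtract 2 for the exclusive end.
    return file_start - 1, file_end - 2


# Made-up read: 50 aligned bases, a 2267-base intron, 50 aligned bases.
observed = junctions_from_cigar(42698091, "50M2267N50M")
print(observed)                            # [(42698141, 42700408)]
print(to_half_open(42698142, 42700410))    # (42698141, 42700408)
```

With both sides expressed in the same half-open convention, the junction matches exactly and no extra flank is needed.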
