transcripts missing length information exception #3
Comments
Hi Travis, thanks for using EMASE and for getting in touch. I think you are right in guessing that the individualized pooled transcriptome is missing genes/isoforms/alleles that are present in the reference GTF file. This typically happens when insertions/deletions remove them. (Just curious: how did you create the pooled transcriptome adjusting for SNPs and indels?) A hack to get around the issue is to manually add the missing genes/isoforms/alleles with "fake" lengths and then try running EMASE again. Since the pooled transcriptome does not contain the sequences for these missing genes/isoforms/alleles, the quantitation should not really be affected. Thank you
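The "fake lengths" hack described above could be sketched roughly as follows. This is only an illustration, not part of EMASE: the function names are hypothetical, the length file is assumed to be tab-separated with one `ID<TAB>length` pair per line, and the placeholder value of 1 is an arbitrary choice.

```python
def read_lengths(path):
    """Read 'ID<TAB>length' lines into a dict (file layout is an assumption)."""
    lengths = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                lengths[fields[0]] = int(fields[1])
    return lengths


def add_placeholder_lengths(lengths, expected_ids, placeholder=1):
    """Return a copy of `lengths` with a placeholder for every absent ID."""
    patched = dict(lengths)
    for identifier in expected_ids:
        # Existing entries are left untouched; only gaps are filled.
        patched.setdefault(identifier, placeholder)
    return patched


def write_lengths(path, lengths):
    """Write the dict back out in the same assumed 'ID<TAB>length' layout."""
    with open(path, "w") as handle:
        for identifier, length in lengths.items():
            handle.write(f"{identifier}\t{length}\n")
```

Writing the patched table to a new file rather than overwriting the original keeps the placeholders easy to audit later.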
We are really just following the first "real use case" example at this point. I'm still a bit fuzzy on the details myself since I just stepped in to help a colleague who got stuck. If transcripts with a length of 0 won't affect the results, then maybe a workaround is just to make the exception a warning instead. I'll play around with that a bit.
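For illustration, downgrading an exception to a warning generally looks like the sketch below. This is not EMASE's actual code: `check_lengths` and its `strict` flag are hypothetical names, and only the general pattern (Python's standard `warnings` module in place of `raise`) is shown.

```python
import warnings


def check_lengths(transcript_ids, lengths, strict=True):
    """Report transcripts that have no entry in `lengths`.

    With strict=True this raises, mirroring the error in this issue;
    with strict=False it only warns and returns the missing IDs.
    """
    missing = [tid for tid in transcript_ids if tid not in lengths]
    if missing:
        message = f"{len(missing)} transcripts are missing length information."
        if strict:
            raise RuntimeError(message)
        warnings.warn(message, RuntimeWarning)
    return missing
```

Returning the list of missing IDs lets the caller decide how to handle them downstream instead of silently ignoring them.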
So an apparent hackish fix is to just replace the
I only understand what Since the bug is still outstanding in the code, I'll let the maintainers close
I'm triggering
RuntimeError('There exist transcripts missing length information.')
We are following the example: https://emase.readthedocs.org/en/latest/examples.html#estimating-allele-specific-expression-from-human-rna-seq-data
A probably important note is that we're working with a significantly rougher reference (A. gambiae) than human, mouse, etc. Samples are all the same species using the same reference and annotation. Anyway, my guess is that some wonkiness in the reference GTF is causing the problem.
Assuming my arithmetic is correct, I've traced it to the fact that there are some transcripts listed in emase.gene2transcripts.tsv which are not in emase.pooled.transcripts.info. It appears that 37 transcripts are being lost someplace between generating emase.transcripts.info and emase.pooled.transcripts.info. 31238/2 = 15619 ... If I understand correctly, that should equal the total number of transcripts (15656). I could be wrong of course.
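One way to see exactly which entries are missing, rather than only counting them, is a set difference between the IDs that are expected and the IDs that are present. The sketch below is a rough diagnostic under stated assumptions: IDs are taken from the first tab-separated column of each file, pooled IDs are assumed to be the reference ID plus an underscore and an allele suffix, and the suffixes `A` and `B` are placeholders for whatever was used when building the pooled transcriptome.

```python
def read_ids(path):
    """Collect the first tab-separated field of every non-empty line."""
    with open(path) as handle:
        return {line.split("\t")[0].strip() for line in handle if line.strip()}


def missing_pooled_ids(reference_ids, pooled_ids, suffixes=("A", "B")):
    """List expected '<transcript>_<suffix>' IDs absent from the pooled set."""
    pooled = set(pooled_ids)
    expected = (f"{tid}_{suffix}" for tid in set(reference_ids) for suffix in suffixes)
    return sorted(identifier for identifier in expected if identifier not in pooled)


# Hypothetical usage with the file names from this issue:
#   reference = read_ids("emase.transcripts.info")
#   pooled = read_ids("emase.pooled.transcripts.info")
#   for identifier in missing_pooled_ids(reference, pooled):
#       print(identifier)

# The arithmetic from the report: 15656 transcripts expected, 31238 / 2 found.
lost = 15656 - 31238 // 2
```

With the numbers quoted above, `lost` comes out to 37, which matches the count of transcripts reported as missing.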