Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix several bugs involving getReadCoordinateForReferenceCoordinate #6485

Merged
merged 4 commits into from
Mar 10, 2020

Conversation

davidbenjamin
Copy link
Contributor

Closes #6342.
Closes #6314.
Closes #6294.
Closes #5492.

@takutosato Fixing bugs and simplifying code.

@gatk-bot
Copy link

gatk-bot commented Mar 5, 2020

Travis reported job failures from build 29404
Failures in the following jobs:

Test Type JDK Job ID Logs
variantcalling openjdk8 29404.4 logs

@gatk-bot
Copy link

gatk-bot commented Mar 5, 2020

Travis reported job failures from build 29408
Failures in the following jobs:

Test Type JDK Job ID Logs
variantcalling openjdk8 29408.4 logs

@jamesemery
Copy link
Collaborator

Since you are working on this branch right as I have noticed a bug in this code myself. I have one but to pile onto this one. I have discovered that if you run getReadCoordinateForReferenceCoordinate() with allowGoalNotReached==true and if you request exactly the base one position after the end of the read window you will have it return that base even if the read doesn't actually have that many elements in its base array (for example read spanning from 1000-1019 with cigar 20M requesting position 1020 will return a pair of <21, false> instead of <-1,false>. It doesn't look like this is one of the bugs that you described but if you are fixing this code It would be nice to fix it as well.

@davidbenjamin
Copy link
Contributor Author

@jamesemery This PR fixes that.

Copy link
Contributor

@takutosato takutosato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly requests to rename variables

final byte[] myBases = getBases();

// can't insert if we don't have any sequence after the inserted alt allele to span the new variant
if (haplotypeInsertLocation + refAllele.length() >= myBases.length) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this checking that there are non-variant bases (i.e. bases that are not in the variant context) on either side of the inserted allele?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the ref allele has n bases, you can think of the alt allele as deleting n - 1 bases (eg ACT -> A) followed, optionally by replacing those n - 1 bases (in the case of a MNP eg ACT -> A -> AGG). Either way, if the haplotype doesn't contain those n - 1 bases it doesn't make sense to perform this transformation.

That is, suppose our haplotype is . . .ATTG. Then we can insert an ATT -> A deletion by removing the TT, but we can't insert an ATTGC -> A deletion because the C is not there.

@davidbenjamin davidbenjamin merged commit 1251155 into master Mar 10, 2020
@davidbenjamin davidbenjamin deleted the db_read_position branch March 10, 2020 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment