Sort output from SVClusterEngine and fix no-call genotype ploidy bug in JointGermlineCNVSegmentation #7779
A couple questions and requests for comments, but otherwise LGTM!
        .collect(Collectors.toList());
    }

    public List<T> forceFlush() {
It looks like this is for special cases, like the tests below, and the task end. Is that true? Regardless, please add some javadoc, esp. for differentiating use cases with the above method.
OK, added a comment. This is to be used only by forceFlush() in the engine itself, which is only called when we're certain that none of the currently active clusters can change. Yes, this is usually when reaching the end of a contig (or file).
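For readers following along, here is a minimal sketch of the distinction between the two flush methods. It is not the code in this PR: the class name `SortingBufferSketch`, the use of bare start positions instead of variant records, and the `minActiveStart` parameter are all assumptions made for illustration. The idea is that a regular flush may only emit records that no still-active cluster could precede, while a forced flush empties the buffer and is safe only once nothing upstream can change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; names and types are illustrative, not the GATK API.
public class SortingBufferSketch {
    // Min-heap on start position; unlike a TreeSet it keeps records with equal starts.
    private final PriorityQueue<Integer> buffer = new PriorityQueue<>();

    public void add(final int start) {
        buffer.add(start);
    }

    /**
     * Emits, in sorted order, only the buffered records that start before the earliest
     * start among still-active clusters (an assumed criterion for this sketch).
     */
    public List<Integer> flush(final int minActiveStart) {
        final List<Integer> out = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek() < minActiveStart) {
            out.add(buffer.poll());
        }
        return out;
    }

    /**
     * Emits everything in sorted order. Only safe when no active cluster can still
     * change, for example at the end of a contig or of the input.
     */
    public List<Integer> forceFlush() {
        final List<Integer> out = new ArrayList<>();
        while (!buffer.isEmpty()) {
            out.add(buffer.poll());
        }
        return out;
    }
}
```

In this sketch a caller would use `flush(...)` while traversing a contig and `forceFlush()` once at the contig boundary.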
    //variantContexts should have identical start, so choose 0th arbitrarily
    final String variantContig = variantContexts.get(0).getContig();
    if (currentContig != null && !variantContig.equals(currentContig)) {
        // Since we need to check for variant overlap and reset genotypes, only flush clustering when we hit a new contig
I feel like this comment should go in processClusters instead since that's where the logic about contigs is
Revisiting this, I noticed that we only call this at the end of contigs anyway, so the force parameter is unnecessary (just always force).
Also moved comment to a docstring in processClusters()
@@ -682,14 +680,18 @@ private static Genotype prepareGenotype(final Genotype g, final Allele refAllele

    private static void correctGenotypePloidy(final GenotypeBuilder builder, final Genotype g, final int ploidy,
I know I dropped the ball, but can you add some comments here? This is for overlapping events, right?
Done. This just modifies the genotypes for input variants so the ploidies are consistent with the ped file and also irons out any no-call/null GTs.

  // Check for one record
  int expectedRecordsFound = 0;
  for (final VariantContext variant : records) {
      Assert.assertTrue(variant.hasAttribute(GATKSVVCFConstants.CLUSTER_MEMBER_IDS_KEY));
      Assert.assertTrue(variant.hasAttribute(GATKSVVCFConstants.ALGORITHMS_ATTRIBUTE));
-     if (variant.getID().equals("SVx000001ad")) {
+     if (variant.getContig().equals("chr20") && variant.getStart() == 28654436) {
Yeah, this is better.
@@ -56,7 +56,7 @@ public void testDefragmentation() {
  Assert.assertEquals(header.getSampleNamesInOrder(), inputHeader.getSampleNamesInOrder());
  Assert.assertEquals(header.getSequenceDictionary().size(), inputHeader.getSequenceDictionary().size());

- Assert.assertEquals(records.size(), 338);
+ Assert.assertEquals(records.size(), 408);
What caused this change?
Interestingly, this is a bug in master. Due to the use of TreeSet with a comparator on genomic position, unclustered variants with the same start position were getting thrown out.
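The TreeSet behavior behind this is easy to reproduce in isolation. The sketch below is illustrative only: `Variant` is a hypothetical stand-in class, not the GATK record type, and the comparator looks at the start position alone. A TreeSet treats two elements as duplicates whenever its comparator returns 0, so a second record at the same start is silently dropped, whereas sorting a list with the same comparator keeps both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TreeSetPositionDemo {
    // Hypothetical stand-in for a variant record; not the GATK class.
    static final class Variant {
        final String id;
        final int start;

        Variant(final String id, final int start) {
            this.id = id;
            this.start = start;
        }
    }

    // Compares on start position only, so distinct records can compare as equal.
    static final Comparator<Variant> BY_START = Comparator.comparingInt(v -> v.start);

    // A TreeSet discards any element for which the comparator returns 0 against an existing one.
    static int sizeInTreeSet(final List<Variant> variants) {
        final Set<Variant> set = new TreeSet<>(BY_START);
        set.addAll(variants);
        return set.size();
    }

    // Sorting a list with the same comparator preserves every record.
    static int sizeInSortedList(final List<Variant> variants) {
        final List<Variant> list = new ArrayList<>(variants);
        list.sort(BY_START);
        return list.size();
    }

    public static void main(final String[] args) {
        final List<Variant> variants = Arrays.asList(
                new Variant("a", 100),
                new Variant("b", 100), // same start as "a"
                new Variant("c", 200));
        System.out.println(sizeInTreeSet(variants));    // prints 2: "b" was dropped
        System.out.println(sizeInSortedList(variants)); // prints 3
    }
}
```

If a sorted set is still wanted, one common alternative is to extend the comparator with a tie-breaker (for example on ID) so that only truly identical records compare as equal.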
Codecov Report
@@ Coverage Diff @@
## master #7779 +/- ##
================================================
+ Coverage 18.644% 86.946% +68.302%
- Complexity 4635 36933 +32298
================================================
Files 1261 2219 +958
Lines 73745 173666 +99921
Branches 11768 18754 +6986
================================================
+ Hits 13749 150995 +137246
+ Misses 57944 16055 -41889
- Partials 2052 6616 +4564
OutputSortingBuffer class used by SVCluster into SVClusterEngine to unify clustering code across tools
JointGermlineCNVSegmentation