Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strand based annotations will use both reads in an overlapping read pair #5286

Merged
merged 1 commit into from
Oct 11, 2018

Conversation

takutosato
Copy link
Contributor

This PR includes two changes:

  1. Provide a command line argument to toggle the overlapping base quality correction (i.e. min(bq, 20)) before reassembly, which happens in FragmentUtils. I've found, however, that by the time SomaticGenotypingEngine runs, those the quality of these bases get bumped up to what they used to be, so this may be a no-op. I included it in case I missed something, and to be consistent with the branch @fleharty and @madduran have been using.
  2. Provide a command line argument to count the two reads in an overlapping pair separately in StrandArtfiact and StrandBiasBySample. This feature is only available in Mutect i.e. it won't affect other tools that use StrandBiasBySample

@takutosato
Copy link
Contributor Author

@davidbenjamin will you review?

@fleharty
Copy link
Contributor

fleharty commented Oct 5, 2018

🍕 🎉 🎈 !

@davidbenjamin davidbenjamin self-assigned this Oct 5, 2018
@fleharty
Copy link
Contributor

fleharty commented Oct 9, 2018

@takutosato Tests appear to be failing, can you take a look at this?

@takutosato
Copy link
Contributor Author

@fleharty fixed the bug, the tests should pass now

@fleharty
Copy link
Contributor

fleharty commented Oct 9, 2018

@takutosato One of the builds failed, are you sure that fixed the bug?

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! You can merge after addressing trivial comments.

populateMapIfEmpty(refReads);

// assume that the mate of a forward read is a reverse read and vice versa
final int numDiscardedFwdAltMates = (int) altReads.get(Strand.REV).stream().filter(ba -> ba.read.hasAttribute(StrandBiasTest.DISCARDED_MATE_READ_TAG)).count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract a method int countDiscardedMates for this logic even though it's a one-liner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And when you extract such a method you could put in logic for when there are no reads with the given strand, thereby letting you do away withpopulateMapIfEmpty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -21,6 +21,7 @@

protected static final int ARRAY_DIM = 2;
protected static final int ARRAY_SIZE = ARRAY_DIM * ARRAY_DIM;
public static final String DISCARDED_MATE_READ_TAG = "DM";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constant belongs in the class that applies the tag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// This read's mate got discarded by SomaticGenotypingEngine::clipOverlappingReads()
if (read.hasAttribute(DISCARDED_MATE_READ_TAG)){
// Note that if this read is forward, then we increment its mate which we assume is reverse
table[offset + (isFW ? 1 : 0)]++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isFW --> isForward

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -123,4 +124,7 @@
@Argument(fullName = SMITH_WATERMAN_LONG_NAME, doc = "Which Smith-Waterman implementation to use, generally FASTEST_AVAILABLE is the right choice", optional = true)
public SmithWatermanAligner.Implementation smithWatermanImplementation = SmithWatermanAligner.Implementation.JAVA;

@Argument(fullName = CORRECT_OVERLAPPING_BASE_QUALITIES_LONG_NAME)
public boolean correctOverlappingBaseQualities = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention for boolean arguments is that specifying them yields non-default behavior.

@@ -169,4 +171,12 @@ public double getDefaultAlleleFrequency() {
@Argument(fullName = GET_AF_FROM_AD_LONG_NAME, doc="Use allelic depth to calculate tumor allele fraction; recommended for mitochondrial applications", optional = true)
public boolean calculateAFfromAD = false;

/**
* If set to true, count overlapping read pair as two seprate reads instead of one for {@link StrandArtifact} and {@link StrandBiasBySample}
* This would break the independence assumption of reads and over-count the alt depth in these annotations. On the other hand it could
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment can be more direct about the fact that it is, in fact, the objectively correct behavior when it comes to strand artifacts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -101,7 +102,7 @@ public CalledHaplotypes callMutations(

for( final int loc : startPosKeySet ) {
final List<VariantContext> eventsAtThisLoc = getVariantContextsFromActiveHaplotypes(loc, haplotypes, false);
final VariantContext mergedVC = AssemblyBasedCallerUtils.makeMergedVariantContext(eventsAtThisLoc);
VariantContext mergedVC = AssemblyBasedCallerUtils.makeMergedVariantContext(eventsAtThisLoc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this remain final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it can

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it can

final byte altQuality = 50;
final byte refQuality = 30;

final File samFile = File.createTempFile("liquid-biopsy", ".bam");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setting-up could go in a @BeforeClass method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use @BeforeClass and ran into the issue where the @BeforeClass method takes in parameters (numAltPairs and refDepth) that I later use in Assert. Hardcoding them in the @BeforeClass method wouldn't be ideal. I could make them class-wide variables, since there are only two, but widening the scope of variables like that isn't ideal, either. So far I just extracted the sam file creation method without adding the @BeforeClass tag (in my latest commit). We can talk more tomorrow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@takutosato That works for me, no need to shoehorn it into @BeforeClass.

@takutosato takutosato force-pushed the ts_read_based_annotations branch from 0dd851a to b800fc4 Compare October 11, 2018 14:49
…tions. 2. Add an option to turn off base quality correction before local reassembly
@takutosato takutosato force-pushed the ts_read_based_annotations branch from b800fc4 to d3d5f3e Compare October 11, 2018 18:55
@takutosato takutosato merged commit 4393c86 into master Oct 11, 2018
@takutosato takutosato deleted the ts_read_based_annotations branch October 11, 2018 18:56
@fleharty
Copy link
Contributor

Thanks Takuto! 🎈

EdwardDixon pushed a commit to EdwardDixon/gatk that referenced this pull request Nov 9, 2018
…tions. 2. Add an option to turn off base quality correction before local reassembly (broadinstitute#5286)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants