Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated MarkDuplicates to use Picard metrics code #4779

Merged
merged 8 commits into from
May 22, 2018
Merged

Conversation

jamesemery
Copy link
Collaborator

Fixes #4777

* Metrics that are calculated during the process of marking duplicates
* within a stream of SAMRecords.
*/
@SuppressWarnings("serial")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of suppressing the serial warning we should add the serialVersionUID. We've had problems in the past due to not including it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

public void updateMetrics(SAMRecord rec) {
update(rec.getReadUnmappedFlag(), rec.isSecondaryOrSupplementary(), ReadUtils.readHasMappedMate(rec), rec.getDuplicateReadFlag() );
}
public void updateMetrics(GATKRead read) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a linebreak

@@ -6,7 +6,7 @@

import java.io.*;

/** Coded for ReadEnds that just outputs the primitive fields and reads them back. */
/** Codec for ReadEnds that just outputs the primitive fields and reads them back. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -235,28 +237,40 @@ public void testTwoMappedPairsAndMappedSecondaryFragment() {
tester.addMappedPair(1, 1, 100, false, false, ELIGIBLE_BASE_QUALITY);
tester.addMappedPair(1, 1, 100, true, true, DEFAULT_BASE_QUALITY); // duplicate!!!
tester.addMappedFragment(1, 200, false, DEFAULT_BASE_QUALITY, true); // mapped non-primary fragment
if (this instanceof MarkDuplicatesSparkIntegrationTest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like these instance of checks . Maybe instead of the instanceof, we can add a method to AbstractMarkDuplicatesTester.doNotMarkUnmappedMates() that will be overriden by the the spark tester to provide the commandline argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added an overridden method on AbstractMarkDuplicatesTester to handle it

tester.addMappedFragment(1, 1, false, ELIGIBLE_BASE_QUALITY);
tester.addMatePair(1, 200, 0, false, true, false, false, "54M22S", null, false, false, true, true, false, DEFAULT_BASE_QUALITY);
if (this instanceof MarkDuplicatesSparkIntegrationTest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery A few nitpicks with my own code...

@lbergelson lbergelson assigned jamesemery and unassigned lbergelson May 17, 2018
@jamesemery jamesemery assigned lbergelson and unassigned jamesemery May 18, 2018
@jamesemery
Copy link
Collaborator Author

@lbergelson Responded to your comments

@@ -42,6 +43,11 @@ protected CommandLineProgram getCommandLineProgramInstance() {
return new MarkDuplicatesSpark();
}

@Override
protected String[] getExtraArguments() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this, lets just set this always in the tester

@codecov-io
Copy link

codecov-io commented May 18, 2018

Codecov Report

Merging #4779 into master will increase coverage by 0.095%.
The diff coverage is 87.023%.

@@              Coverage Diff               @@
##             master     #4779       +/-   ##
==============================================
+ Coverage     80.13%   80.225%   +0.095%     
- Complexity    17438     17742      +304     
==============================================
  Files          1082      1085        +3     
  Lines         63150     63834      +684     
  Branches      10187     10353      +166     
==============================================
+ Hits          50602     51211      +609     
- Misses         8564      8606       +42     
- Partials       3984      4017       +33
Impacted Files Coverage Δ Complexity Δ
...markduplicates/ReadEndsForMarkDuplicatesCodec.java 72.34% <ø> (ø) 7 <0> (ø) ⬇️
.../markduplicates/EstimateLibraryComplexityGATK.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...transforms/markduplicates/MarkDuplicatesSpark.java 98.077% <100%> (+0.16%) 19 <11> (+2) ⬆️
...ates/AbstractMarkDuplicatesCommandLineProgram.java 93.333% <100%> (ø) 24 <5> (ø) ⬇️
...ls/read/markduplicates/GATKDuplicationMetrics.java 100% <100%> (ø) 13 <13> (?)
...forms/markduplicates/MarkDuplicatesSparkUtils.java 89.498% <100%> (-0.29%) 62 <4> (-8)
...roadinstitute/hellbender/utils/read/ReadUtils.java 81.87% <100%> (+2.652%) 341 <9> (+152) ⬆️
.../utils/read/markduplicates/LibraryIdGenerator.java 89.474% <100%> (-4.971%) 14 <3> (ø)
...ols/walkers/markduplicates/MarkDuplicatesGATK.java 87.917% <69.231%> (-2.003%) 69 <0> (-7)
...e/hellbender/utils/test/testers/SamFileTester.java 85.271% <79.167%> (-5.45%) 35 <8> (+3)
... and 21 more

lbergelson and others added 7 commits May 21, 2018 10:05
moving AbstractMarkDuplicateTester to test sources

resolves #3762

updating tests to include new tests and changes

tests not passing, need to update old gatk test files
changed from hellbender...DuplicateMetrics -> hellbender...GATKDuplicateMetrics which extends picard...DuplicateMetrics

porting bug fixes to LibraryIdGenerator from picard

updating several metrics files with the newest version of picard

the lbiraries in the metrics files are then sorted in alphabetical order

changes, saving state

add addtional test fixes
@jamesemery
Copy link
Collaborator Author

@lbergelson Are these changes good for you? I would like to get these changes in soonish.

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jamesemery jamesemery assigned mwalker174 and fleharty and unassigned mwalker174 May 21, 2018
@lbergelson
Copy link
Member

@fleharty Did you have any review comments?

@fleharty
Copy link
Contributor

fleharty commented May 22, 2018 via email

Copy link
Contributor

@fleharty fleharty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery Mostly nitpicks here. I approved since this is just really putting in finals and modifying whitespace.

@@ -1,5 +1,6 @@
package org.broadinstitute.hellbender.tools.spark.transforms.markduplicates;

import com.google.common.collect.Iterators;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import appears to not be used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize it isn't your code, but can you remove the other two rouge imports....
line 21
import org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource;

and
import picard.sam.markduplicates.util.OpticalDuplicateFinder;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return Utils.stream(readsIter).peek(read -> {
return Utils.stream(readsIter)
.peek(read -> read.setIsDuplicate(false))
.peek(read -> {
// Handle reads that have been marked as non-duplicates (which also get tagged with optical duplicate summary statistics)
if( namesOfNonDuplicateReadsAndOpticalCounts.containsKey(read.getName())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spacing around if appears incorrect

@@ -122,8 +126,10 @@
read.setIsDuplicate(false);
// Everything else is a duplicate
} else{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after else

@@ -323,6 +323,14 @@ public static boolean readHasMappedMate( final GATKRead read ) {
return read.isPaired() && ! read.mateIsUnmapped();
}

/**
* @param read read to check
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds confusing, could you rephrase?

* @param read read to check
* @return true if the read is paired and has a mapped mate, otherwise false
*/
public static boolean readHasMappedMate( final SAMRecord read ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do spaces go around the argument?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a good question, just looking in this file alone it seems to be wholly inconsistent one way or another. I'm not going to bother changing it because it would imply that there was some principled reason for me to unify ALL the formatting in this file at least which I don't necessarily want to do in this PR.

private void update(boolean readUnmappedFlag, boolean secondaryOrSupplementary, boolean mappedMate, boolean isDuplicate) {
if (readUnmappedFlag) {
++this.UNMAPPED_READS;
} else if(secondaryOrSupplementary) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after if


public Histogram<Short> getOpticalDuplicatesByLibraryIdMap() { return this.opticalDuplicatesByLibraryId; }

public static String getReadGroupLibraryName(SAMReadGroupRecord readGroup) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final SAMReadGroupRecord?

@@ -23,17 +24,29 @@
private boolean deleteOnExit = true;
protected final List<String> args = new ArrayList<>();

public SamFileTester(final int readLength, final boolean deleteOnExit, final int defaultChromosomeLength, final DuplicateScoringStrategy.ScoringStrategy duplicateScoringStrategy) {
public SamFileTester(final int readLength, final boolean deleteOnExit, final int defaultChromosomeLength, final DuplicateScoringStrategy.ScoringStrategy duplicateScoringStrategy, final SAMFileHeader.SortOrder sortOrder, boolean recordsNeedSorting) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final boolean?

public abstract class AbstractMarkDuplicatesTester extends SamFileTester {

final private File metricsFile;
final DuplicationMetrics expectedMetrics;

public AbstractMarkDuplicatesTester(final ScoringStrategy duplicateScoringStrategy) {
super(50, true, SAMRecordSetBuilder.DEFAULT_CHROMOSOME_LENGTH, duplicateScoringStrategy);
public AbstractMarkDuplicatesTester(final ScoringStrategy duplicateScoringStrategy, SAMFileHeader.SortOrder sortOrder) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing finals?

this(duplicateScoringStrategy, sortOrder, true);
}

public AbstractMarkDuplicatesTester(final ScoringStrategy duplicateScoringStrategy, SAMFileHeader.SortOrder sortOrder, boolean recordNeedSorting) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing finals?

@jamesemery
Copy link
Collaborator Author

@fleharty Thank you for the review.

@jamesemery
Copy link
Collaborator Author

@fleharty The reason for the "read read" has to do with the javadocs. The first one is a flag indicating the argument name for documentation purposes. The second one is just the description, which should be reasonably formatted in the docs compared to how it looks here on github.

@fleharty
Copy link
Contributor

@jamesemery I see, of course... for some reason my eye didn't catch that the first "read" was the parameter, and the second "read" was part of the description. Thanks for the clarification.

@jamesemery jamesemery merged commit 2cc7abd into master May 22, 2018
@lbergelson lbergelson deleted the lb_update_md branch May 25, 2018 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants