-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a few of the remaining useful features to MarkDuplicatesSpark and associated tests #5377
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery Back to you. Looks good modulo typos and some random stuff.
mutex = {MarkDuplicatesSparkArgumentCollection.DUPLICATE_TAGGING_POLICY_LONG_NAME, MarkDuplicatesSparkArgumentCollection.REMOVE_SEQUENCING_DUPLICATE_READS}, optional = true) | ||
public boolean removeAllDuplicates = false; | ||
|
||
@Argument(fullName = MarkDuplicatesSparkArgumentCollection.REMOVE_SEQUENCING_DUPLICATE_READS, doc = "If true do not write duplicates to the output file instead of writing them with appropriate flags set.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusing because it has the same comment as removeAllDuplicates.
@@ -93,5 +94,6 @@ private void registerGATKClasses(Kryo kryo) { | |||
kryo.register(ReadsKey.class, new FieldSerializer(kryo, ReadsKey.class)); | |||
kryo.register(ReadsKey.KeyForFragment.class, new FieldSerializer(kryo, ReadsKey.KeyForFragment.class)); | |||
kryo.register(ReadsKey.KeyForPair.class, new FieldSerializer(kryo, ReadsKey.KeyForPair.class)); | |||
kryo.register(SerializableOpticalDuplicatesFinder.class, new FieldSerializer(kryo, SerializableOpticalDuplicatesFinder.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weren't we getting rid of this class? Should this be replaces with a registration of OpticalDuplicateFinder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
@@ -68,7 +68,7 @@ protected void runTool(final JavaSparkContext ctx) { | |||
try (final BwaSparkEngine bwaEngine = new BwaSparkEngine(ctx, referenceArguments.getReferenceFileName(), bwaArgs.indexImageFile, getHeaderForReads(), getReferenceSequenceDictionary())) { | |||
final ReadFilter filter = makeReadFilter(bwaEngine.getHeader()); | |||
final JavaRDD<GATKRead> alignedReads = bwaEngine.alignPaired(getUnfilteredReads()).filter(filter::test); | |||
final JavaRDD<GATKRead> markedReads = MarkDuplicatesSpark.mark(alignedReads, bwaEngine.getHeader(), markDuplicatesSparkArgumentCollection.duplicatesScoringStrategy, new SerializableOpticalDuplicatesFinder(), getRecommendedNumReducers(), markDuplicatesSparkArgumentCollection.dontMarkUnmappedMates); | |||
final JavaRDD<GATKRead> markedReads = MarkDuplicatesSpark.mark(alignedReads, bwaEngine.getHeader(), new SerializableOpticalDuplicatesFinder(), markDuplicatesSparkArgumentCollection, getRecommendedNumReducers()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SerializableOpticalDuplicateFinder -> OpticalDuplicateFinder?
@@ -174,7 +174,7 @@ protected void runTool(final JavaSparkContext ctx) { | |||
header = getHeaderForReads(); | |||
} | |||
|
|||
final JavaRDD<GATKRead> markedReads = MarkDuplicatesSpark.mark(alignedReads, header, markDuplicatesSparkArgumentCollection.duplicatesScoringStrategy, new SerializableOpticalDuplicatesFinder(), getRecommendedNumReducers(), markDuplicatesSparkArgumentCollection.dontMarkUnmappedMates); | |||
final JavaRDD<GATKRead> markedReads = MarkDuplicatesSpark.mark(alignedReads, header, new SerializableOpticalDuplicatesFinder(), markDuplicatesSparkArgumentCollection, getRecommendedNumReducers()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
// Reads with this marker will be treated as non-duplicates always | ||
public static int MARKDUPLICATES_NO_OPTICAL_MARKER = -1; | ||
// Reads with this marker will be treated and marked as optical duplicates | ||
public static int MARKDUPLICATES_OPPTICAL_DUPLICATE_MARKER = -2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo OPPTICAL
try { | ||
metricsOutput.read(new FileReader(metricsFile)); | ||
} catch (final FileNotFoundException ex) { | ||
System.err.println("Metrics file not found: " + ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a test failure?
} catch (final FileNotFoundException ex) { | ||
System.err.println("Metrics file not found: " + ex); | ||
} | ||
final List<GATKDuplicationMetrics> nonEmptyMetrics = metricsOutput.getMetrics().stream().filter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a big block of complicated duplicated code, could you extract it to a method instead?
Assert.assertEquals(observedMetrics.READ_PAIR_OPTICAL_DUPLICATES, expectedList.get(5)); | ||
Assert.assertEquals(observedMetrics.PERCENT_DUPLICATION, expectedList.get(6)); | ||
|
||
//Note: IntelliJ does not like it when a parameter for a test is null (can't print it and skips the test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this comment is true? ImmutableList.of doesn't like nulls but I didn't think intellij had any trouble with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not my comment, was originally adams
List<?> expectedList = metricsExpected.get(observedMetrics.LIBRARY); | ||
expectedOpticalDuplicatesGroups += (Long) expectedList.get(5); | ||
} | ||
// NOTE: this test will fail if we add a more comprehensive example set with optical duplicates containing secondary/supplementary reads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes it sound like there's a bug in the code, maybe make it clear that that it's the test that can't handle a more complicated scenerio?
oneLineSummary = "Compares the base qualities of two SAM/BAM/CRAM files", | ||
programGroup = DiagnosticsAndQCProgramGroup.class | ||
) | ||
public class CompareReads extends GATKTool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to add this tool here? I don't see any tests for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should either be removed or tested.
298349b
to
891d2f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery One comment I somehow missed in my previous review, can we push compare reads to another branch or add tests?
build.gradle
Outdated
@@ -58,7 +58,7 @@ repositories { | |||
|
|||
final requiredJavaVersion = "8" | |||
final htsjdkVersion = System.getProperty('htsjdk.version','2.16.1') | |||
final picardVersion = System.getProperty('picard.version','2.18.15') | |||
final picardVersion = System.getProperty('picard.version','2.18.14-22-ge9e24bf-SNAPSHOT') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a conflict here now
oneLineSummary = "Compares the base qualities of two SAM/BAM/CRAM files", | ||
programGroup = DiagnosticsAndQCProgramGroup.class | ||
) | ||
public class CompareReads extends GATKTool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should either be removed or tested.
@lbergelson I removed compareReads. Can you give me another review? I would like to get this branch in in the shortish term. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Note: This branch is still blocked on several changes in Picard (broadinstitute/picard#1236, and possibly broadinstitute/picard#1245), once those are resolved and released then this branch should hopefully get the stamp of approval from @takutosato.
Resolves #4675
Resolves #5377