-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding soft clipped ratio to OverclippedReadFilter #5995
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5995 +/- ##
===============================================
- Coverage 86.927% 18.43% -68.497%
+ Complexity 32732 9207 -23525
===============================================
Files 2014 2016 +2
Lines 151333 151426 +93
Branches 16612 16623 +11
===============================================
- Hits 131549 27908 -103641
- Misses 13724 120739 +107015
+ Partials 6060 2779 -3281
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of issues, and one or two questions. The bigger question I have however is why continue to use minAlignedBases in the ratio tests at all, especially since it's no longer configurable. You are enforcing that reads below (right now 30) still fail the overclipped filter and then asserting on top of that that their clipping is below your threshold. I would say you have two options:
- Make them mutually exclusive, thus if you are in one of the ratio modes the aligned length isn't considered at all. This has the drawback that you can never apply both tests if you want but does away with the hard coded 30 value that should affect the ratio tests.
- Make them all apply configurably. I.e. you can run with both a clipping ratio test AND a minimum absolute number of aligned bases test (and perhaps even both clipping ratio and edge clipping ratio). This is doable in this class with relative ease (test just gathers everything at once and decides based on settings which ones to run) but you may also want to pull the ratio tests out into separate filters if you do that.
|
||
@Override | ||
public boolean test(final GATKRead read) { | ||
logger.debug( "About to test read: " + read.toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This debug statement is going to be expensive, as it has to call read.toString() for every read which means every read gets parsed completely even if debug is off. Either use the message supplier paradigm or encase this in an if statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to remove all debug logging statements anyway - they were in for... well.. debugging.
public boolean test(final GATKRead read) { | ||
logger.debug( "About to test read: " + read.toString()); | ||
|
||
// NOTE: Since we have mutex'd the args for the clipping ratios, we only need to see if they |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a line about this behavior to the javadoc for this filter as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
final double softClipRatio = ((double)numSoftClippedBases / (double)totalLength); | ||
|
||
logger.debug( " Num Soft Clipped Bases: " + numSoftClippedBases ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, use suppliers or hide them behind if statements to prevent unnecessary method calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug statements are getting removed.
static final long serialVersionUID = 1L; | ||
private final Logger logger = LogManager.getLogger(this.getClass()); | ||
|
||
private static final double DEFAULT_MIN_SOFT_CLIPPED_RATIO = Double.MIN_VALUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These default values get filled in on the documentation page here: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_engine_filters_OverclippedReadFilter.php meaning that you will get some godawful representation of Double.MIN_VALUE filled in on this page. I would suggest either choosing a more manageable default value like 1.0, or setting this to a Double
object and making the default value null
so it shows up as such and is not confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double
/ null
it is!
private int minAlignedBases = 30; | ||
|
||
@VisibleForTesting | ||
@Argument(fullName = "soft-clipped-ratio-threshold", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move these names to constants in ReadFilterArgumentDefinitions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
|
||
@VisibleForTesting | ||
@Argument(fullName = "leading-trailing-soft-clipped-ratio", | ||
doc = "Minimum ratio of soft clipped bases (leading / trailing the cigar string) to total bases in read for read to be included.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to necessarily make these ("leading-trailing-soft-clipped-ratio", "soft-clipped-ratio-threshold") mutually exclusive? It seems like it may be possible that one cares about different values for one or the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my use case there is never a time when you want to do a combined filter on both. I don't think it's useful to have them both on at the same time.
We can fix it if someone ever wants to make this happen together.
logger.debug( " Read Filter status: " + (softClipRatio > minimumSoftClippedRatio) ); | ||
logger.debug( " Aligned Length: " + alignedLength ); | ||
|
||
return (alignedLength >= minAlignedBases) && (softClipRatio > minimumSoftClippedRatio); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should either be || or you should drop the first clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OBE
final double clipRatio, | ||
final boolean expectedResult) { | ||
|
||
final OverclippedReadFilter filter = new OverclippedReadFilter(10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
|
||
// --------------------------------------- | ||
// Min bases test: | ||
testData.add(new Object[] { "20S5M", 0.2, false }); // 20/25 = .8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think these tests actually demonstrate the minAlignedBases
behavior, as both of them have <30 matching cigar bases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were showing that strings below 30 bases will return false even though by the other metric they would pass. The other tests have a smaller minAlignedBases and all pass because the threshold has been hit.
These tests have since been removed anyway because of the new class structure.
logger.debug( " Read Filter status: " + (softClipRatio > minimumLeadingTrailingSoftClippedRatio) ); | ||
logger.debug( " Aligned Length: " + alignedLength ); | ||
|
||
return (alignedLength >= minAlignedBases) && (softClipRatio > minimumLeadingTrailingSoftClippedRatio); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these arguments are really mutually exclusive I wouldn't have both options (minAlignedBases, and minimumLeadingTrailingSoftClippedRatio) apply in every case. Either let them both exist at once (as they are all 3 separate tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OBE
Perhaps call the alternative a |
Yeah - I'm gonna pull this out into a new read filter. |
Addressed review comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment about the argument name and then you should feel free to merge.
public static final String SOFT_CLIPPED_RATIO_THRESHOLD = "soft-clipped-ratio-threshold"; | ||
public static final String SOFT_CLIPPED_LEADING_TRAILING_RATIO_THRESHOLD = "soft-clipped-leading-trailing-ratio"; | ||
|
||
public static final String INVERT_FILTER = "invert-filter"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this "invert-soft-clip-ratio" or something like that to make it a little clearer that its associated with THIS filter not just any filter on the command line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Really, this should be a feature of every reader. I'll add a ticket for it.
Added two new arguments to
OverclippedReadFilter
that enable filtering based on a ratio of soft clipped reads to total read length.Necessary for long reads analysis.