-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrote M2 isActive method as special case of somatic likelihoods model #5814
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5814 +/- ##
===============================================
- Coverage 87.029% 80.305% -6.724%
+ Complexity 32104 30542 -1562
===============================================
Files 1972 1978 +6
Lines 147187 147475 +288
Branches 16201 16230 +29
===============================================
- Hits 128096 118430 -9666
- Misses 13184 23330 +10146
+ Partials 5907 5715 -192
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davidbenjamin Sorry for the delay, all looks well to me!
docs/mutect/mutect.tex
Outdated
|
||
Under the first assumption the contribution of ref bases to the sum in Equation \ref{tumor-lod} vanishes -- errorless ref bases contribute no entropy -- so we only need to sum over alt bases. By the second assumption the likelihood of an alt read is related to its base quality (and associated error rate $\epsilon$) as $\ell_{r,{\rm alt}} = 1 - \epsilon_r$. | ||
|
||
We compute Equation \ref{tumor-lod} for two possibilities; first, that an alt allele exists, second that only the ref allele exists. In the former case, Initializing $\bar{z}_{r,{\rm ref(alt)}} = 1$ for ref (alt) bases a single iteration for $q(\vf)$ gives $\vbeta = (N_{\rm alt} + 1, N_{\rm ref} + 1)$. For alt reads $r$ we then obtain from Equation \ref{f-tilde} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we obtain this equation from the equation about z_bar (Equation 6) right above \ref{f-tilde}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments. Upon looking at the code you are refactoring and reflecting I would say this is certainly no more dangerous thread safety-wise than it was before as it appears to be a mostly equivalent implementation. The safety is predicated on the fact that the resize operation updates the array in its last step and that it doesn't change any of the underlying cache values, rather opting to copy them instead. I haven't looked closely at the code changes in this branch however.
/** | ||
* Wrapper class so that the log10Factorial array is only calculated if it's used | ||
*/ | ||
public class Log10FactorialCache extends IntToDoubleFunctionCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
@Override | ||
protected double compute(final int n) { | ||
return MathUtils.log10Gamma(n + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without delving into the details of what exactly your branch is computing as of right now, it looks like this isn't actually doing the factorial computation here but rather calling out to the logGamma function. This may be equivalent? If not then you should update the comments on this class to reflect what its doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the gamma function satisfied Gamma(n + 1) = n! for integers n. The old implementation instead grew the cache incrementally as log((n+1)!) = log(n!) + log(n+1). Since log(n+1) in the old implementation was also cached this was faster, but the one-time cost of computing logGamma a few thousand times to fill the cache is probably a few milliseconds, if that.
public IntToDoubleFunctionCache() { } | ||
|
||
/** | ||
* Get the value of log10(n), expanding the cache as necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update this javadoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
* @return log10(n) | ||
*/ | ||
public double get(final int i) { | ||
Utils.validateArg(i >= 0, () -> String.format("Cache doesn't apply to negative number %d", i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe document this restriction (or even change the class name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Closes #5775.
@takutosato This doesn't affect M2 results (well, actually it improves sensitivity by 0.01%) but it reduces runtime by about 5% and makes the docs and code cleaner.
@jamesemery could you verify that in abstracting out
Log10Cache
asIntToDoubleFunctionCache
I didn't spoil its thread safety?