Doc fix for SOR #5703
Codecov Report
@@ Coverage Diff @@
## master #5703 +/- ##
===============================================
- Coverage 87.066% 87.003% -0.063%
- Complexity 31853 32094 +241
===============================================
Files 1940 1974 +34
Lines 146679 147213 +534
Branches 16218 16214 -4
===============================================
+ Hits 127707 128079 +372
- Misses 13060 13231 +171
+ Partials 5912 5903 -9
@sooheelee can you take a quick look?
Just getting to this @ldgauthier.
Review is inline.
@@ -37,11 +37,11 @@
*
* <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT (the sum makes it symmetric). A high value is indicative of large difference where one entry is very small compared to the others. A scale factor of refRatio/altRatio where</p>
*
* $$ refRatio = \frac{max(X[0][0], X[0][1])}{min(X[0][0], X[0][1} $$
* $$ refRatio = \frac{min(X[0][0], X[0][1])}{max(X[0][0], X[0][1} $$
- Missing a closing `]` and closing `)`s.
- If we are changing the equations here, then I believe we should also update AS_StrandOddsRatio.
The forum does not translate javadoc/gatkdoc LaTeX, nor has it ever done so as far as I'm aware. The javadoc looks like this:
If we take these to a LaTeX renderer, the equations look thusly:
Are you okay with continuing to present the LaTeX equations @ldgauthier? Here are some choices to consider.
- Adding images of the equations. This will be slightly different than the approach we took to updating Article#11074. I believe it is a matter of html sourcing images within the javadoc portion of the code and uploading images to the forum.
- Looking into the forum CSS style sheets and figuring out how to dynamically render LaTeX equations.
- Keep the status quo but also provide a link to an online LaTeX renderer for folks to render.
Back to you @ldgauthier.
I didn't realize the html translated the LaTeX literally. No one has complained before, so I'm in favor of option 3.
I've been thinking about this and why we have this status quo. I'm not the best person for it, but I think it is worth my checking out https://www.mathjax.org/ and similar options before we settle on option 3. The solution may be straightforward, and given that our methods are about the math, I think we should try to solve this.
I will be back after some reading and testing.
@ldgauthier, here is a solution that will render the LaTeX for the forum.
I've tested that this renders on the forum. Please let me know if you want me to make these changes on your branch for the SOR and AS_SOR documentation.
- Render LaTeX equations with plugin (pre-tested)
- Render Markdown table correctly and add spacing between columns
- In 'Statistical notes' section, introduce the contingency table and place appropriately with text
- Provide both R and 1/R equations instead of just one (both are mentioned in the text)
- Close out the equation elements appropriately with `]` and `)`s
- Mention the final annotation is given in log space and provide a link to an example calculation
- Update links, e.g. to related annotations, to point to GATK4 current docs instead of v3.8
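For reference, the equations these commit notes refer to can be written with every bracket and parenthesis closed, as in the sketch below. The refRatio and altRatio forms follow the min-over-max version in the diff under review. The definition of R is not quoted in this thread; it is assumed here to be the usual odds ratio of the 2x2 strand table X (rows REF and ALT, columns forward and reverse).

```latex
$$ R = \frac{X[0][0] \cdot X[1][1]}{X[0][1] \cdot X[1][0]} $$
$$ \frac{1}{R} = \frac{X[0][1] \cdot X[1][0]}{X[0][0] \cdot X[1][1]} $$
$$ refRatio = \frac{\min(X[0][0], X[0][1])}{\max(X[0][0], X[0][1])} $$
$$ altRatio = \frac{\min(X[1][0], X[1][1])}{\max(X[1][0], X[1][1])} $$
```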
@ldgauthier, I've performed a pass. I think the Caveat section of the AS_StrandOddsRatio doc could use some attention. Here it is currently (I did not touch it): Here is what the rendered javadocs look like now. Let me know what you think.
I was thinking @ldgauthier that it would be great if the example numbers given in the caveats section were expanded to be an example of how SOR/FisherStrand are each calculated. Then I can unlink the thread URL I had added so we are not sending folks all over the forum.
I just stumbled across some data where the ref is very biased, but the alt is not too bad:
Looks like a great example to use @ldgauthier. Do you prefer to write out the example calculation or have Comms do so? I'm a bit focused on gCNV currently but assume you are even more occupied.
I don't have time to do it right now, so if someone on your team does then that's great.
It's only me so I will write this out for your review. I've put in a word with @rcmajovski for additional future GATK documentation support. He will follow up with you.
I just looked through the code and see that the current explanation is incomplete and also inaccurate. I will amend to reflect (i) the addition of 1 to counts and (ii) that SOR results from
- Clarify SOR is calculated with `ln(ratio) + ln(refRatio) - ln(altRatio)`
- Mention one is added to each count
- Show step-by-step calculations with example counts
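The calculation described in these notes can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the GATK implementation: the class name `SorSketch`, the method `sor`, and the example counts are hypothetical, and `ratio` is assumed to mean R + 1/R for the odds ratio R of the strand table.

```java
final class SorSketch {
    // Pseudocount added to every cell ("one is added to each count").
    private static final double PSEUDOCOUNT = 1.0;

    /**
     * Computes ln(ratio) + ln(refRatio) - ln(altRatio) for a 2x2 strand table.
     * Assumed layout: table[0] = {refForward, refReverse},
     *                 table[1] = {altForward, altReverse}.
     */
    static double sor(final int[][] table) {
        final double t00 = table[0][0] + PSEUDOCOUNT;
        final double t01 = table[0][1] + PSEUDOCOUNT;
        final double t10 = table[1][0] + PSEUDOCOUNT;
        final double t11 = table[1][1] + PSEUDOCOUNT;

        // Odds ratio R and the sum R + 1/R (assumed meaning of "ratio").
        final double r = (t00 * t11) / (t01 * t10);
        final double ratio = r + 1.0 / r;

        // min over max, so each ratio lies in (0, 1] regardless of strand direction.
        final double refRatio = Math.min(t00, t01) / Math.max(t00, t01);
        final double altRatio = Math.min(t10, t11) / Math.max(t10, t11);

        return Math.log(ratio) + Math.log(refRatio) - Math.log(altRatio);
    }

    public static void main(final String[] args) {
        // Hypothetical counts, chosen so the pseudocount gives round numbers.
        final int[][] refBiased = {{99, 9}, {49, 49}};
        final int[][] altBiased = {{49, 49}, {99, 9}};
        final int[][] balanced = {{49, 49}, {49, 49}};

        System.out.printf("REF-biased SOR: %.4f%n", sor(refBiased));
        System.out.printf("ALT-biased SOR: %.4f%n", sor(altBiased));
        System.out.printf("Balanced SOR:   %.4f%n", sor(balanced));
    }
}
```

With these made-up counts, the table whose REF reads are skewed scores about 0.01, the table whose ALT reads are skewed scores about 4.6, and a perfectly balanced table gives ln(2). Swapping the forward and reverse columns leaves each value unchanged, because min and max ignore which strand holds the majority.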
@ldgauthier, I've fleshed out an example calculation using the counts you provided. I added the new section "Example calculation" to the SOR documentation and go through the calculations step-by-step: Simply updated AS_SOR to point to the example calculation in SOR:
Seems odd to approve my own changes to the branch. Let me know when I can merge.
Given these changes pertain solely to the Javadoc portion, and the amount of time and back-and-forth we've gone through, I assume this is finished @ldgauthier and will merge this.
*
* $$ refRatio = \frac{max(X[0][0], X[0][1])}{min(X[0][0], X[0][1} $$
* <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT. The sum makes it symmetric.
I wouldn't say that the sum makes it symmetric. That name persists from early stages of development when the model was different. It's actually important to note that it's NOT symmetric because we don't want to penalize sites where the alt allele looks well balanced, but the ref allele isn't, presumably by chance.
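The asymmetry can be made explicit with a short worked step. This is a sketch that assumes SOR = ln(R + 1/R) + ln(refRatio) - ln(altRatio), as the commit notes in this PR state, with R taken to be the odds ratio of the strand table. Primed symbols denote the same table with its REF and ALT rows swapped.

```latex
\begin{align*}
R' &= \frac{1}{R} \quad\Rightarrow\quad R' + \frac{1}{R'} = R + \frac{1}{R}, \\
refRatio' &= altRatio, \qquad altRatio' = refRatio, \\
SOR' &= \ln\!\left(R + \frac{1}{R}\right) + \ln(altRatio) - \ln(refRatio), \\
SOR - SOR' &= 2\,\bigl(\ln(refRatio) - \ln(altRatio)\bigr).
\end{align*}
```

The difference vanishes only when refRatio equals altRatio. Because both ratios are at most 1, an imbalanced REF row lowers SOR while an imbalanced ALT row raises it, which matches the intent of not penalizing sites where only the ref allele looks unbalanced.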
I didn't write these parts you are commenting on @ldgauthier and assume whoever did knew what they were talking about. I am hesitant to change/remove parts without discussing with the originator.
Bertrand wrote this part of the comments (https://github.com/broadinstitute/gsa-unstable/commit/fb3d3e84e4951d3ea9e3a93d0ffb1559b9004b4e), but that was before the calculation was updated with the altRatio (which was my change: https://github.com/broadinstitute/gsa-unstable/commit/893997ddb5c0456714ba782934c3478010da1c09 -- apparently the "ensures that the annotations is large only" was my fault, but it was five years ago.)
* $$ altRatio = \frac{max(X[1][0], X[1][1])}{min(X[1][0], X[1][1]} $$
* <img src="http://latex.codecogs.com/svg.latex?$$ altRatio = \frac{min(X[1][0], X[1][1])}{max(X[1][0], X[1][1])} $$" border="0"/>
*
* <p>ensures that the annotation value is large only. The final SOR annotation is given in natural log space.</p>
I'm not sure where that old text comes from, but a better way to put it might be that the min and max in the altRatio normalize the values so that it doesn't matter whether a site has majority + or - alt reads.
I assume the old text comes from a developer who is speaking math lingo that means something to other math people. Thought you approved of these earlier?
* </pre>
*
* <p>Read support shows some strand bias for the reference allele but not
* the alternate allele. The SB_TABLE annotation (a non-GATK annotation) indicates 1450 reference alleles on the forward strand, 345
Is it useful to include my example verbatim with the "non-GATK annotation"? You could give the table separately as the sum of the per-sample SB annotation tables.
(Ideally the Hail tool that produced the SB_TABLE will become public in the next month or so.)
1450 reference alleles -> 1450 reference reads
ditto alternate alleles -> reads
This is the example you gave and so this is the example I used to illustrate the calculation. I apologize I cannot spend any more time on this even if I want to. I have ~ a week left to finalize the gCNV tutorial. Please ask @rcmajovski, who owns GATK documentation, for additional Comms support.
We talked in person about a few edits I wanted to make, but that must have slipped through the cracks since it wasn't noted here. These can be addressed in the next docs update.
I'd be happy to look over the LaTeX style for the next round of edits. Probably good to take advantage of things like subscripts to indicate matrix elements, etc. that LaTeX enables. In any case, thanks for adding this @sooheelee!
Closes #5700