Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc fix for SOR #5703

Merged
merged 4 commits into from
Mar 14, 2019
Merged

Doc fix for SOR #5703

merged 4 commits into from
Mar 14, 2019

Conversation

ldgauthier
Copy link
Contributor

Closes #5700

@ldgauthier ldgauthier requested a review from sooheelee February 21, 2019 17:04
@codecov-io
Copy link

codecov-io commented Feb 21, 2019

Codecov Report

Merging #5703 into master will decrease coverage by 0.063%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #5703       +/-   ##
===============================================
- Coverage     87.066%   87.003%   -0.063%     
- Complexity     31853     32094      +241     
===============================================
  Files           1940      1974       +34     
  Lines         146679    147213      +534     
  Branches       16218     16214        -4     
===============================================
+ Hits          127707    128079      +372     
- Misses         13060     13231      +171     
+ Partials        5912      5903        -9
Impacted Files Coverage Δ Complexity Δ
...ender/tools/walkers/annotator/StrandOddsRatio.java 94.737% <ø> (ø) 8 <0> (ø) ⬇️
...s/annotator/allelespecific/AS_StrandOddsRatio.java 80% <ø> (ø) 4 <0> (ø) ⬇️
...k/pipelines/ReadsPipelineSparkIntegrationTest.java 1.563% <0%> (-95.313%) 1% <0%> (-6%)
...adinstitute/hellbender/utils/IntegrationUtils.java 0% <0%> (-95.238%) 0% <0%> (-5%)
...nder/tools/spark/pipelines/ReadsPipelineSpark.java 0% <0%> (-90.741%) 0% <0%> (-13%)
...titute/hellbender/engine/TwoPassVariantWalker.java 69.231% <0%> (-26.007%) 6% <0%> (+2%)
...itute/hellbender/utils/runtime/ScriptExecutor.java 66.667% <0%> (-14.583%) 7% <0%> (-3%)
...stitute/hellbender/tools/HaplotypeCallerSpark.java 70.115% <0%> (-12.644%) 18% <0%> (-2%)
...aller/HaplotypeCallerGenotypingEngineUnitTest.java 80.693% <0%> (-8.808%) 18% <0%> (-12%)
...er/tools/walkers/annotator/StrandBiasBySample.java 78.261% <0%> (-8.696%) 7% <0%> (-2%)
... and 190 more

@ldgauthier
Copy link
Contributor Author

@sooheelee can you take a quick look?

@sooheelee
Copy link
Contributor

Just getting to this @ldgauthier.

Copy link
Contributor

@sooheelee sooheelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review is inline.

@@ -37,11 +37,11 @@
*
* <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT (the sum makes it symmetric). A high value is indicative of large difference where one entry is very small compared to the others. A scale factor of refRatio/altRatio where</p>
*
* $$ refRatio = \frac{max(X[0][0], X[0][1])}{min(X[0][0], X[0][1} $$
* $$ refRatio = \frac{min(X[0][0], X[0][1])}{max(X[0][0], X[0][1} $$
Copy link
Contributor

@sooheelee sooheelee Mar 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Missing a closing ] and closing )s.
  • If we are changing the equations here, then I believe we should also update AS_StrandOddsRatio.

The forum does not translate javadoc/gatkdoc LaTex nor ever did it in the past as far as I'm aware. The javadoc looks like this:

screenshot 2019-03-06 14 48 16


If we take these to a LaTeX renderer, the equations look thusly:

screenshot 2019-03-06 14 52 19


Are you okay with continuing to present the LaTeX equations @ldgauthier? Here are some choices to consider.

  1. Adding images of the equations. This will be slightly different than the approach we took to updating Article#11074. I believe it is a matter of html sourcing images within the javadoc portion of the code and uploading images to the forum.
  2. Looking into the forum CSS style sheets and figuring out how to dynamically render LaTeX equations.
  3. Keep the status quo but also provide a link to an online LaTeX renderer for folks to render.

@sooheelee sooheelee assigned ldgauthier and unassigned sooheelee Mar 6, 2019
@sooheelee
Copy link
Contributor

Back to you @ldgauthier.

@ldgauthier
Copy link
Contributor Author

I didn't realize the html translated the LaTeX literally. No one has complained before, so I'm in favor of option 3.

@sooheelee
Copy link
Contributor

sooheelee commented Mar 7, 2019

I've been thinking about this and why we have this status quo. I'm not the best person for it but I think it worth my checking out https://www.mathjax.org/ and similar options before we settle on option 3. The solution may be straight-forward and I think given our methods are about the math, we should try to solve this.

@sooheelee
Copy link
Contributor

I will be back after some reading and testing.

@sooheelee sooheelee assigned sooheelee and unassigned ldgauthier Mar 7, 2019
@sooheelee
Copy link
Contributor

@ldgauthier, here is a solution that will render the LaTeX for the forum.

<img src="http://latex.codecogs.com/svg.latex?$$ refRatio = \frac{min(X[0][0], X[0][1])}{max(X[0][0], X[0][1])} $$" border="0"/>

I've tested that this renders on the forum to

Please let me know if you want me to make these changes on your branch for the SOR and AS_SOR documentation.

@sooheelee sooheelee assigned ldgauthier and unassigned sooheelee Mar 7, 2019
- Render LaTex equations with plugin (pre-tested)
- Render Markdown table correctly and add spacing between columns
- In 'Statistical notes' section, introduce the contingency table and place appropriately with text
- Provide both R and 1/R equations instead of just one (both are mentioned in the text)
- Close out the equation elements appropriately with `]` and `)`s
- Mention the final annotation is given in log space and provide a link to an example calculation
- Update links, e.g. to related annotations, to point to GATK4 current docs instead of v3.8
@sooheelee
Copy link
Contributor

@ldgauthier, I've performed a pass. I think the Caveat section of the AS_StrandOddsRatio doc could use some attention. Here it is currently (I did not touch it):

screenshot 2019-03-07 15 37 39

Here is what the rendered javadocs look like now. Let me know what you think.
screenshot 2019-03-07 15 37 21

screenshot 2019-03-07 15 37 32

@sooheelee sooheelee assigned sooheelee and ldgauthier and unassigned ldgauthier and sooheelee Mar 7, 2019
@sooheelee
Copy link
Contributor

sooheelee commented Mar 8, 2019

I was thinking @ldgauthier that it would be great if the example numbers given in the caveats section were expanded to be an example in how SOR/FisherStrand are each calculated. Then I can unlink the thread URL I had added so we are not sending folks all over the forum.

@ldgauthier
Copy link
Contributor Author

I just stumbled across some data where the ref is very biased, but the alt is not too bad:
AC=78;AF=2.92135e-02;AN=2670;DP=31492;FS=48.628;MQ=58.02;MQRankSum=-2.02400e+00;MQ_DP=3209;QD=3.03;ReadPosRankSum=-1.66500e-01;SB_TABLE=1450,345,160,212;SOR=0.592;VarDP=2167

@sooheelee
Copy link
Contributor

Looks like a great example to use @ldgauthier. Do you prefer to write out the example calculation or have Comms do so? I'm a bit focused on gCNV currently but assume you are even more occupied.

@ldgauthier
Copy link
Contributor Author

ldgauthier commented Mar 14, 2019 via email

@sooheelee
Copy link
Contributor

It's only me so I will write this out for your review. I've put in a word with @rcmajovski for additional future GATK documentation support. He will follow up with you.

@sooheelee
Copy link
Contributor

sooheelee commented Mar 14, 2019

I just looked through the code and see that the current explanation is incomplete and also inaccurate. I will amend to reflect (i) the addition of 1 to counts and (ii) that SOR results from ln(ratio) + ln(refRatio) - ln(altRatio).

- Clarify SOR is calculated with `ln(ratio) + ln(refRatio) - ln(altRatio)`
- Mention one is added to each count
- Show step-by-step calculations with example counts
@sooheelee
Copy link
Contributor

sooheelee commented Mar 14, 2019

@ldgauthier, I've fleshed out an example calculation using the counts you provided.

I added the new section "Example calculation" to SOR documentation and go through the calcuations step-by-step:

Screenshot 2019-03-14 17 05 41


Simply updated AS_SOR to point to example calculation in SOR:

Screenshot 2019-03-14 17 05 26

Copy link
Contributor

@sooheelee sooheelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems odd to approve my own changes to the branch. Let me know when I can merge.

@sooheelee
Copy link
Contributor

Given these changes pertain solely to the Javadoc portion, and the amount of time and back-and-forth we've gone through, I assume this is finished @ldgauthier and will merge this.

@sooheelee sooheelee merged commit 9c40aba into master Mar 14, 2019
@sooheelee sooheelee deleted the ldg_SORdocs branch March 14, 2019 22:28
*
* $$ refRatio = \frac{max(X[0][0], X[0][1])}{min(X[0][0], X[0][1} $$
* <p>The sum R + 1/R is used to detect a difference in strand bias for REF and for ALT. The sum makes it symmetric.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say that the sum makes it symmetric. That name persists from early stages of development when the model was different. It's actually important to note that it's NOT symmetric because we don't want to penalize sites where the alt allele looks well balanced, but the ref allele isn't, presumably by chance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't write these parts you are commenting on @ldgauthier and assume whoever did knew what they were talking about. I am hesitant to change/remove parts without discussing with the originator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bertrand wrote this part of the comments (https://github.com/broadinstitute/gsa-unstable/commit/fb3d3e84e4951d3ea9e3a93d0ffb1559b9004b4e), but that was before the calculation was updated with the altRatio (which was my change: https://github.com/broadinstitute/gsa-unstable/commit/893997ddb5c0456714ba782934c3478010da1c09 -- apparently the "ensures that the annotations is large only" was my fault, but it was five years ago.)

* $$ altRatio = \frac{max(X[1][0], X[1][1])}{min(X[1][0], X[1][1]} $$
* <img src="http://latex.codecogs.com/svg.latex?$$ altRatio = \frac{min(X[1][0], X[1][1])}{max(X[1][0], X[1][1])} $$" border="0"/>
*
* <p>ensures that the annotation value is large only. The final SOR annotation is given in natural log space.</p>
Copy link
Contributor Author

@ldgauthier ldgauthier Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where that old text comes from, but a better way to put it might be that the min and max in the altRatio normalizes the values so that it doesn't matter whether a site has majority + or - alt reads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the old text comes from a developer who is speaking math lingo that means something to other math people. Thought you approved of these earlier?

* </pre>
*
* <p>Read support shows some strand bias for the reference allele but not
* the alternate allele. The SB_TABLE annotation (a non-GATK annotation) indicates 1450 reference alleles on the forward strand, 345
Copy link
Contributor Author

@ldgauthier ldgauthier Mar 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it useful to include my example verbatim with the "non-GATK annotation"? You could give the table separately as the sum of the per-sample SB annotation tables.
(Ideally the Hail tool that produced the SB_TABLE will become public in the next month or so.)

1450 reference alleles -> 1450 reference reads
ditto alternate alleles -> reads

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the example you gave and so this is the example I used to illustrate the calculation. I apologize I cannot spend any more time on this even if I want to. I have ~ a week left to finalize the gCNV tutorial. Please ask @rcmajovski, who owns GATK documentation, for additional Comms support.

Copy link
Contributor Author

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked in person about a few edits I wanted to make, but that must have slipped through the cracks since it wasn't noted here. These can be addressed in the next docs update.

@samuelklee
Copy link
Contributor

I'd be happy to look over the LaTeX style for the next round of edits. Probably good to take advantage of things like subscripts to indicate matrix elements, etc. that LaTeX enables. In any case, thanks for adding this @sooheelee!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants