Entropy panel updates #478

jameshadfield · 2017-11-28T23:54:50Z

This PR changes how the data shown in the entropy panel is calculated. The data is now calculated from the mutations on the tree and therefore updates as the visibility of nodes changes (e.g. after filtering, changing the date). This PR would close #470.

entropy vs change counts (updated)

The chart actually shows the number of observed mutations at a given position, simply because this is more straightforward to calculate. It's entirely possible to calculate entropy, but before I go down this road, perhaps the number of mutations is "better"? Calculating entropy would require knowledge of the bases at every visible node, not just a count of the mutations (I've sketched a way to do this in a single traversal).
~~There is a flag in globals - trueEntropyCalc - which toggles between the number of observed mutations vs entropy (it only works for nucleotides, will make it work for codons ASAP)~~
There is now a toggle in the panel to switch between entropy and number of mutations observed in the tree.

we can get rid of `entropy.json`

The only data that the JSON is used for is to create the gene annotations ({start, end, name}) and genome size (bp). This data could easily be included in the metadata JSON, thus getting rid of the entropy json.

Performance

The code to calculate the new data is surprisingly cheap (3-10ms), however the D3 rendering takes ~50ms. The calculation is therefore debounced (at 500ms) so it will not run when frequent changes to the visibility are happening. The structure of the data and the algorithm can be improved further to speed things up (I don't want to invest time in this if the end result is not what we're after...)

P.S. This PR is built upon PRs #477 (which is build upon #476) as it was helpful to have the genotype URLs working.

rneher · 2017-12-16T06:28:01Z

sorry for the slow response here. I really like this direction and suggest we push it a little further still by getting rid of the sequence.json as well. But first a few specific points.

Doesn't this line

https://github.com/nextstrain/auspice/blob/entropy/src/util/entropy.js#L84

overwrite the ancestral state if there are recurrent mutations at the same site? Since anc_state lives in the scope outside of the recursion and is used after the recursion, this can give wrong results. A few other ideas:

the current implementation basically duplicates the code for nucs and aa. any reason why you didn't collapse them?
We could get rid of the sequence.json along with the entropy.json. Based on what you got, we can easily extract the state of every node and replace this

https://github.com/nextstrain/auspice/blob/entropy/src/util/getGenotype.js#L27

by a function that parses mutations on the tree. (of course it would be much more efficient to get all states at once).

implementation would probably be easiest if we had a temporary field current_gt in every node. Basically pick gene/position, assign state to all nodes, determine color scale and color if genotype coloring is the goal, otherwise calc entropy. In the latter case, one would simply do this for every gene and position.

rneher · 2017-12-16T07:14:28Z

this is roughly what I am thinking (pseudo-code, really):

export const assign_sequence(nodes, anc_state, gene, pos){
  const root_node = nodes[0];
  root_node.tmp_gt = anc_state[gene][pos];
  for (const child of root_node.children) {
      assign_sequence_recursive(child, gene, pos);
    }
}

const assign_sequence_recursive(node, gene, pos){
  if (node.muts[gene][pos]){ //assuming muts = {'nuc':{123:['A', 'G'], 345:['C', 'T']}, 'HA1':{}}
    node.tmp_gt = node.muts[gene][pos][1];
  }else{
    node.tmp_gt = node.up.tmp_gt;
  }
  for (const child of node.children) {
    assign_sequence_recursive(child, gene, pos);
  }
}


const vis_state_count(nodes, anc_state, gene, pos){
  assign_sequence(nodes, anc_state, gene, pos);
  const counts={}
  for (var i=0; i<nodes.length; i++){
    if (visibility[i]&nodes[i].is_terminal()){
      if (!counts[nodes[i].tmp_gt]){
        counts[nodes[i].tmp_gt]=1;
      }else{
        counts[nodes[i].tmp_gt]++;        
      }
    }
  }
  return counts;
}

of course a traversal or all positions is probably slower than looping over variable position within one traversal. but one could be smarter about this by first calculating for each node the position that are variable within the clade and the number of visible nodes.

jameshadfield · 2017-12-18T02:14:15Z

Thanks @rneher

re: anc_state - https://github.com/nextstrain/auspice/blob/entropy/src/util/entropy.js#L84 - this is only hit the first time a mutation is encountered at this position (even though it's inside recurse) so should never overwrite. I'll double check that now.
I'll combine the code blocks (nt & aa) - they were duplicated just to make things easier when I wasn't sure this was the right direction to go now. DONE
getting rid of sequence.json would be brilliant! I'll inplement something along those lines to see. (During this i found that tree traversals are ridiculously cheap - orders of magnitude cheaper than the D3 code we use)

rneher · 2017-12-18T06:42:23Z

Thanks, my bad re anc_state. But I am still fuzzy what happens to the state[pos] and the Object.assign({}, state) as it is passed down to children. Is state[pos] then modified in place overwriting previously set state? Or does Object.assign perform deep copy?

jameshadfield · 2017-12-18T10:14:38Z

No worries. Object.assign is 1-layer-deep-copy - each time we traverse (recurse) from "parent X" down "child X1" we want to pass down a copy of the state so that it may be modified (if there's a mutation, e.g.), and then later in the recursion, when we traverse down "child X2" we need to have remembered the state at X to give to it. So that's why there's all this copying of state. (I'm sure there is a cleverer algorithm!)

Cloning of objects in JS is not good... Object.assign will deep copy things like {key -> INT, key-> STRING} but not {key-> OBJECT} (the object will be a reference). Unbelievably, the best way to deep-copy complex structs is to convert to JSON and then parse that.

In other news, i've gotten rid of the sequences.json - all that it's used for is the gene lengths, which i'll add to meta.json (they're needed to get rid of the entropy.json as well).

rneher · 2017-12-18T17:02:40Z

cool. the sequence json was giving us headaches when looking at the vcf/Tb cases.

jameshadfield · 2017-12-19T04:47:51Z

The good news: We no longer use sequences or entropy JSONs :)

The bad news: This branch now needs "updated" metadata JSONs via nextstrain/augur@bf67f14

rneher · 2017-12-19T07:33:10Z

fabulous how this makes things simpler! I think we should test this on some of the different builds for a few days (could you put jsons on dev?), but otherwise it can go in.

jameshadfield · 2017-12-21T03:13:14Z

Ok, i've come up with a way to test this - it's a bit complicated as this branch can't load the old augur JSONs, and you can't switch to the staging server JSONs without first successfully loading something!

Go to: https://auspice-dev.herokuapp.com/measles ("new" JSONs are on the live server)
Change to the staging server (i.e. "new" JSONs) via the toggle at the bottom of the sidebar
Switch datasets via the dropdown - currently the staging server has "new" JSONs for ebola, zika, measles & mumps

rneher · 2017-12-21T05:02:06Z

thanks, didn't realize this required that much tweaking....
one question: I am having trouble reconciling the entropy values with diversity. have a look at

https://auspice-dev.herokuapp.com/measles?c=gt-L_610

and zoom into the bottom clade (should be a 2/11 and 9/11 mix of P and Q). The displayed entropy value is 0.584, but it should be -plog(p)-qlog(q) = 0.47413931305783746

also fuzzy about this line:
https://github.com/nextstrain/auspice/blob/entropy/src/util/entropy.js#L156

shouldn't this be simply the number of visible tips?

trvrb · 2018-01-03T01:05:57Z

@jameshadfield: I really like this direction. I'm mildly surprised that performance didn't suffer more, but great that it's working well. I wasn't able to break this in testing. Two things:

I think I prefer keeping the original "Diversity" non-clickable title to the panel and then moving "ENTROPY COUNTS" to the right, next to "AA NT".
I guess the best JSON migration strategy is to:

run every build with updated augur
pushing JSONs just to staging
confirm that all new JSONs are working
simultaneously push auspice update and swap staging JSON live

This will be somewhat annoying, but not terrible. However, I wonder if we could make things easier by having a transitional redundant JSON format which has new annotations in meta.json, but keeps entropy.json around for the moment, thus making a JSON format that can be read by current auspice master and also by auspice entropy branch. This should work right?

jameshadfield · 2018-01-05T19:55:28Z

@trvrb re: updating JSONs, if we run new augur builds they will produce both the updated meta JSON (usable by auspice master & this PR) as well as the entropy & sequences JSONs (used only by master). So we can update all the files on S3 without worry and then merge this into master after testing...

* url-middleware: remove console logs posts use URL pushState use pushState and replaceState fix choose-dataset selector & pageChange API store dataset name in datasets.datapath remove npm history package (window.url) post URLs working (and they use redux) clean up URL handling handle URLs ourselves (remove react router) (incomplete) restore modifyURLquery (now deprecated) remove context where no longer needed date URL changes in middleware panel layour & distance measure URL changes in middleware geo resolution URL changes in middleware layout URL changes in middleware filters now change URL in middleware move url state of colorBy to middleware redux middleware prototype working

jameshadfield · 2018-01-10T19:01:00Z

This branch now up at https://auspice-dev.herokuapp.com including the url-middleware PR. All new JSONs thanks to @trvrb

Bugs to fix before merge:

forward / back inconsistencies #485 FIXED 8334e41
~~ensure entropy values correct (see above message)~~ FIXED - see below
~~move "entropy counts" toggle to RHS of panel~~ FIXED e32a1dd

jameshadfield · 2018-01-18T00:31:50Z

Hey @rneher, i've now fixed the bug in the entropy calculation (04ddea1) and tested it on trial data to ensure it's correct.

However, there are differences to the augur entropies! Why? Augur calculates entropy on the counts excluding gaps and Ns. However, the mutations encoded in the tree do not allow mutations to N, so as far as auspice is concerned, the base looks like its parent node (this is why we never get Ns when colouring by genotype). So the base/AA counts for auspice vs augur differ for columns where there are Ns or gaps, and therefore the entropy values differ. Some examples:

5 taxa, with 2 As, 1G and 2Ns inferred as As: augur = 0.63, auspice = 0.50
100 taxa, with 70As, 25Gs, 5Ns inferred as As : 0.58 vs 0.56

cc @huddlej @sidneymbell this may be of interest to you.

rneher · 2018-01-18T07:28:10Z

good point! great work!

jameshadfield added 30 commits October 24, 2017 11:30

narrative prototype

e6e23ba

paragraphs cause url change (no state change yet)

50c5dad

temporary: narrative toggle in sidebar

af849b1

break apart controls reducer into functions

e25d41b

paragraphs change state via URL query (buggy)

a8c7237

bug fixes

01feb11

update colorBy URL query within action

b337da2

add enableNarratives in globals to allow merging into master

899e376

markdown -> (local) server -> client -> components

65a081f

serve narratives from themis

db6c742

css

32f41d5

use right sidebar for narrative

01e3ab2

fix sidebar background overlay

c75f839

add [NEXTSTRAIN_NO_ACTION] to markdown parser

a76292e

reduce default hover time to 500ms

0cbae6a

fix right sidebar bugs

e6b92c4

add narratives to get_data script

9df7ff1

move narrative data to redux

2f3252b

app only renders right sidebar when narrative loads

75e9fdf

fix bug in getNarrative 404 handling

c33d253

reinstate enableNarratives global flag

c96adc2

filters in URL state

9f569e6

imporve reassignment of defaults when query URL changes

1cf6d5f

avoid unnecessary updateColors() calls

853c342

closing date box in <Info> clears URL query

042fa72

genotype URL colorBys parsed by reducer

b8e97a6

loading genotype via URL updates entropy panel

7ffa25c

sidebar colorBy updates genotype via URL / entropy panel

76fcaab

add panel layout (full / grid) to URL query

28c2e49

debounced update entropy action template working

dc4fd7b

jameshadfield added 2 commits December 14, 2017 15:04

posts use URL pushState

5a5f01c

remove console logs

f35f2d8

combine code for aa & nt mut / entropy calcs

d6b356b

set genotype at each node without using sequences.json

71e122b

jameshadfield added 3 commits December 19, 2017 09:51

remove getGenotype calls

a0781cd

remove sequences & entropy JSONs by adding annotations to meta JSON

8ad3274

tweak gene length parsing. remove console logs.

12859ac

trvrb mentioned this pull request Jan 4, 2018

separate state from URLs #480

Closed

jameshadfield added 2 commits January 10, 2018 10:51

disable narratives for merge

781101c

fix entropy calc denominator bug

04ddea1

jameshadfield added 2 commits January 17, 2018 16:43

move entropy/counts toggle to RHS

e32a1dd

keep redux page state in sync when going back via browser. Closes #485

8334e41

jameshadfield merged commit b33e2c1 into master Jan 18, 2018

jameshadfield deleted the entropy branch January 18, 2018 01:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Entropy panel updates #478

Entropy panel updates #478

jameshadfield commented Nov 28, 2017 •

edited

Loading

rneher commented Dec 16, 2017

rneher commented Dec 16, 2017

jameshadfield commented Dec 18, 2017 •

edited

Loading

rneher commented Dec 18, 2017

jameshadfield commented Dec 18, 2017 •

edited

Loading

rneher commented Dec 18, 2017

jameshadfield commented Dec 19, 2017

rneher commented Dec 19, 2017

jameshadfield commented Dec 21, 2017 •

edited

Loading

rneher commented Dec 21, 2017

trvrb commented Jan 3, 2018

jameshadfield commented Jan 5, 2018

jameshadfield commented Jan 10, 2018 •

edited

Loading

jameshadfield commented Jan 18, 2018 •

edited

Loading

rneher commented Jan 18, 2018

Entropy panel updates #478

Entropy panel updates #478

Conversation

jameshadfield commented Nov 28, 2017 • edited Loading

entropy vs change counts (updated)

we can get rid of entropy.json

Performance

rneher commented Dec 16, 2017

rneher commented Dec 16, 2017

jameshadfield commented Dec 18, 2017 • edited Loading

rneher commented Dec 18, 2017

jameshadfield commented Dec 18, 2017 • edited Loading

rneher commented Dec 18, 2017

jameshadfield commented Dec 19, 2017

rneher commented Dec 19, 2017

jameshadfield commented Dec 21, 2017 • edited Loading

rneher commented Dec 21, 2017

trvrb commented Jan 3, 2018

jameshadfield commented Jan 5, 2018

jameshadfield commented Jan 10, 2018 • edited Loading

jameshadfield commented Jan 18, 2018 • edited Loading

rneher commented Jan 18, 2018

jameshadfield commented Nov 28, 2017 •

edited

Loading

we can get rid of `entropy.json`

jameshadfield commented Dec 18, 2017 •

edited

Loading

jameshadfield commented Dec 18, 2017 •

edited

Loading

jameshadfield commented Dec 21, 2017 •

edited

Loading

jameshadfield commented Jan 10, 2018 •

edited

Loading

jameshadfield commented Jan 18, 2018 •

edited

Loading