New Zooming Functionality #609

emmahodcroft · 2018-08-02T17:12:57Z

This PR adds another bar to the entropy graph which shows genes zooming at the same time as the entropy bar zoom at the top, as well as adding some extra zoom functionality. This is particularly needed for large genomes like TB. This extra middle bar also plots based on strand - with + and - above and below each other if both are present, or plotting in the middle if all the genes are on the same strand.

Gene outline colour for large genomes (TB) is changed to black so the white outlines don't just disappear when they're essentially lines when zoomed out (see below). And gene labels are only plotted when zoomed in enough that the gene is a reasonable size (this does mean 2K disappears by default in Zika, see below). Numbering for large genomes defaults to exponential.

TB:

Zika:

Zoom can be controlled:

In the usual way by dragging the handles or click-and-drag
Clicking outside of the grey area once to zoom out (bug fixed where handles stopped working)
Holding shift or alt/option and scrolling (this existed before but was restricted to a larger size max zoom)
Selecting a gene when the 'color by genotype' option is selected (genes now searchable by typing)
Selecting 'nucleotide' and typing in a position/positions
Clicking on an AA or nucleotide in the entropy/counts bar graph

Clicking on a nuc in TB:

Clicking on an AA in TB:

Smart zoom to two specified nuc positions:

Pick a gene from dropdown (or search), even though no variable AA sites:

Loading all ~4000 TB genes into the meta.json so they plot (though only ~200 were translated) gives an example of how it would look if less sparse, but this does seem to slow things down a bit:

The zoom tries to be smart in a few ways - sizing genes to appear approximately the same size, not re-zooming if the next nucleotide selected is already visible in the zoom window, and not allowing too-far zooming in.

As genes/nuc & positions are in the URL to 'colorBy,' this zoom is preserved as well when loading from that URL.

Entropy bar plot width is changed to 2 instead of 1 because it was near-impossible to click on them in TB.

Things to do:

Code could almost certainly do with reviewing; I'm new to JS
Spacing of the 3 charts could probably be improved - it's a bit fiddly
At high zoom in TB, the grey zoom box disappears completely. This seems to be linked to the zoom level - would be good to set a minimum width so it always draws a line, but this shouldn't limit zoom.
Changing the gene selected sets the position to 1 if a position isn't already specified or the position currently specified doesn't exist in that gene - this is because without a position, it currently doesn't change the zoom, but possibly not the default we want? (This is because this is what triggers a colorBy event, which is what all this is based on currently.)
Allow exact zoom specifications in the URL to set zoom? (By specifying a couple of nuc sites or a gene you can do this somewhat, but they may not be sites with variation, and you'd have to figure out what sites worked best.)
Should genes have annotations attached from say, GFF file, for click or mouseover? (If going to add to meta.json schema, would be good to do this now)
Could zoom in the top-box be better controlled aside from scroll-zoom?

Also currently, if you select a position without variation (ex: '1' often defaulted to currently, as above), the legend is blank in the tree plot. This is existing behaviour, but it might be nice to show the reference base/AA here - but this would mean including this information in the .jsons!

I'd welcome people trying it out (particularly on something other than TB and Zika) and giving feedback, letting me know about bugs, and giving your thoughts on anything that perhaps should be different.

This was fun to do! :)

trvrb · 2018-08-04T23:20:56Z

@emmahodcroft ---

I think this is a fantastic direction. Thanks for diving in. I have a couple small ideas with regards to the UI design. I want to add a trapezoid to show zoom (suggestion originally from @amcrisan). And I want to try to make it more obvious that the black triangles are a UI affordance and can be interacted with. I'll try to implement these myself. A couple suggestions for you for behavior:

Right now, clicking on a AA or nucleotide site zooms to this region (which I like). However, the zoom is not centered. Eg http://localhost:4000/tb?c=gt-Rv2752c_533 should appear in the very center of the display.
Some of these bounds are funny. For example, clicking on L in mumps does not zoom in at all: http://localhost:4000/mumps/global?c=gt-L_1264. Same with clicking on many nucleotide changes, which don't zoom at all.
Rather than zooming to a region that surrounds the gene of interest, I might suggest to zoom to a 2000 bp window surrounding the mutation of interest. This seems safest to me. Right now, for TB, the behavior at an intergenic nucleotide site is very different from an amino acid change. Compare http://localhost:4000/tb?c=gt-nuc_1360496 to http://localhost:4000/tb?c=gt-Rv2752c_533. This difference is surprising to me and I'd seek to reduce it.

jameshadfield · 2018-08-05T00:33:27Z

Great work @emmahodcroft -- might be worth trying to close #299 in this PR as well if possible! Not sure if TB has any overlapping reading frames (I doubt it).

rneher · 2018-08-05T08:22:40Z

we could certainly shift the boxes up and down a little as in yOffset = strand*(c+start%3)

emmahodcroft · 2018-08-06T14:54:11Z

@trvrb Thanks for feedback. I think a lot of these decisions about zoom behaviour may be influenced by how we expect people to use them most. I'm definitely happy to play around with different things - I'll give a little background to my thoughts for some of the behaviour

Right now, clicking on a AA or nucleotide site zooms to this region (which I like). However, the zoom is not centered. Eg http://localhost:4000/tb?c=gt-Rv2752c_533 should appear in the very center of the display.

So this is because all AA-based zoom is essentially gene-based zoom. Triggered by having a gene vs 'nuc' in the URL/ colorBy event. I thought that if there are multiple AA sites in a gene, the user can click between them without re-zooming/re-drawing. If the user zooms to one AA and then clicks on another in the same gene, do we want it to re-centre every time?

Some of these bounds are funny. For example, clicking on L in mumps does not zoom in at all: http://localhost:4000/mumps/global?c=gt-L_1264. Same with clicking on many nucleotide changes, which don't zoom at all.

L in mumps doesn't zoom because it tries to zoom so that either side of the gene is 1.5 times the gene's length. For L in mumps, this is the whole genome...! This could be reduced so that the genes are 'bigger' in the zoomed chart, and so larger genes will still get zoomed. It depends how much we want to show 'around' the gene - I was thinking showing the gene in context of the other genes was better, but this does come at the cost of not being able to zoom in on genes that are huge portions of the genome. (But even changing this to 1 times the genes length allows something to happen for L, which is probably better than nothing.)

I've tried to make nucleotide zoom 'smart' so that it doesn't re-zoom and re-centre with every click. If it thinks the user can see the nucleotide in the current zoom just fine (if the user is zoomed in to a region and decides to click on a nearby nuc) it won't change. However, I made a guess at the parameters (see here and line below) to use here, and I think they're too big - clicking on a nuc in the middle of the genome from being fully zoomed out sometimes doesn't zoom. Adjusting these even a bit (0.4 instead of 0.5) seems to help, but it may take a little more testing to find the right balance.

Rather than zooming to a region that surrounds the gene of interest, I might suggest to zoom to a 2000 bp window surrounding the mutation of interest. This seems safest to me. Right now, for TB, the behavior at an intergenic nucleotide site is very different from an amino acid change. Compare http://localhost:4000/tb?c=gt-nuc_1360496 to http://localhost:4000/tb?c=gt-Rv2752c_533. This difference is surprising to me and I'd seek to reduce it.

So one of the challenges I ran into was finding something that works well for huge genomes and small ones. The sizes of genes and how they compare to the size of the genome varies a lot. For AA, I thought the most useful context was the gene they are in, and zooming based on this size - this way you avoid being really zoomed into a big gene or still being very zoomed out from a small gene. (For example, in TB, esxU is only 318 bp and pks12 is 12,456 bp - 2,000 bp around a site in each is going to look very different. And at the moment for TB this is hard to adjust manually as the zoomed area is usually so small!)

For nucleotides, it seemed good to do some zoom, but I wasn't sure at what level, and I'd be happy to change this to be closer in (currently it pads with 10% genome size if possible).
You can zoom to a gene, then switch to nuc, and select nucs within view for a nice zoom level, but it's a bit round-about, I admit.

However, as I said, thinking about how people use or are likely to use clicking on or looking at variable AA and nuc sites will be most helpful in thinking what's best to display, and I'm not so familiar with what people are likely to want or do, so I'd be interested to hear how people think it'll be used. I think slightly surprising behaviour can be ok if it's more useful, but it's definitely good to find a balance.

@jameshadfield Yes I'm happy to take a look at this.

rneher · 2018-08-09T08:55:41Z

@jameshadfield anything preventing this from going live?

jameshadfield · 2018-08-09T19:16:03Z

Not from my end -- there's plenty of tweaks & additions, but i'm happy if they are made in separate branches. I'll push up today if nobody objects.

trvrb · 2018-08-09T19:19:40Z

I think this is a great basis and I'm happy to make adjustments on new branches. Thanks for all the work here @emmahodcroft.

Removes xMainOriginal, xGene, xGeneOriginal from this.scales, as well as this.xModified.

emmahodcroft added 19 commits July 24, 2018 11:48

experimental gene zoom

a6de311

improve axes numbering

99dbb99

actually get strand from json

d24f7a9

start strand, smooth partial gene plotting

a8f4b87

strand plot, adjust spacing

e14db71

only print labels large genes

dc3d582

create zoom function, make genes searchable

a3901d9

zoom by click working and bug fix

b174c26

zoom with gene url

89ab151

cleaning up zoom by gene url

41ae3cc

Merge remote-tracking branch 'origin/master' into gene_zoom

bd75b6f

remove debug

af3e136

coloring more consistant

0f0b420

better positioning if single reading frame

bd14602

scroll wheel zoom works - hold shift

51bb543

cleanup

f840652

nucleotide zoom improvements

75bf2c5

prevent zoomin too far, make nuc bars bigger to click

750e268

comments & cleanup

8e6efe1

change zoom params slightly

1878f22

jameshadfield added 2 commits August 9, 2018 14:31

add d3-format to dependencies

501aead

simplify scales used in entropy

550fbe2

Removes xMainOriginal, xGene, xGeneOriginal from this.scales, as well as this.xModified.

jameshadfield merged commit f745402 into master Aug 9, 2018

jameshadfield deleted the gene_zoom branch August 9, 2018 22:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New Zooming Functionality #609

New Zooming Functionality #609

emmahodcroft commented Aug 2, 2018

trvrb commented Aug 4, 2018

jameshadfield commented Aug 5, 2018

rneher commented Aug 5, 2018

emmahodcroft commented Aug 6, 2018

rneher commented Aug 9, 2018

jameshadfield commented Aug 9, 2018

trvrb commented Aug 9, 2018 •

edited

Loading

New Zooming Functionality #609

New Zooming Functionality #609

Conversation

emmahodcroft commented Aug 2, 2018

trvrb commented Aug 4, 2018

jameshadfield commented Aug 5, 2018

rneher commented Aug 5, 2018

emmahodcroft commented Aug 6, 2018

rneher commented Aug 9, 2018

jameshadfield commented Aug 9, 2018

trvrb commented Aug 9, 2018 • edited Loading

trvrb commented Aug 9, 2018 •

edited

Loading