Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Zooming Functionality #609

Merged
merged 22 commits into from
Aug 9, 2018
Merged

New Zooming Functionality #609

merged 22 commits into from
Aug 9, 2018

Conversation

emmahodcroft
Copy link
Member

This PR adds another bar to the entropy graph which shows genes zooming at the same time as the entropy bar zoom at the top, as well as adding some extra zoom functionality. This is particularly needed for large genomes like TB. This extra middle bar also plots based on strand - with + and - above and below each other if both are present, or plotting in the middle if all the genes are on the same strand.

Gene outline colour for large genomes (TB) is changed to black so the white outlines don't just disappear when they're essentially lines when zoomed out (see below). And gene labels are only plotted when zoomed in enough that the gene is a reasonable size (this does mean 2K disappears by default in Zika, see below). Numbering for large genomes defaults to exponential.

TB:
image

Zika:
image

Zoom can be controlled:

  • In the usual way by dragging the handles or click-and-drag
  • Clicking outside of the grey area once to zoom out (bug fixed where handles stopped working)
  • Holding shift or alt/option and scrolling (this existed before but was restricted to a larger size max zoom)
  • Selecting a gene when the 'color by genotype' option is selected (genes now searchable by typing)
  • Selecting 'nucleotide' and typing in a position/positions
  • Clicking on an AA or nucleotide in the entropy/counts bar graph

Clicking on a nuc in TB:
image

Clicking on an AA in TB:
image

Smart zoom to two specified nuc positions:
image

Pick a gene from dropdown (or search), even though no variable AA sites:
image

Loading all ~4000 TB genes into the meta.json so they plot (though only ~200 were translated) gives an example of how it would look if less sparse, but this does seem to slow things down a bit:
image

The zoom tries to be smart in a few ways - sizing genes to appear approximately the same size, not re-zooming if the next nucleotide selected is already visible in the zoom window, and not allowing too-far zooming in.

As genes/nuc & positions are in the URL to 'colorBy,' this zoom is preserved as well when loading from that URL.

Entropy bar plot width is changed to 2 instead of 1 because it was near-impossible to click on them in TB.

Things to do:

  • Code could almost certainly do with reviewing; I'm new to JS
  • Spacing of the 3 charts could probably be improved - it's a bit fiddly
  • At high zoom in TB, the grey zoom box disappears completely. This seems to be linked to the zoom level - would be good to set a minimum width so it always draws a line, but this shouldn't limit zoom.
  • Changing the gene selected sets the position to 1 if a position isn't already specified or the position currently specified doesn't exist in that gene - this is because without a position, it currently doesn't change the zoom, but possibly not the default we want? (This is because this is what triggers a colorBy event, which is what all this is based on currently.)
  • Allow exact zoom specifications in the URL to set zoom? (By specifying a couple of nuc sites or a gene you can do this somewhat, but they may not be sites with variation, and you'd have to figure out what sites worked best.)
  • Should genes have annotations attached from say, GFF file, for click or mouseover? (If going to add to meta.json schema, would be good to do this now)
  • Could zoom in the top-box be better controlled aside from scroll-zoom?

Also currently, if you select a position without variation (ex: '1' often defaulted to currently, as above), the legend is blank in the tree plot. This is existing behaviour, but it might be nice to show the reference base/AA here - but this would mean including this information in the .jsons!

I'd welcome people trying it out (particularly on something other than TB and Zika) and giving feedback, letting me know about bugs, and giving your thoughts on anything that perhaps should be different.

This was fun to do! :)

@trvrb
Copy link
Member

trvrb commented Aug 4, 2018

@emmahodcroft ---

I think this is a fantastic direction. Thanks for diving in. I have a couple small ideas with regards to the UI design. I want to add a trapezoid to show zoom (suggestion originally from @amcrisan). And I want to try to make it more obvious that the black triangles are a UI affordance and can be interacted with. I'll try to implement these myself. A couple suggestions for you for behavior:

  1. Right now, clicking on a AA or nucleotide site zooms to this region (which I like). However, the zoom is not centered. Eg http://localhost:4000/tb?c=gt-Rv2752c_533 should appear in the very center of the display.

  2. Some of these bounds are funny. For example, clicking on L in mumps does not zoom in at all: http://localhost:4000/mumps/global?c=gt-L_1264. Same with clicking on many nucleotide changes, which don't zoom at all.

  3. Rather than zooming to a region that surrounds the gene of interest, I might suggest to zoom to a 2000 bp window surrounding the mutation of interest. This seems safest to me. Right now, for TB, the behavior at an intergenic nucleotide site is very different from an amino acid change. Compare http://localhost:4000/tb?c=gt-nuc_1360496 to http://localhost:4000/tb?c=gt-Rv2752c_533. This difference is surprising to me and I'd seek to reduce it.

@jameshadfield
Copy link
Member

Great work @emmahodcroft -- might be worth trying to close #299 in this PR as well if possible! Not sure if TB has any overlapping reading frames (I doubt it).

@rneher
Copy link
Member

rneher commented Aug 5, 2018

we could certainly shift the boxes up and down a little as in yOffset = strand*(c+start%3)

@emmahodcroft
Copy link
Member Author

@trvrb Thanks for feedback. I think a lot of these decisions about zoom behaviour may be influenced by how we expect people to use them most. I'm definitely happy to play around with different things - I'll give a little background to my thoughts for some of the behaviour

Right now, clicking on a AA or nucleotide site zooms to this region (which I like). However, the zoom is not centered. Eg http://localhost:4000/tb?c=gt-Rv2752c_533 should appear in the very center of the display.

So this is because all AA-based zoom is essentially gene-based zoom. Triggered by having a gene vs 'nuc' in the URL/ colorBy event. I thought that if there are multiple AA sites in a gene, the user can click between them without re-zooming/re-drawing. If the user zooms to one AA and then clicks on another in the same gene, do we want it to re-centre every time?

Some of these bounds are funny. For example, clicking on L in mumps does not zoom in at all: http://localhost:4000/mumps/global?c=gt-L_1264. Same with clicking on many nucleotide changes, which don't zoom at all.

L in mumps doesn't zoom because it tries to zoom so that either side of the gene is 1.5 times the gene's length. For L in mumps, this is the whole genome...! This could be reduced so that the genes are 'bigger' in the zoomed chart, and so larger genes will still get zoomed. It depends how much we want to show 'around' the gene - I was thinking showing the gene in context of the other genes was better, but this does come at the cost of not being able to zoom in on genes that are huge portions of the genome. (But even changing this to 1 times the genes length allows something to happen for L, which is probably better than nothing.)

I've tried to make nucleotide zoom 'smart' so that it doesn't re-zoom and re-centre with every click. If it thinks the user can see the nucleotide in the current zoom just fine (if the user is zoomed in to a region and decides to click on a nearby nuc) it won't change. However, I made a guess at the parameters (see here and line below) to use here, and I think they're too big - clicking on a nuc in the middle of the genome from being fully zoomed out sometimes doesn't zoom. Adjusting these even a bit (0.4 instead of 0.5) seems to help, but it may take a little more testing to find the right balance.

Rather than zooming to a region that surrounds the gene of interest, I might suggest to zoom to a 2000 bp window surrounding the mutation of interest. This seems safest to me. Right now, for TB, the behavior at an intergenic nucleotide site is very different from an amino acid change. Compare http://localhost:4000/tb?c=gt-nuc_1360496 to http://localhost:4000/tb?c=gt-Rv2752c_533. This difference is surprising to me and I'd seek to reduce it.

So one of the challenges I ran into was finding something that works well for huge genomes and small ones. The sizes of genes and how they compare to the size of the genome varies a lot. For AA, I thought the most useful context was the gene they are in, and zooming based on this size - this way you avoid being really zoomed into a big gene or still being very zoomed out from a small gene. (For example, in TB, esxU is only 318 bp and pks12 is 12,456 bp - 2,000 bp around a site in each is going to look very different. And at the moment for TB this is hard to adjust manually as the zoomed area is usually so small!)

For nucleotides, it seemed good to do some zoom, but I wasn't sure at what level, and I'd be happy to change this to be closer in (currently it pads with 10% genome size if possible).
You can zoom to a gene, then switch to nuc, and select nucs within view for a nice zoom level, but it's a bit round-about, I admit.

However, as I said, thinking about how people use or are likely to use clicking on or looking at variable AA and nuc sites will be most helpful in thinking what's best to display, and I'm not so familiar with what people are likely to want or do, so I'd be interested to hear how people think it'll be used. I think slightly surprising behaviour can be ok if it's more useful, but it's definitely good to find a balance.

@jameshadfield Yes I'm happy to take a look at this.

@rneher
Copy link
Member

rneher commented Aug 9, 2018

@jameshadfield anything preventing this from going live?

@jameshadfield
Copy link
Member

Not from my end -- there's plenty of tweaks & additions, but i'm happy if they are made in separate branches. I'll push up today if nobody objects.

@trvrb
Copy link
Member

trvrb commented Aug 9, 2018

I think this is a great basis and I'm happy to make adjustments on new branches. Thanks for all the work here @emmahodcroft.

Removes xMainOriginal, xGene, xGeneOriginal from this.scales, as well as this.xModified.
@jameshadfield jameshadfield merged commit f745402 into master Aug 9, 2018
@jameshadfield jameshadfield deleted the gene_zoom branch August 9, 2018 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants