Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable 1D linking and Simultaneous Row Plotting in Mosviz #1790

Merged

Conversation

duytnguyendtn
Copy link
Collaborator

@duytnguyendtn duytnguyendtn commented Oct 28, 2022

Description

You may be asking yourself, why in {insert deity name of choice}'s name would you want that!? I think a picture is worth 1000 words (even if it's a picture of words :P)

image

You're reading that right; 200 targets, which used to take 1 hour and 11 minutes (!!) loaded in under two minutes.

This PR is part of a 3 part strategy in changing the linking strategy in Mosviz, developed with @camipacifici. In Mosviz, data navigation is mainly intended to be done with the Table Viewer. I recall us having discussions about removing the data dropdowns from Mosviz because of this. Though it may have been stopped due to a desire to plot multiple 1D spectrum in the same viewer.

After some profiling, I discovered supporting multi-plotting 1D spectrum was a VAST majority of the load time issues with Mosviz parser (in the GLASS dataset's case, 97.5% of the load time!). This is because the linking tree grows much faster than N for N number of targets, since we need to back link each target to each other one. This PR disables the intra 1D spectra linking to vastly improve loading times. It then hides other rows' data and disables the ability to load them simultaneously:

image

image

What about the multi-plotting spectrum? That's the last part of this 3-part strategy: (EDIT: Modified to include #1790 (comment))

  1. Disable 1D linking
  2. Investigate whether batch linking produces a similar benefit to removing linking entirely (Disable 1D linking and Simultaneous Row Plotting in Mosviz #1790 (comment))
  3. Re-enable the data dropdown with filtering to only the current row's data
  4. Benchmark the remaining two minutes to see whether lazy data loading could help
  5. If the above comes back true, investigate lazy data loading to further improve data loading/parsing time
  6. Investigate Dynamic linking of data, and re-enable intra-row/simultaneous-row plotting

I propose we investigate the possibility of dynamic linking of data. That being we only link data together IF and WHEN the user actually requests to show the data at the same time. Effectively, when the user clicks on the data dropdown and tries to add a second spectrum to the spectrum viewer, link AT THAT TIME, not in advance. That way, we only end up with links that we actually need. Overall, this will result in a link tree that is "sub-optimal" as compared to one created with "batch loading". But what we sacrifice in optimal linking, we gain in efficiency and parsing speed. I have no idea if this is possible, as it's just an idea I came up with much past my bedtime. But here we are!

I got the green light from @camipacifici to proceed with 1 and 2 now, and investigate 3 later. So I invite eyes and comments!

Change log entry

  • Is a change log needed? If yes, is it added to CHANGES.rst? If you want to avoid merge conflicts,
    list the proposed change log here for review and add to CHANGES.rst before merge. If no, maintainer
    should add a no-changelog-entry-needed label.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a milestone set? Set this to bugfix milestone if this is a bug fix and needs to be released ASAP; otherwise, set this to the next major release milestone.
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)?

@codecov
Copy link

codecov bot commented Oct 28, 2022

Codecov Report

Base: 88.10% // Head: 88.13% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (cfa9e14) compared to base (9e4acb8).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head cfa9e14 differs from pull request most recent head f85fdde. Consider uploading reports for the commit f85fdde to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1790      +/-   ##
==========================================
+ Coverage   88.10%   88.13%   +0.02%     
==========================================
  Files          95       95              
  Lines       10241    10254      +13     
==========================================
+ Hits         9023     9037      +14     
+ Misses       1218     1217       -1     
Impacted Files Coverage Δ
jdaviz/configs/mosviz/plugins/parsers.py 91.11% <ø> (-0.13%) ⬇️
jdaviz/app.py 94.25% <100.00%> (+0.03%) ⬆️
jdaviz/configs/cubeviz/helper.py 98.41% <0.00%> (+0.05%) ⬆️
jdaviz/configs/mosviz/helper.py 86.95% <0.00%> (+0.05%) ⬆️
jdaviz/configs/specviz/helper.py 58.13% <0.00%> (+0.18%) ⬆️
jdaviz/configs/specviz/plugins/parsers.py 90.52% <0.00%> (+1.50%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@kecnry
Copy link
Member

kecnry commented Oct 28, 2022

I don't think disabling the data menu entirely is the way to go (won't be able to toggle the visibility of subsets or models, etc)... what if we instead skipped entries in the data menu from other rows? Even then, that would disable the use-case of comparing spectra from different targets - do we want to do that?

@pllim pllim added this to the 3.2 milestone Oct 28, 2022
@duytnguyendtn
Copy link
Collaborator Author

duytnguyendtn commented Oct 28, 2022

Even then, that would disable the use-case of comparing spectra from different targets - do we want to do that?

Sorry for not being clear. This is what I meant by "multi-plotting spectra." This is exactly what Cami and I are proposing right now, since without 1D linking, this isn't possible. Cami and I agreed the "flipping through the table viewer" use case is probably more likely than the desire to compare spectra between targets, so we should remove the ability to do that from the UI, at least temporarily until we can figure out a way to allow it without crippling parse times.

Good point on preserving the data dropdown for models and other "non data artifacts." I agree with your strategy of trying to remove entries from other rows; I'll investigate

@duytnguyendtn
Copy link
Collaborator Author

Thanks team for the feedback this morning! Here's a summary of next steps:

  1. Investigate whether batch linking produces a similar benefit to removing linking entirely
  2. Re-enable the data dropdown with filtering to only the current row's data
  3. Benchmark the remaining two minutes to see whether lazy data loading could help
  4. If the above comes back true, investigate lazy data loading to further improve data loading/parsing time

@duytnguyendtn
Copy link
Collaborator Author

Ok so after a thorough archaeological expedition, I wanted to loop back on the first point:

  1. Investigate whether batch linking produces a similar benefit to removing linking entirely

I swore I remembered at the time that this was already implemented, but I wanted to be sure before I embarrassed myself. So in #762 (comment), @ojustino does a good job describing the current strategy. Let me try to describe what I understand of the linking logic at this point:

The linking logic is already self contained within an entirely separate parser:

@data_parser_registry("mosviz-link-data")
def link_data_in_table(app, data_obj=None):

Linking is already deferred in each of the other parsers for the image, 1D, and 2D data. Only after those parsers have completed their work, does this link parser step in.

In this linking parser, we already employ with app.data_collection.delay_link_manager_update(): to stop the automatic linking. We instead define the links manually by linking each 1D spectra to the first spectrum with a series of LinkSame's. NOTICE: We aren't defining all the interlinks from each spectra to each other, only to the first spectrum. Also note that we aren't officially registering the link JUST YET. The logic appends all the LinkSame's into a list containing all the pending links to be registered, and then ONLY AT THE END, applies all of them in a single add_link() call. It's only when we add the links at the end to glue that the rest of the links are discovered and applied.

So this mosviz-link-parser I think is already our attempt to batch link. Searching through the glue docs, I can't find any other interpretation of "batch linking." Considering Tom's approval in #762 (comment), I think this produces the most efficient linking tree possible. I tried searching for some glue linking mechanism to provide an array of IDs and tell Glue "These are all the same thing," rather than creating separate LinkSame's in a loop, but I couldn't find any. I think we're at the point to acknowledge we just have a lot of data to link and now need to reconsider which links we want to intentionally make, rather than blind linking everything

Please let me know if I'm misinterpreting "batch linking" and someone has a secret that I've been out of the loop on!

@pllim
Copy link
Contributor

pllim commented Oct 28, 2022

Looks like batch linking to me...

# Optimize linking speed through a) delaying link manager updates with a
# context manager, b) handling intra-row linkage of 1D and 2D spectra in a
# loop, and c) handling inter-row linkage after that in one fell swoop.
with app.data_collection.delay_link_manager_update():

@duytnguyendtn duytnguyendtn force-pushed the mo_linking_mo_problems branch from 8690ac2 to 668a2bb Compare November 2, 2022 19:19
@duytnguyendtn duytnguyendtn changed the title Disable data dropdowns and 1D linking in Mosviz Disable 1D linking and Simultaneous Row Plotting in Mosviz Nov 3, 2022
@javerbukh
Copy link
Contributor

javerbukh commented Nov 4, 2022

I'm testing the NIRISS example notebook and getting the following error when clicking a different row, anyone else? Traceback ends with:

File ~/Documents/jdaviz_dev/jdaviz/jdaviz/app.py:1031, in Application.add_data_to_viewer(self, viewer_reference, data_label, visible, clear_other_data)
   1013 """
   1014 Plots a data set from the data collection in the specific viewer.
   1015 
   (...)
   1027     defined data set.
   1028 """
   1029 if 'mosviz_row' in self.state.settings:
   1030     if not ((self.get_viewer("table-viewer").row_selection_in_progress) and
-> 1031             (self.data_collection[data_label].meta['mosviz_row'] !=
   1032                 self.state.settings['mosviz_row'])):
   1033         raise NotImplementedError("Intra-row plotting not supported. "
   1034                                   "Please use the table viewer to change rows")
   1036 viewer_item = self._get_viewer_item(viewer_reference)

KeyError: 'mosviz_row'

@duytnguyendtn
Copy link
Collaborator Author

Fixed row selection in 4012700! You'll still see this traceback in NIRISS datasets when manually calling add_data_to_viewer with a different row's spectra due to #1805. But I think this can be handled as a separate bug

@javerbukh
Copy link
Contributor

Works well! I did want to ask if one interaction behaves as expected: if I use gaussian smooth on a 1d spectrum, new data is added called "... smooth stddev x". If I make this data invisible or remove it from the viewer, then click a different row, then return to the first row, that "... smooth stddev x" data is now visible and in the viewer. Is this the behavior we would expect? If not, is it a big enough issue to handle in this PR or instead should we create a new issue for follow up? If it's a simple enough fix I think we should handle it in this PR but I'm interested to hear thoughts from the team and @camipacifici.

@duytnguyendtn
Copy link
Collaborator Author

@javerbukh Great eyes! That's yet another bug that Cami and I have been talking about, and plan to address. I was able to reproduce this error in main so it doesn't appear connected to the changes here

@duytnguyendtn
Copy link
Collaborator Author

duytnguyendtn commented Nov 4, 2022

That particular bug is #1816. I don't immediately know where this behavior is coming from, so it will require at least a little investigative work. I'd advocate for keeping it a separate effort

Copy link
Contributor

@javerbukh javerbukh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duytnguyendtn Ah ok, in that case I think this is good to get in. Nice work!

CHANGES.rst Outdated Show resolved Hide resolved
jdaviz/app.py Outdated Show resolved Hide resolved
jdaviz/configs/mosviz/tests/test_data_loading.py Outdated Show resolved Hide resolved
Copy link
Contributor

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Since @kecnry commented before, do you want an approval from him as well?

@duytnguyendtn
Copy link
Collaborator Author

@kecnry gave me a silent approval over slack, just couldn't do a full review. I'm happy to move forward here! Thanks Kyle, Jesse, Pey-Lian and Cami for your efforts/thoughts through this PR!

@duytnguyendtn duytnguyendtn merged commit 60fc1bd into spacetelescope:main Nov 4, 2022
@duytnguyendtn duytnguyendtn deleted the mo_linking_mo_problems branch November 4, 2022 19:13
@kecnry
Copy link
Member

kecnry commented Nov 7, 2022

If I make this data invisible or remove it from the viewer, then click a different row, then return to the first row, that "... smooth stddev x" data is now visible and in the viewer.

I don't immediately know where this behavior is coming from, so it will require at least a little investigative work.

The row changing removes/adds data to the viewer. When data is removed from the viewer, all associated layers are destroyed/forgotten, so if we want any of the layer information to be restored when returning to the row, we will need to store that somewhere. This same thing happens if you manually remove and add the same data entry into a viewer (all layer options are lost) which was the motivation for changing the check box in the data menu to control visibility instead of loaded-ness... but that solution obviously won't work for mosviz, so if the expected behavior is that visibility and/or layer state persists, we'll need to write that logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants