Cubeviz slider performance improvements #1550

Merged
merged 8 commits into from
Aug 9, 2022

duytnguyendtn
Collaborator

@duytnguyendtn duytnguyendtn commented Aug 8, 2022

Description

This PR attempts to improve the performance of the cubeviz slider by using the helper rather than passing messages around.

Checklist for package maintainer(s)

This checklist is meant to remind the package maintainer(s) who will review this pull request of some common things to look for. This list is not exhaustive.

  • Are two approvals required? Branch protection rule does not check for the second approval. If a second approval is not necessary, please apply the trivial label.
  • Do the proposed changes actually accomplish desired goals? Also manually run the affected example notebooks, if necessary.
  • Do the proposed changes follow the STScI Style Guides?
  • Are tests added/updated as required? If so, do they follow the STScI Style Guides?
  • Are docs added/updated as required? If so, do they follow the STScI Style Guides?
  • Did the CI pass? If not, are the failures related?
  • Is a change log needed? If yes, is it added to CHANGES.rst?
  • Is a milestone set?
  • After merge, any internal documentations need updating (e.g., JIRA, Innerspace)?

@duytnguyendtn duytnguyendtn added this to the 2.9 milestone Aug 8, 2022
@duytnguyendtn duytnguyendtn added no-changelog-entry-needed changelog bot directive and removed cubeviz labels Aug 8, 2022
@pllim
Contributor

pllim commented Aug 8, 2022

Did you benchmark the improvement?

@codecov

codecov bot commented Aug 8, 2022

Codecov Report

Merging #1550 (1a66a10) into main (c523cf6) will decrease coverage by 0.04%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##             main    #1550      +/-   ##
==========================================
- Coverage   85.48%   85.44%   -0.05%     
==========================================
  Files          93       94       +1     
  Lines        8749     9054     +305     
==========================================
+ Hits         7479     7736     +257     
- Misses       1270     1318      +48     
Impacted Files Coverage Δ
jdaviz/core/events.py 93.44% <ø> (+1.33%) ⬆️
jdaviz/configs/cubeviz/plugins/tools.py 89.23% <50.00%> (+1.35%) ⬆️
jdaviz/configs/cubeviz/helper.py 96.07% <100.00%> (+1.63%) ⬆️
jdaviz/configs/specviz2d/plugins/parsers.py 35.48% <0.00%> (-51.62%) ⬇️
jdaviz/app.py 91.79% <0.00%> (-0.60%) ⬇️
...igs/specviz/plugins/line_analysis/line_analysis.py 97.68% <0.00%> (-0.34%) ⬇️
jdaviz/core/template_mixin.py 91.25% <0.00%> (-0.10%) ⬇️
jdaviz/configs/specviz2d/plugins/__init__.py 100.00% <0.00%> (ø)
...plugins/spectral_extraction/spectral_extraction.py 88.23% <0.00%> (ø)
...nfigs/default/plugins/plot_options/plot_options.py 99.21% <0.00%> (+0.03%) ⬆️
... and 4 more

@duytnguyendtn
Collaborator Author

Just about to mark this PR as ready for review; here's all I've learned from profiling Cubeviz:

There is a divide in what I was able to profile. I couldn't find a way to profile the front-end part, from the actual mouse click to the underlying Jdaviz code. The best description of that section, I think, came from Kyle: "This is where, for example, the browser has to wait a moment to determine if the user has double-clicked." The profiling I was able to do covers the span from receiving the instruction to change wavelength to actually changing the wavelength.

After testing the two techniques for selecting the wavelength, the time it took to select 200 random wavelengths dropped from 53.577 seconds to 51.470 seconds, an improvement of about 4%. We're really scraping the bottom of the barrel here without touching the actual strategy itself (which would be more than 3 points of effort 😅).
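
For reference, the roughly 4% figure follows directly from the two timings quoted above. The sketch below only reproduces that arithmetic; the function name `percent_improvement` is a hypothetical helper, not part of Jdaviz.

```python
def percent_improvement(before_s, after_s):
    """Relative reduction in wall time, as a percentage of the original time."""
    return (before_s - after_s) / before_s * 100.0


# Timings for selecting 200 random wavelengths, taken from the comment above.
improvement = percent_improvement(53.577, 51.470)
print(f"{improvement:.1f}% faster")
```

This gives about 3.9%, which rounds to the 4% quoted.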

Attached below are the top calls from the profiler for 100 runs:

         4212627 function calls (4180466 primitive calls) in 28.430 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9808    9.218    0.001    9.228    0.001 {built-in method numpy.array}
      100    2.946    0.029   19.719    0.197 array.py:431(compute_statistic)
     6868    2.903    0.000    2.903    0.000 {method 'reduce' of 'numpy.ufunc' objects}
      100    2.121    0.021   10.511    0.105 array.py:399(nansum_with_nan_for_empty)
     2730    1.448    0.001    1.448    0.001 _methods.py:106(_clip_dep_invoke_with_casting)
      100    1.443    0.014    4.927    0.049 nanfunctions.py:68(_replace_nan)
26404/15366    1.439    0.000   11.116    0.001 {built-in method numpy.core._multiarray_umath.implement_array_function}
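
A listing in this format can be produced with the standard-library `cProfile` and `pstats` modules, sorted by `tottime` (which `pstats` labels "internal time"). The sketch below is a minimal example under that assumption; `select_wavelength_stub` is a hypothetical stand-in for the call being profiled, not the actual Cubeviz helper.

```python
import cProfile
import io
import pstats


def select_wavelength_stub(n):
    # Hypothetical stand-in for the real call under test.
    return sum(i * i for i in range(n))


def profile_runs(func, runs, *args):
    """Profile `runs` calls of `func` and return the top entries by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(runs):
        func(*args)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "tottime" is time spent inside the function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(7)
    return stream.getvalue()


report = profile_runs(select_wavelength_stub, 100, 1000)
print(report)
```

Sorting by `tottime` rather than `cumtime` surfaces the functions doing the work themselves, which is why `numpy.array` appears first in the listing above.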

What's actually taking so long is the process of converting the wavelength input to an actual indexed-slice. To do this, select_wavelength needs to grab the whole spectral_axis and find the closest wavelength to the selected value. But to do this, it has to grab the full data and therefore create a Spectrum1D EACH TIME this is called. The true killer here is that it isn't just grabbing the data from disk; it must actually COMPUTE the data, since the spectrum is autogenerated/autocollapsed. If it matters, the specific thing that's taking so long is glue needing to compute the statistic (the auto-collapsed spectrum) each time, but I think the issue is more our current strategy to begin with. I don't immediately know how to solve this, but I've already exhausted the 3 points on this ticket, so I'd recommend starting here for more improvements next time. Possible suggestions:

  1. Cache the Spectrum1D, or maybe even just the spectral axis, and return it each time it's requested. This might be the easiest option.
  2. Find some way to calculate the spectral_axis without needing to generate the flux. Since the actual computation of the statistic isn't really necessary here, it could theoretically be skipped (we only need the x-axis, not the y-axis).
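
The first suggestion could look roughly like the sketch below: compute the axis once, reuse it on later requests, and discard it when the underlying data changes. Everything here is hypothetical (the `SpectralAxisCache` class, the `expensive_axis` stand-in, and the point at which `invalidate` would be called); it is not how Jdaviz stores its data.

```python
class SpectralAxisCache:
    """Hypothetical helper: compute a value once and reuse it until invalidated."""

    def __init__(self, compute):
        self._compute = compute  # callable that does the expensive work
        self._value = None
        self._valid = False

    def get(self):
        # Only recompute when there is no valid cached value.
        if not self._valid:
            self._value = self._compute()
            self._valid = True
        return self._value

    def invalidate(self):
        # Would need to be called whenever the reference data or units change.
        self._valid = False
        self._value = None


calls = []


def expensive_axis():
    # Stand-in for building the collapsed spectrum and reading its spectral axis.
    calls.append(1)
    return [1.0 + 0.1 * i for i in range(5)]


cache = SpectralAxisCache(expensive_axis)
first = cache.get()
second = cache.get()   # served from the cache, no recomputation
cache.invalidate()
third = cache.get()    # recomputed after invalidation
print(len(calls))
```

The hard part in practice is invalidation: the cache is only safe if every event that changes the spectral axis clears it.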

@duytnguyendtn duytnguyendtn marked this pull request as ready for review August 9, 2022 14:11
@kecnry
Member

kecnry commented Aug 9, 2022

@duytnguyendtn - that is very useful investigative work! I agree that caching might be quite useful, but I'm curious whether accessing the marks object for the displayed spectrum would be sufficiently faster. Something like the following in select_wavelength (which assumes that the first Lines or LinesGL entry, not a subclass of either, corresponds to the reference data):

x_all = [m for m in sv.figure.marks if m.__class__.__name__ in ['Lines', 'LinesGL']][0].x

instead of the existing

x_all = self.app.get_viewer('spectrum-viewer').data()[0].spectral_axis.value
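
To illustrate why the suggestion compares class names instead of using `isinstance`, here is a small sketch with stand-in classes. `Lines` and `LinesGL` below are local dummies that only mimic the names used in the one-liner, and `SliceIndicator` is a hypothetical subclass representing a mark that should be skipped; none of them are the real plotting classes.

```python
class Lines:
    """Dummy stand-in for a plotted line mark holding its x values."""

    def __init__(self, x):
        self.x = x


class LinesGL(Lines):
    pass


class SliceIndicator(Lines):
    """Hypothetical subclass that is drawn in the viewer but is not spectrum data."""


def first_line_mark(marks):
    # Match on the exact class name so subclasses such as SliceIndicator are
    # skipped; isinstance(mark, Lines) would accept them.
    for mark in marks:
        if type(mark).__name__ in ("Lines", "LinesGL"):
            return mark
    raise LookupError("no Lines or LinesGL mark found")


marks = [SliceIndicator([0.0]), Lines([1.0, 2.0, 3.0]), LinesGL([9.0])]
x_all = first_line_mark(marks).x
print(x_all)
```

Returning from the loop at the first match also avoids building the full filtered list just to take element `[0]`.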

@duytnguyendtn
Collaborator Author

@kecnry You're a genius!!!

         3682352 function calls (3654486 primitive calls) in 7.079 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2580    1.425    0.001    1.425    0.001 _methods.py:106(_clip_dep_invoke_with_casting)
      516    0.617    0.001    3.701    0.007 composite_array.py:78(__call__)
      258    0.263    0.001    0.265    0.001 component.py:82(__getitem__)
     5399    0.252    0.000    0.252    0.000 {built-in method builtins.dir}
      516    0.251    0.000    0.251    0.000 {method 'take' of 'numpy.ndarray' objects}
22390/12242    0.230    0.000    2.190    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
   591379    0.228    0.000    0.425    0.000 {built-in method builtins.getattr}

@pllim
Contributor

pllim commented Aug 9, 2022

Can we replace this

x_all = self.app.get_viewer('spectrum-viewer').data()[0].spectral_axis.value
index = np.argmin(abs(x_all - wavelength))
return self.select_slice(int(index))

with this?

wavelength = wavelength * wave_unit  # We store the unit somewhere, right?
index = self.app.data_collection[0].coords.spectral_wcs.world_to_pixel(wavelength)  # Assume [0] is always FLUX
return self.select_slice(int(index))

The WCS is already there. The only thing I am not sure about is whether spectral_wcs is always available as a property. It is not specified in APE 14. If not, we have to find a different way to query that WCS, but the point is you don't need to collapse any spectrum to begin with.
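
As a rough illustration of the point that no spectrum needs to be collapsed, the sketch below compares a nearest-value search over the full axis with direct arithmetic on a linearly sampled axis. It assumes a linear axis described by hypothetical `start` and `step` values; a real spectral WCS may be non-linear, in which case its own world-to-pixel transform would be needed instead.

```python
def nearest_index(x_all, wavelength):
    """Index of the axis value closest to `wavelength` (needs the whole axis)."""
    return min(range(len(x_all)), key=lambda i: abs(x_all[i] - wavelength))


def linear_index(wavelength, start, step, n):
    """Same answer for a linear axis, computed without materializing the axis."""
    index = round((wavelength - start) / step)
    # Clamp so out-of-range wavelengths map to the first or last slice.
    return max(0, min(n - 1, index))


# Hypothetical linear axis: 10 samples starting at 4.0 with spacing 0.5.
start, step, n = 4.0, 0.5, 10
x_all = [start + step * i for i in range(n)]

print(nearest_index(x_all, 5.6), linear_index(5.6, start, step, n))
```

One detail to watch if the WCS route is taken: a world-to-pixel transform returns a fractional pixel, and truncating it with `int()` instead of rounding would select a different slice than the nearest-value search whenever the fractional part exceeds 0.5.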

Member

@kecnry kecnry left a comment

Noticeably snappier on the default cube! Thanks for tracking this down!

@pllim
Contributor

pllim commented Aug 9, 2022

Ah, I just saw Kyle had the same idea but different solution. I guess that works too if we can trust the marks.

@@ -41,8 +41,7 @@ def on_mouse_event(self, data):
# throttle to 200ms
return

msg = SliceSelectWavelengthMessage(wavelength=data['domain']['x'], sender=self)
Contributor

Do we even need SliceSelectWavelengthMessage with this removal? Should we remove this message from the code base altogether?

Contributor

@kecnry , any reason we still need to listen to SliceSelectWavelengthMessage in app at all?

Collaborator Author

Quickly looking through the code, it looks like the slider was the only thing that used it. I'll remove it.

Member

Agreed, if it's not used anywhere else, let's get rid of it here.

@kecnry
Member

kecnry commented Aug 9, 2022

I think since we don't allow unloading the reference data from the UI (technically the user could manually remove it from the API), then the marks should be reliable. But trying both and seeing which performs better never hurts! For the marks approach, it might eventually be worth caching the marks entry so we don't have to do that ugly loop to find it though...

@@ -85,7 +85,10 @@ def select_wavelength(self, wavelength):
 wavelength = float(wavelength.wavelength)
 if not isinstance(wavelength, (int, float)):
 raise TypeError("wavelength must be a float or int")
-x_all = self.app.get_viewer('spectrum-viewer').data()[0].spectral_axis.value
+# Retrieve the x slices from the spectrum viewer's marks
+x_all = [m for m in self.app.get_viewer('spectrum-viewer').figure.marks
Contributor

I am not familiar with marks. Is it always in the unit we think it is in, especially with all the unit conversion going on via plugin, etc?

Member

The marks are in the plotted units, the click/drag event is guaranteed to be in plotted units, and the select_wavelength method says that it assumes the user input for wavelength is provided in plotted units, so I think we're safe (for now).

Collaborator Author

I think the marks will always be in the same units of the spectrum viewer, since they are the literal marks that are plotted on screen. When using the slice tool, it will naturally request the wavelength in units of the viewer it's acting on, so I think that should be consistent? @kecnry is that a fair statement?

Member

Yes, I think that's fair. The only possible problem I can see would be when the user calls the API manually, but the API docs state that no unit conversion is done, so I think that can be kicked down the road to the unit conversion refactor (if it needs addressing at all).

@pllim
Contributor

pllim commented Aug 9, 2022

This is basically a performance bug fix, so I think we need a change log.

@pllim pllim added bug Something isn't working cubeviz performance Performance related and removed no-changelog-entry-needed changelog bot directive labels Aug 9, 2022
Contributor

@pllim pllim left a comment

LGTM. Thanks!

@pllim pllim merged commit aa86ecb into spacetelescope:main Aug 9, 2022
@duytnguyendtn duytnguyendtn deleted the slideperf branch August 9, 2022 20:05
@pllim pllim mentioned this pull request Aug 16, 2022