Plot is very slow to display large array via overlay_array #44

Closed · nispio opened this issue Mar 28, 2019 · 19 comments
Labels: enhancement, performance


nispio commented Mar 28, 2019

When I use overlay_array to plot a large array (e.g., 20M points), it takes a significant amount of time (over a minute) for the data to be displayed in the widget. This is basically what my notebook looks like:

In[1]:

from jupyter_sigplot.sigplot import SigPlot
import numpy as np

x = np.arange(0, 200e3, 0.01)
sinx = np.sin(x)
cosx = np.cos(x)

In[2]:

plot = SigPlot(sinx, cosx, xi=True, cross=True, all=True)
plot.plot()

In[4]:

# After writing sinx and cosx to Type 1000 'SF' files ...
plot = SigPlot("sinx.tmp", "cosx.tmp", xi=True, cross=True, all=True)
plot.plot()

When I evaluate the second cell, a blank plot is displayed right away, but the data takes over a minute to show up, and there is no indication that any background work is taking place.

When I evaluate the fourth cell, the full plot is displayed almost instantly, including the data.


nispio commented Mar 29, 2019

I tested this at home, and it was significantly faster: a 200K-element array only took a few seconds to plot. The system that is running slow is actually an OpenStack VM, but it has 8 cores and 15 GB of RAM, both more than my home workstation. The OpenStack instance is running a hardened CentOS 7, and my home machine is running Ubuntu 18.04.

Any suggestions on what to test for bottlenecks, and how?


sterre commented Mar 29, 2019

I know that WebSockets in general are capable of transmitting binary data. If I get time, I'd like to look into whether the Jupyter Comm abstraction has somewhere we could hook into for binary transport instead of JSON.
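
For reference, the low-level ipykernel Comm API already accepts raw binary buffers alongside the JSON-serializable payload. A minimal sketch only (not wired into the widget machinery; the target name is made up):

import numpy as np
from ipykernel.comm import Comm

data = np.sin(np.arange(0, 200e3, 0.01)).astype(np.float32)

# `buffers` bypasses JSON entirely; the frontend receives them as raw buffers
comm = Comm(target_name="sigplot_binary")  # hypothetical target name
comm.send(data={"dtype": "float32", "length": int(data.size)},
          buffers=[data.tobytes()])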

maihde commented Mar 29, 2019

@nispio if you can, open up the Chrome Developer tools and make a HAR file capture of the slow plot.

(https://community.box.com/t5/Managing-Content-Troubleshooting/How-to-Generate-a-HAR-File-in-Chrome-IE-Firefox-and-Safari/ta-p/366)

Then send me the HAR file privately or attach it to this issue. This will help us understand where the slowdown is coming from.


mrecachinas commented Mar 30, 2019

Also, @nispio, what Python version, browser, and browser version are you using? (Even though we are fairly confident it's the JSON serialization, it would still be useful to have this environment information.)

maihde commented Mar 30, 2019

@nispio I'm able to reproduce this behavior as well. In my case the slowdown is definitely on the Python side, in the Jupyter Comm subsystem, which uses traitlets to synchronize the arrays from the backend engine to the frontend.

From what I can tell so far, it's either: (a) the serialization of the array to JSON or (b) the Traitlet comparison of objects.

My environment is:

  • Ubuntu Bionic (running in a VirtualBox VM)
  • Chrome 73.0.3683.86
  • Python 3

maihde commented Mar 30, 2019

Here is a link that shows a technique we might be able to use to transfer binary data instead of JSON: https://gist.github.com/maartenbreddels/40fa030fdb922e6d2074282ceed6b753.
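
For the record, a rough sketch of that technique adapted to a synced traitlet (the widget class and serializer names below are placeholders, not existing jupyter_sigplot code). ipywidgets ships bytes/memoryview values in the state as binary Comm buffers, so a custom to_json serializer can keep the samples out of the JSON path:

import numpy as np
from ipywidgets import DOMWidget
from traitlets import Instance


def array_to_binary(value, widget):
    """Serialize a NumPy array as a raw buffer plus a small JSON header."""
    if value is None:
        return None
    arr = np.ascontiguousarray(value, dtype=np.float32)
    # memoryview values are sent as binary buffers instead of being JSON-encoded
    return {"dtype": str(arr.dtype), "shape": arr.shape, "buffer": memoryview(arr)}


class BinaryArrayWidget(DOMWidget):  # placeholder name, kernel side only
    data = Instance(np.ndarray, allow_none=True).tag(sync=True, to_json=array_to_binary)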


mrecachinas commented Mar 31, 2019

Profiling

In [6]: %%prun -s cumulative -q -T prun1
        plot = SigPlot(sinx, cosx, xi=True, cross=True, all=True)
        plot.plot()
$ cat prun1

         900015180 function calls (840013881 primitive calls) in 1751.088 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002 1751.088 1751.088 <string>:3(<module>)
        1    0.008    0.008 1740.093 1740.093 sigplot.py:163(plot)
    20/18    0.004    0.000 1740.046   96.669 traitlets.py:576(__set__)
    22/20    0.001    0.000 1740.042   87.002 traitlets.py:558(set)
        2    0.008    0.004 1740.017  870.008 sigplot.py:89(_show_array_internal)
    15/13    0.002    0.000 1740.010  133.847 traitlets.py:1133(_notify_trait)
        7    0.027    0.004 1739.994  248.571 widget.py:590(notify_change)
        4    0.016    0.004 1739.938  434.984 widget.py:472(send_state)
        4    0.011    0.003 1447.499  361.875 widget.py:726(_send)
        4   18.637    4.659 1447.487  361.872 comm.py:118(send)
        6    0.032    0.005 1428.866  238.144 comm.py:61(_publish_msg)
60000118/13  312.539    0.000 1028.279   79.098 jsonutil.py:109(json_clean)
300003170  308.202    0.000  854.974    0.000 {isinstance}
120000225  330.436    0.000  546.772    0.000 abc.py:128(__instancecheck__)
        7    0.024    0.003  400.547   57.221 session.py:657(send)
        7    0.020    0.003  400.421   57.203 session.py:598(serialize)
       28    0.006    0.000  390.673   13.953 session.py:102(<lambda>)
       28    0.298    0.011  390.667   13.952 jsonapi.py:31(dumps)
       28   12.522    0.447  377.000   13.464 __init__.py:193(dumps)
       28    0.012    0.000  364.469   13.017 encoder.py:186(encode)
...

The full file is attached: prun1.txt

It's sorted by cumulative time, so naturally the outer functions took the longest, but tracing down the call tree, most of the time appears to have been spent in JSON serialization.

Options:

There are a few options @maihde and I have been discussing, and this ties in nicely with @desean1625's comments on Slack:

  1. Pass raw binary over Jupyter COMM following the link @maihde posted above.
  2. Follow a technique similar to Matplotlib. Matplotlib passes the base64 encoded PNG data of the plot to the client, and when the user selects an area to zoom into, the client (JS) sends a "resize" message to the server, which regenerates an image and sends it back to the client to display. See: matplotlib/mpl.js#L299-L303, matplotlib/mpl.js#L399-448, and backend_nbagg.py. This is also how Matplotlib is able to retain plots on close -- because the png data is kept within the notebook itself and no JS needs to be executed (tying in nicely with Save rendered SigPlot images with the notebook #26).
  3. Write out a file to a temp directory and serve the file to the client via overlay_href. The upside is we already know this is fast. The downside is SigPlot.js does not yet support plotting a detached header file, so we would need to write it out as a proper T1000 or T2000 file.

Right now option (3) is the most feasible and simplest, but (2) has been discussed before for high-network-latency (or low-throughput) situations, keeps coming up in similar discussions around SigPlot (e.g., @desean1625's Slack comments), and clearly reaches into SigPlot design decisions (e.g., XCMP) and into saving plots with Jupyter-SigPlot. Option (1) is a nice intermediate step that should be explored as well: we all agree serializing to JSON is not optimal performance-wise, and (1) will need to be implemented before we can implement (2).
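
For concreteness, a rough sketch of option (3). The write_bluefile callable here is an assumption (we'd plug in a real Type 1000 'SF' writer), and overlay_href is the path that was already fast in the original report:

import os
import tempfile

import numpy as np


def overlay_via_tempfile(plot, data, write_bluefile):
    """Write the array to a temp file and let the client fetch it instead of syncing it."""
    path = os.path.join(tempfile.mkdtemp(), "overlay.tmp")
    arr = np.ascontiguousarray(data, dtype=np.float32)
    write_bluefile(path, arr)  # hypothetical Type 1000 writer, not a real API
    plot.overlay_href(path)    # same fast path as the file-based cell above
    return path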

mrecachinas added the enhancement and performance labels Mar 31, 2019

sterre commented Apr 1, 2019

One difference between Matplotlib and SigPlot is that Matplotlib actually does all the rendering on the server (kernel), whereas SigPlot, being native Javascript, can (and currently does) render on the client.

Another is that the %matplotlib inline version of Matplotlib uses only a rich representation, not a widget. (%matplotlib notebook is different)

When we've been talking about saving rendered images with the notebook, my understanding is that we were mainly thinking the client would generate a PNG after rendering and send it back to the server to store as a rich representation, allowing saved notebooks to be viewed with embedded figures via things like nbviewer. This is the same shape as Matplotlib, but in reverse. Just wanted to double-check that we're talking about the same thing.

Just as I think Bokeh is also worth investigating in the context of saving, it may also be worth a look in the context of Comms. (I don't know/remember exactly what it does; in some cases, I'm certain that it sends a Javascript representation of the figure in question, which is definitely no better than JSON.)

Regarding the binary transport for SigPlot, I think it would have the additional virtue that SigPlot already internally traffics in ArrayBuffers (iirc), so there'd be one less translation client side, over and above the deserialization savings.

On yet another point, even if we use a binary transport, large files may take a while to arrive at the server. Some kind of progress indicator would be nice to address the OP's observation that nothing obvious occurs during loading--but note that the progress in question is in the Comm, not SigPlot proper. Imagine that SigPlot grows a way to indicate loading progress, possibly updated by XHR events in a purely Web context. Then I suggest that we have the notebook widget trigger that same mechanism, rather than also adding UI to display notebook Comm progress (presuming that we can hook into Comm progress in the first place).

And finally, on the detached header point: I bet it would not be too much work to add an "overlay href data only" capability, since we already have headermod / header-only push, and the expression in the constructor is "data" plus "overrides" (header). In the notebook context, we could keep the header in a traitlet, allowing a nice Python syntax for headermod like h.xstart = 123 (triggering an observer in the client, which would then call SigPlot[.layer].headermod on the client). Even without that syntax, keeping the header in memory and the data on disk could be virtuous in terms of disk access patterns.
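
A quick sketch of that last idea, assuming the widget keeps the detached header in a synced Dict traitlet (the attribute name `header` is made up, not current jupyter_sigplot API):

from ipywidgets import DOMWidget
from traitlets import Dict


class Plot(DOMWidget):              # illustrative subset only
    header = Dict().tag(sync=True)  # detached header stays in kernel memory


# kernel-side "headermod": reassigning the dict triggers exactly one sync;
# a client-side observer would then call sigplot's headermod on the layer
plot = Plot()
plot.header = dict(plot.header, xstart=123)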


nispio commented Apr 1, 2019

A quick correction/update:

I incorrectly stated before that I was plotting 200,000 points, but I later realized (to my embarrassment) that arange(0, 200e3, 0.01) creates 20 million points. However, when I was testing at home, I was coding up the test notebook from scratch, and I used r_[:200e3]*0.01 without realizing that I was not comparing apples to apples. Plotting 20 million points at home gave performance similar to that described for the problem case on OpenStack.

The two takeaways from this correction:

  1. The minute-plus array-plotting times were occurring for 20M points (not 200k as first stated)
  2. The performance difference between my two setups was not nearly as drastic as I initially thought.

maihde commented Apr 1, 2019 via email


mrecachinas commented Apr 1, 2019

@sterre

One difference between Matplotlib and SigPlot is that Matplotlib actually does all the rendering on the server (kernel), whereas SigPlot, being native Javascript, can (and currently does) render on the client.

Yep. There are instances where we could benefit from server-side rendering, or at least from cleverly sending only what can render on the screen (XCMP-style), e.g., when attempting to plot >100 MB of data.

Another is that the %matplotlib inline version of Matplotlib uses only a rich representation, not a widget. (%matplotlib notebook is different)

That's correct. Both are technically widgets; the only difference is that %matplotlib notebook leverages the WebSocket connection to send messages back to the kernel (e.g., resize and pan), so the kernel can re-render the plot and send the PNG back to the client.

When we've been talking about saving rendered images with the notebook, my understanding is that we were mainly thinking the client would generate a PNG after rendering, and send that back to the server to store as a rich representation, allowing saved notebooks to be viewed with embedded figures via things like nbviewer. This is the same shape as Matplotlib, but in reverse. Just wanted to double-check that we're talking about the same thing.

Yep. In general, sending pre-rendered images will be easier to persist. I was merely noting that (2) would enable us to do this.

Just as I think Bokeh is also worth investigating in the context of saving, it may also be worth a look in the context of Comms. (I don't know/remember exactly what it does; in some cases, I'm certain that it sends a Javascript representation of the figure in question, which is definitely no better than JSON.)

Bokeh appears to have the same issues we have. The following code took >10 minutes and eventually crashed my browser (Chrome v72) every time I ran it:

In[1]: from bokeh.plotting import figure, show, output_notebook
       import numpy as np
       output_notebook()

In[2]: x = np.arange(0, 200e3, 0.01)
       sinx = np.sin(x)
       cosx = np.cos(x)

In[3]: %%timeit
       p = figure(width=300, height=300)
       p.multi_line(xs=[x, x], ys=[sinx, cosx])
       show(p)

Regarding the binary transport for SigPlot, I think it would have the additional virtue that SigPlot already internally traffics in ArrayBuffers (iirc), so there'd be one less translation client side, over and above the deserialization savings.

Exactly.

On yet another point, even if we use a binary transport, large files may take a while to arrive at the server. Some kind of progress indicator would be nice to address the OP's observation that nothing obvious occurs during loading--but note that the progress in question is in the Comm, not SigPlot proper. Imagine that SigPlot grows a way to indicate loading progress, possibly updated by XHR events in a purely Web context. Then I suggest that we have the notebook widget trigger that same mechanism, rather than also adding UI to display notebook Comm progress (presuming that we can hook into Comm progress in the first place).

Yep, we could do something like this (untested):

from __future__ import division
import requests

url = ...

# `stream=True` lets us stream over the response
r = requests.get(url, stream=True)

# get the total file size
total_size = int(r.headers.get('content-length', 0))

# we'll want to iterate over the file by chunks
block_size = 1024

# how much we've written locally (kernel-side)
wrote = 0

# "stream" the remote asset to ``local_file``
with open(local_file, 'wb') as f:
    for data in r.iter_content(block_size):
        # keep track of how much we've written
        f.write(data)
        wrote += len(data)

        # update the ``progress`` traitlet (assuming this runs inside the
        # widget class, so ``self.progress`` is the synced traitlet), which
        # we will handle on the client side in some loading notification
        # (e.g., loading bar via tqdm?, spinny wheel, etc.)
        self.progress = wrote / total_size
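
...and on the widget side, the traitlet itself would be declared something like this (the name `progress` is an assumption, not an existing jupyter_sigplot attribute):

from ipywidgets import DOMWidget
from traitlets import Float


class SigPlot(DOMWidget):  # illustrative subset only
    # fraction downloaded, 0.0 .. 1.0; the client observes this and drives
    # whatever loading indicator we settle on
    progress = Float(0.0).tag(sync=True)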

And finally, on the detached header point: I bet it would not be too much work to add an "overlay href data only" capability, since we already have headermod / header-only push and the expression in the constructor is "data" plus "overrides" (header).

Agreed that it wouldn't be difficult.

In the notebook context, we could keep the header in a traitlet, allowing a nice Python syntax for headermod like h.xstart = 123 (triggering an observer in the client, which would then call SigPlot[.layer].headermod on the client). Even without that syntax, keeping the header in memory and the data on disk could be virtuous in terms of disk access patterns.

That seems reasonable.


mrecachinas commented Apr 1, 2019

@maihde See the profiling output I posted above. It doesn't look like the json.dumps piece of serializing was the offender

28    0.298    0.011  390.667   13.952 jsonapi.py:31(dumps)

but json_clean in ipykernel's jsonutil.py, which massages types into JSON-compatible ones (e.g., sets and tuples to lists), appears to have taken quite a bit of time (over multiple calls):

60000118/13  312.539    0.000 1028.279   79.098 jsonutil.py:109(json_clean)

I ran the following:

import json
import random
import time
from ipykernel import jsonutil

npoints = int(20e6)
data = []
for _ in range(npoints):
    data.append(random.random())

s = time.time()

# what I added
data = jsonutil.json_clean(data)

dat = json.dumps(data)
e = time.time()
print("serialization took", e - s)

which produced

serialization took 112.765253067

The combination of json_clean running 13 times (~1000 seconds) and json.dumps running 28 times (~390 seconds) accounts for ~1400 of the 1700 seconds (which is roughly comm.py:61(_publish_msg)). The remaining ~300 seconds can be explained by widget.py:117(_remove_buffers).

So why does Jupyter call all these methods multiple times? Still tracking that down...

@desean1625

@mrecachinas

The downside is SigPlot.js does not yet support plotting a detached header file, so we would need to write it out as a proper T1000 or T2000 file

This should plot a detached header file.

var container = document.getElementById('plot');
var plot = new sigplot.Plot(container, {});
fetch("./path_to_detached_data").then(function(response) {
  return response.arrayBuffer();
}).then(function(buffer) {
  var bf = sigplot.m.initialize();
  bf.setData(buffer);
  plot.overlay_bluefile(bf);
})


sterre commented Apr 1, 2019

Given that we control the horizontal and the vertical in this instance, it seems like we'd be able to get a good improvement just by implementing our own serialize/deserialize to bypass json_clean! I still like the ArrayBuffer approach better overall, of course.

FWIW, if we replace the data in the "big data" fiddle with Math.random() instead of a ramp, the full range of data takes more like 10 sec to render (compared with ~2 sec for the ramp).


mrecachinas commented Apr 1, 2019

@maihde @sterre This could also be driving the slowness...

...
            self.array_obj = array_obj # <-- one sync
            self.arrays.append(self.array_obj)
            self.oldArrays = self.arrays # <-- another sync
...

We're incurring two syncs (and therefore two JSON serializations) each time we plot (i.e., on every overlay_array).
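
One cheap mitigation, sketched under the assumption that this code lives inside a DOMWidget method: ipywidgets' hold_sync() batches pending state into a single Comm message, so the two updates below produce one send instead of two (note it does not remove the duplicate payload, since the same array still appears under both keys):

# sketch; hold_sync() is the stock ipywidgets batching context manager
with self.hold_sync():
    self.array_obj = array_obj
    self.oldArrays = self.arrays + [array_obj]  # reassign so the traitlet syncs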

@desean1625 Perfect. Thanks!


maihde commented Apr 1, 2019 via email


maihde commented Apr 2, 2019 via email


sterre commented Apr 2, 2019

I wouldn't be surprised if the json_clean function that @mrecachinas found, plus the multiple assignments, are the source of the additional time.

oldArrays is used for two purposes that I see:

  • To stash arrays that are overlaid before the client half of the widget has constructed / rendered
  • To avoid overlaying the same array multiple times

array_obj is used to trigger the client's overlay after the client half of the widget is rendered.

This is already a little confusing because the timing aspect of it is not documented.

I can think of a couple of ideas for reducing the number of syncs as well as clarifying the intent a bit. One approach might be to just remember, in the client, what we've already overlaid (e.g., by index into oldArrays). We could dispense with array_obj, removing one sync. Then whenever the list of overlaid arrays changes, the client, not the server, decides what's new. I suspect this will help clear up behavior when multiple arrays are overlaid in one call.

However, because we're syncing the entire history of all arrays that have ever been overlaid, we do the maximum possible work every time we add a new array--we have to re-serialize all previously-overlaid arrays! So we might be better off using our traitlet only to communicate new arrays, and devising a scheme where the client retains what it has received historically in local variables rather than traitlets. We may want a unique id (UID), like a one-up counter, for each array that we overlay; then the client could just ack those UIDs with a quick traitlet update (which might even include the UIDs of all arrays currently being plotted).

All this touches somewhat on the idea of sending more generic commands, like tuples ('command', [args]). This would unify the array/href/file transport. We'd want to give every command a UID, rather than every array/href, but the logic would otherwise be essentially the same as above.
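
A rough sketch of what that command scheme could look like on the kernel side (the message shape and names are invented; Widget.send with binary buffers is stock ipywidgets):

import itertools

import numpy as np

_uids = itertools.count()


def send_overlay_array(widget, data):
    """Send one overlay command as a custom Comm message with a UID."""
    arr = np.ascontiguousarray(data, dtype=np.float32)
    uid = next(_uids)
    widget.send(
        {"command": "overlay_array", "uid": uid, "dtype": str(arr.dtype)},
        buffers=[arr.tobytes()],  # samples travel as a binary buffer, not JSON
    )
    return uid  # the client would ack this uid via a small traitlet update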

mrecachinas mentioned this issue Apr 6, 2019
mrecachinas added this to the 0.2.0 milestone Apr 21, 2019
mrecachinas pushed a commit that referenced this issue Apr 22, 2019
Addresses #13, #36, #44, and indirectly #11. This changes the name from `sigplot.SigPlot` to `sigplot.Plot` and modifies the API to match the JS. 

This also includes adding a traitlet for progress when downloading a remote asset, but the visual component (progress bar, incrementing counter, etc.) hasn't been figured out yet.
@mrecachinas

Optimized in 1d44f0d. It uses binary serialization/deserialization for the array rather than JSON array serialization/deserialization, and it was much, much faster for the 20-million-point case. If this is still an issue, feel free to reopen this issue or open a new one.
