Improve Communication Protocol #14

saulshanabrook · 2019-10-17T03:24:37Z

Currently, this extension uses Jupyter comms, which go over a websocket, to send queries and data while the visualization is executing.

This "works," but it would be useful to make it easier to profile, debug, and switch out of Jupyter.

Profiling

For example, many queries are very slow. We need to be able to diagnose why that is. Is it data parsing and serialization? SQL response time? Ibis query creation?

To answer these properly, we should implement some form of "distributed tracing." Here is a New Relic UI that visualizes a trace of a request:

There is a W3C group, the "Distributed Tracing Working Group," which is working on a "Trace Context" spec.

Jaeger is a Cloud Native Computing Foundation project that exists today that says it will support this spec in the future:

My proposal is to try deploying Jaeger next to JupyterLab as a server extension and creating a frontend extension to display its UI in JupyterLab. That way, we can visualize the queries we are executing as we interact with the graphs and inspect their performance. We will also have to instrument our Python library and pass certain tokens along with each request and response so that the two can be tied together in a trace.
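As a rough sketch of what "passing tokens along" could look like, here is one way to attach a trace token to each message, assuming we borrow the `traceparent` string format from the Trace Context spec (version, trace id, span id, flags). The helper names and the `traceparent` message key are made up for illustration; this is not Jaeger's API or anything that exists in this extension today.

```python
import secrets
from typing import Any, Dict


def new_traceparent() -> str:
    """Start a new trace: version 00, 16-byte trace id, 8-byte span id, sampled flag."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(parent: str) -> str:
    """Keep the parent's trace id but mint a fresh span id for the next hop."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def attach_trace(message: Dict[str, Any], traceparent: str) -> Dict[str, Any]:
    """Return a copy of the message carrying the token (hypothetical key name)."""
    return {**message, "traceparent": traceparent}


# The frontend starts a trace when the user interacts with a chart...
request = attach_trace({"query": "SELECT 1"}, new_traceparent())
# ...and the Python side answers under the same trace id with its own span id.
response = attach_trace({"rows": []}, child_traceparent(request["traceparent"]))
```

Because both messages share a trace id, whatever collects the spans can group the request and its response into one trace.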

Protocol Agnostic

The second issue here is that currently this mime renderer is very tied to being run in JupyterLab. We have to add a special case for running it in Phoila so that it can access comms: vidartf/phoila#7

Over the past week, the idea of creating a mime renderer that needs to speak to a kernel but which you deploy outside of Jupyter has come up multiple times (cc @dharhas). To do this, we need to layer some standards on top of our current approach. A mime renderer on the client side should be a JS library that takes in a handle to a bidirectional async channel to the kernel. And on the server it should output some mimetype and also have a handle on this bidirectional communication channel.

The idea here is that you prototype in a Jupyter notebook, but then you can extract that cell from the notebook and run it on a non-Jupyter server, possibly embedded in some larger web app, and we should be able to run a non-Jupyter Python process on the backend and hook the two up to each other to have bidirectional communication.

Honestly, I am not sure what we should use here. The requirements would ideally be:

Be able to run over an arbitrary transport protocol if you build your own backend. For example, we probably want to be able to run it through Jupyter's comms or over a REST API or a native websocket connection.
Be able to transmit raw bytes for efficiency when we need it, but otherwise be agnostic to the payload.
(optional) Be able to integrate with our tracing framework, so we can get some tracing out of the box.
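To make the first two requirements a bit more concrete, here is a minimal sketch of the kind of interface I have in mind: a bidirectional async channel that only deals in bytes, with an in-memory implementation standing in for a real transport. All the names here (`Channel`, `QueueChannel`, `channel_pair`, `echo_backend`) are hypothetical; a Jupyter comm, a websocket, or a REST long-poll backend would each be another class with the same two methods.

```python
import asyncio
from typing import Protocol, Tuple


class Channel(Protocol):
    """What a renderer or backend needs: send bytes, receive bytes."""

    async def send(self, payload: bytes) -> None: ...

    async def receive(self) -> bytes: ...


class QueueChannel:
    """In-memory transport built on two asyncio queues (stand-in for a real one)."""

    def __init__(self, outbox: asyncio.Queue, inbox: asyncio.Queue) -> None:
        self._outbox = outbox
        self._inbox = inbox

    async def send(self, payload: bytes) -> None:
        await self._outbox.put(payload)

    async def receive(self) -> bytes:
        return await self._inbox.get()


def channel_pair() -> Tuple[QueueChannel, QueueChannel]:
    """Create two connected ends: what one side sends, the other receives."""
    a_to_b: asyncio.Queue = asyncio.Queue()
    b_to_a: asyncio.Queue = asyncio.Queue()
    return QueueChannel(a_to_b, b_to_a), QueueChannel(b_to_a, a_to_b)


async def echo_backend(channel: Channel) -> None:
    """Toy backend: answer a single request. It never sees the transport."""
    request = await channel.receive()
    await channel.send(b"result for " + request)


async def demo() -> bytes:
    frontend, backend = channel_pair()
    server = asyncio.create_task(echo_backend(backend))
    await frontend.send(b"query")
    reply = await frontend.receive()
    await server
    return reply


print(asyncio.run(demo()))
```

The point of the design is that `echo_backend` only depends on the `Channel` interface, so swapping the transport would not touch renderer or backend code.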

gRPC seems like a good contender here, since it is also a Cloud Native Computing Foundation project and has good adoption. Its web story is just emerging, but there is at least one client with websocket support.

I don't know how Panel fits in here. I imagine they have implemented something here.

saulshanabrook · 2019-10-17T03:30:39Z

It looks like Jaeger doesn't yet have a client-side API: jaegertracing/jaeger-client-node#109, jaegertracing/jaeger#723. It's being developed here: https://github.com/jaegertracing/jaeger-client-javascript

As a workaround, I suppose we can set up a proxy server running alongside it, which the frontend can hit to forward spans to the OpenTracing server.

saulshanabrook · 2019-10-17T04:47:48Z

I am experimenting with Jaeger, and it looks like it currently isn't set up to show spans that haven't finished (jaegertracing/jaeger#729). That's too bad, because it would be nice to see a debugging view for a chart as you are interacting with it.

But maybe I should just make each interaction into separate spans. So there will be one initial span for setting it up, then another for each UI update.
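Roughly, the "one span per interaction" idea would look like the sketch below. This uses a tiny stand-in recorder rather than Jaeger's client (the `Span` and `Recorder` classes are made up for illustration), just to show that each interaction opens and closes its own span, so every span is finished and visible as soon as that interaction is done.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    name: str
    start: float
    end: Optional[float] = None

    @property
    def duration(self) -> Optional[float]:
        """Elapsed seconds, or None while the span is still open."""
        return None if self.end is None else self.end - self.start


@dataclass
class Recorder:
    """Stand-in for a tracer: collects spans once they have finished."""

    finished: List[Span] = field(default_factory=list)

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        current = Span(name, time.perf_counter())
        try:
            yield current
        finally:
            # Close and record the span even if the wrapped work raised.
            current.end = time.perf_counter()
            self.finished.append(current)


recorder = Recorder()

# One span for the initial chart setup...
with recorder.span("initial-setup"):
    pass  # build the chart and run the first query here

# ...then a separate, short-lived span for each UI update.
for i in range(2):
    with recorder.span(f"ui-update-{i}"):
        pass  # re-run the query for this interaction here

print([s.name for s in recorder.finished])
```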

vidartf · 2019-10-21T16:52:03Z

we should be able to run a non-Jupyter Python process on the backend and hook the two up to each other to have bidirectional communication.

This sounds like you are going to reinvent the jupyter kernels + messaging protocol 😅 Why would it need to be non-jupyter?

saulshanabrook · 2019-10-21T16:54:21Z

@vidartf b/c jupyter is heavyweight! You might wanna back this by a simple flask server over REST or a websockets server. Not connected to a kernel, just backed by a regular python process.

It's possible we wanna re-use the jupyter comms spec and just create other backends for it, to allow it to be run without running the jupyter server. Or it's possible we wanna back it with another spec like gRPC and have jupyter comms be a backend for that.

vidartf · 2019-10-22T17:36:47Z

Not connected to a kernel, just backed by a regular python process.

I'm pretty sure this is how ipython started though 😉 More seriously, it would be interesting to hear which features would explicitly be included/excluded compared to the full jupyter_client + ipykernel + kernel manager/handlers case.

This was referenced Oct 17, 2019

Add tracing support #16

Merged

Allow opening comm connections vidartf/phoila#11

Merged

vidartf mentioned this issue Oct 21, 2019

Allow comm open messages voila-dashboards/voila#438

Merged

goanpeca added the ibis-vega, type:enhancement, and status:backlog labels Jan 22, 2020

rpekrul added omniscidb and removed omniscidb, status:backlog labels Sep 18, 2020
