Faster Async Client / Kernel Manager #266
Besides the two network hops (websockets, zmq), we deserialise, reserialise, deserialise the message, and do a fair bit of manipulation in Python and Javascript. So I'm not sure whether there's one big cause of the slowness you see or dozens of small ones.
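For illustration, here's a rough sketch of timing just the serialise/deserialise leg with jupyter_client's `Session`; the message shape, key, and iteration count are illustrative assumptions, not numbers from this thread:

```python
# Rough timing of Session.serialize/deserialize alone; the message content,
# key, and loop count here are illustrative.
import time
from jupyter_client.session import Session

s = Session(key=b"secret")
msg = s.msg("execute_request", content={"code": "1 + 1"})

n = 1000
start = time.perf_counter()
for _ in range(n):
    wire = s.serialize(msg)   # sign + pack header/parent/metadata/content
    s.deserialize(wire)       # verify signature + unpack
elapsed = time.perf_counter() - start
print(f"{elapsed / n * 1e6:.1f} us per serialize/deserialize round trip")
```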
I think that's deceptive. Python doesn't do much optimisation, but one thing it does do is constant folding, so the bytecode for your code is equivalent to loading a pre-computed constant.
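For illustration (this is not the exact snippet from the linked thread), the folding is easy to see with `dis`:

```python
# Illustrative only: shows CPython folding a constant expression at compile time.
import dis

def folded():
    return 2 ** 20          # literal operands: folded to 1048576 at compile time

def not_folded(n):
    return n ** 20          # depends on a variable: computed at runtime

dis.dis(folded)      # bytecode just loads the pre-computed constant
dis.dis(not_folded)  # bytecode performs the power operation on each call
```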
I guess we could use an Echo kernel to do actual performance measurement on jupyter_client itself, starting the kernel with that instead of IPython. We could also look at https://github.com/QuantStack/xeus and make a Python kernel on top of it using the Python C API, which should be faster. Though we should make sure the bottleneck really is jupyter_client and not IPython itself.
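A rough sketch of how such a measurement could look, assuming an echo kernel spec is installed; the kernel name `echo` is an assumption (e.g. from the `echo_kernel` package):

```python
# Hedged sketch: time blocking execute round trips through jupyter_client
# against an echo kernel, so IPython itself is out of the picture.
import time
from jupyter_client.manager import start_new_kernel

km, kc = start_new_kernel(kernel_name="echo")  # kernel name is an assumption

try:
    kc.execute("warm up")              # first call pays one-time startup costs
    kc.get_shell_msg(timeout=10)

    n = 50
    start = time.perf_counter()
    for _ in range(n):
        kc.execute("hello")
        kc.get_shell_msg(timeout=10)   # wait for the execute_reply
    print(f"mean round trip: {(time.perf_counter() - start) / n * 1000:.2f} ms")
finally:
    kc.stop_channels()
    km.shutdown_kernel()
```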
So one of the sentences in my original question was somewhat imprecise: it was taking 8ms to get results back from the client, not from the kernel directly as I had said; the issue is now updated to reflect that. Secondly, while I get that performance is probably death by a thousand cuts, the question of whether that code takes ~0 vs ~0.4 ms is still dwarfed by the 6.5 ms that remains unaccounted for. Presently, I'm building some more direct interfacing with the kernels to see what happens when I manage the zmq traffic myself. To this end, is there a good class that just manages the startup and shutdown of kernels, but isn't so tightly integrated with the client functionality? Ideally I would like something that just starts a kernel and returns the address and ports to point the zmq sockets at.
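For the lifecycle-only question, one possibility is a sketch like the following: jupyter_client's `KernelManager` used purely for start/stop, with its connection info providing the transport, IP, and ports for raw zmq sockets (an illustration, not an endorsed approach):

```python
# Hedged sketch: use KernelManager purely for lifecycle, then read the
# connection info needed to wire up your own zmq sockets.
from jupyter_client import KernelManager

km = KernelManager(kernel_name="python3")
km.start_kernel()

info = km.get_connection_info()  # transport, ip, ports, HMAC key, etc.
print(info["transport"], info["ip"])
print("shell:", info["shell_port"], "iopub:", info["iopub_port"])

# ... connect your own zmq DEALER/SUB sockets here, signing messages with
# jupyter_client.session.Session and the key from `info` ...

km.shutdown_kernel()
```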
More results: here are the numbers, averaged over 50 runs, against the echo kernel mentioned above:
There were some other values I recorded to get a more complete picture, but they were implementation-specific and thus wouldn't be of much use to you. I guess this is mostly an IPython problem / IPython kernel-wrapper problem? PS: you might be wondering why this runtime is longer than the previously quoted value; it's just that I started waiting on the 'idle' status from the kernel so that I could collect the stdout and stderr streams. The echo kernel didn't get magically slower.
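For context, the "wait for idle" pattern I mean looks roughly like this (a sketch assuming a connected blocking client `kc`; the helper name is mine):

```python
# Hedged sketch: drain iopub until the kernel reports 'idle' for this request,
# collecting stream output along the way. `kc` is a connected blocking client.
from queue import Empty

def run_and_collect(kc, code, timeout=5):
    msg_id = kc.execute(code)
    stdout, stderr = [], []
    while True:
        try:
            msg = kc.get_iopub_msg(timeout=timeout)
        except Empty:
            break
        if msg["parent_header"].get("msg_id") != msg_id:
            continue  # output from some other request
        msg_type = msg["header"]["msg_type"]
        if msg_type == "stream":
            target = stdout if msg["content"]["name"] == "stdout" else stderr
            target.append(msg["content"]["text"])
        elif msg_type == "status" and msg["content"]["execution_state"] == "idle":
            break     # kernel is done with this request
    return "".join(stdout), "".join(stderr)
```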
cc @SylvainCorlay – using https://github.com/QuantStack/xeus he was apparently able to reduce the overhead of the Python implementation of the protocol, in some cases from 1s to negligible.
@Carreau @takluyver
Please note that the Julia numbers use a patch that I have in place locally, but the PR for it hasn't landed yet. So that brings me to my final question here: would you be interested in an asyncio-based client? I would have to do a little bit of asking, but I could probably make it happen. Of course, if that isn't really where this repo/project is going right now, that's fine too. Thanks!
I'd be interested in an asyncio based client. |
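For what it's worth, a minimal sketch of what an asyncio-based flow could look like; `AsyncKernelManager`/`AsyncKernelClient` only appear in later jupyter_client releases, so treat this as an assumption about the eventual API rather than something available when this thread started:

```python
# Hedged sketch of an asyncio-based client flow using the async classes that
# later versions of jupyter_client provide.
import asyncio
from jupyter_client import AsyncKernelManager

async def main():
    km = AsyncKernelManager(kernel_name="python3")
    await km.start_kernel()
    kc = km.client()
    kc.start_channels()
    try:
        await kc.wait_for_ready(timeout=30)
        kc.execute("1 + 1")
        reply = await kc.get_shell_msg(timeout=5)
        print(reply["content"]["status"])
    finally:
        kc.stop_channels()
        await km.shutdown_kernel()

asyncio.run(main())
```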
@JMurph2015 is websocket compression enabled in your configuration? |
@SylvainCorlay Sorry for being out of the loop on this project for so long; my fall was really busy! I will try to whip up another PoC, as most of my materials stayed with my summer internship, but I think I could get the bugs ironed out w.r.t. an asyncio client.
And I'm not sure whether it was; I'll have to set up a new test rig and rerun the numbers.
Hi!
This issue is being moved here following discussion on this thread:
jupyter-server/kernel_gateway#255
Intro
Basically, the client interface to kernels demonstrably adds a non-trivial amount of latency to every call made against a Jupyter notebook. Now one might ask "who cares about 8ms of latency?", and I would say: anyone who would like to hit the Jupyter backend more than ~100 times a second. Reducing the latency to ~1ms should be achievable, given the minimal overhead of ZMQ and the really small amount of post-processing actually applied to any particular message.
What I've Seen So Far
This all means that I have ~7 ms (give or take 0.4 ms of latency due to TCP and WebSockets) that is unaccounted for. It would take a reasonably large amount of code to cause 7 ms of delay (for example, a simple prime sieve can find several thousand prime numbers in that sort of time).
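As a rough sanity check of that comparison (timings will obviously vary by machine), a basic sieve:

```python
# Illustrative only: a plain sieve of Eratosthenes finds several thousand
# primes in a few milliseconds on typical hardware.
import time

def sieve(limit):
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]

start = time.perf_counter()
primes = sieve(100_000)
print(f"{len(primes)} primes in {(time.perf_counter() - start) * 1000:.1f} ms")
```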
Actual Question
If anyone has a good idea of what's taking ~58x the time it takes me to dispatch calls and receive messages back, that would be awesome (it would be even more awesome if you could point me in the direction of changing that).