Run cells in different threads #1155
Someone could write a kernel, or an IPython extension, to do things like that. However, threads and shared memory bring up a whole host of issues (not just the GIL), and I don't think we have any plans to do anything like that in the project.
Ok, I understand this may not fit in the development plans for Jupyter. I would still think it could be useful, maybe as part of a separate kernel or IPython extension. Actually, my typical use case for Jupyter + the IPython kernel is to run scientific computations, and I am looking forward to having some features made easily accessible:
I am not sure if this is the typical use case for Jupyter, but some of the features are already implemented, with varying stability, either in Jupyter or in extensions. I am wondering how many people would be interested in such features, especially the one described in this issue. Concerning this issue in particular, I might have a look at kernels or at writing an IPython extension, if it can be of interest. I am well aware that threads and shared memory open the door to many additional issues, but some basic implementation may be doable, in my opinion, especially if we either restrict this feature to very specific cases in which we are sure there are no side effects, or let the user explicitly turn the feature on (in which case, they are responsible for it). Moreover, maybe it would be easier to do with some kernels rather than others (thinking of Julia, for instance).
There's an IPython project, ipyparallel, to control multiple engines, but distributing computation without the user having to think about it is a hard problem. If you're interested in that area, have a look at dask. There's also a module called dill which can save your variables and such - it's an extension of Python's standard pickle module. It still can't handle everything, but it can do quite a lot. Another approach you can look into is checkpoint-restart, which saves an entire process to a file. Here's a presentation from a couple of years ago about doing this in Python: http://conference.scipy.org/proceedings/scipy2013/pdfs/arya.pdf
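For illustration, a minimal sketch of saving and restoring an interpreter session with dill (the file name is illustrative):

```python
import dill

x = [1, 2, 3]   # some state built up in the session

dill.dump_session('session.pkl')   # pickle (almost) everything in __main__

# ...later, in a fresh interpreter:
# import dill
# dill.load_session('session.pkl')
# x  # -> [1, 2, 3]
```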
Have a look also at https://github.com/dask/distributed - you can find a nice introduction on Matt's blog. You might also want to look at https://github.com/cloudpipe/cloudpickle beyond dill; it can serialize some objects that dill cannot. Parallel computation is definitely not a Jupyter feature but a Python feature, and given the way Python works it will be relatively hard to make it work magically. The advantage of using things like dask/distributed is also that they work in non-Jupyter environments, which is nice. If you want to dive into the IPython kernel, we'll be happy to guide you and to get feedback on APIs/docs...
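As a quick illustration of the cloudpickle point, module-level lambdas are a classic case plain pickle refuses (a minimal sketch):

```python
import pickle
import cloudpickle

f = lambda x: x + 1

# pickle.dumps(f) raises PicklingError: a lambda has no importable name
blob = cloudpickle.dumps(f)   # cloudpickle serializes the code object itself
g = pickle.loads(blob)        # the result loads back with plain pickle
print(g(41))                  # -> 42
```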
Thanks for all the links and pointers to docs and modules! I already knew about some of them. The ability to run in a non-Jupyter environment is indeed really nice. My idea, and the reason I posted on jupyter/notebook, is that I think it would be really awesome to have something well integrated and packaged. One of the major features of the Jupyter notebook and the IPython kernel is that it "just works" and gives a really user-friendly setup for advanced tasks, out of the box :) I think I will try to see what I can get from assembling all of this, and whether it could be worth integrating further into Jupyter, via extensions or custom kernels.
While we want Jupyter & IPython to be usable and useful straight out of the box, they're never going to do everything you could want. There's a big ecosystem of different tools out there, and we don't want to try to subsume all of that into Jupyter.
Sure, but hopefully it could be made as easy as the current matplotlib integration (e.g. %matplotlib inline). EDIT: Maybe this discussion should move to the mailing list or similar, as it is now labelled a "not notebook" issue?
Technically that kind of integration is easy enough to do - it's working out what interface makes sense that's hard. Venue: up to you; there's no particular problem with discussing it here. I set that milestone just because I don't think there's a specific notebook-related issue to be fixed.
@Phyks: We're doing a little housekeeping on our issue log and noticed this thread from 2016. Were you able to find a solution to your issue? Please let us know so we can close this one. Thanks!
Hi @JamiesHQ, sorry, I have been busy lately and did not make much progress on this issue. I will post any working solution here for sure, if I get one.
FYI, I've been able to do some basic multithreading in Jupyter notebooks by subclassing multiprocessing.Process, with ipywidgets for feedback. It works pretty well! In the future, I might use button widgets for spawning and stopping processes. I'm actually using this to run a Flask server that serves a REST API for some data that's processed in parallel. I wanted to be able to use my Jupyter notebook to serve analyses in a way that could be used outside of Python. The Flask process and the data analyzer each run in their own Process subclass and share data via Manager objects. Using this system, I can start and stop new analyzers for the Flask process to serve, all from the same notebook. It's pretty nice!
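A minimal sketch of that pattern (the class and variable names are illustrative, and the ipywidgets and Flask pieces are left out):

```python
from multiprocessing import Manager, Process
import time

class Analyzer(Process):
    """Hypothetical worker that publishes results through shared state."""
    def __init__(self, shared):
        super().__init__()
        self.shared = shared

    def run(self):
        for i in range(5):
            self.shared['latest'] = i * i   # visible to the parent process
            time.sleep(0.1)

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()   # proxy object shared across processes
        worker = Analyzer(shared)
        worker.start()
        worker.join()
        print(shared['latest'])   # -> 16
```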
@micahscopes Have you posted an example of your basic multithreading notebook recipe anywhere?
Here is an example with multiprocessing.Process. I will try and make an example with multiple threads when I get a chance, but this gives you an idea! https://gist.github.com/micahscopes/2f523a8f485d3fe53cc32cef450ca27f
@psychemedia Try it out: https://github.com/micahscopes/nbmultitask/blob/master/examples.ipynb
Thanks @micahscopes! I'll try it!
It's very good! I'm writing a wrapper for Spark on top of it: https://github.com/databootcampbr/nbthread-spark
Is there any more progress on this? I am interested as well.
Hi,
Sometimes I run long code in one cell and still want to be able to run small (independent) snippets in another cell. But I cannot, because the cells are run sequentially on the same kernel.
Maybe the execution of each cell could be threaded; then this would be possible. I know there could be issues with the GIL, but I think it could work in most cases. A typical use case would be to perform multiple (independent) long computations in parallel, in different cells, without having to deal with subprocess and so on, which would be super user-friendly. The ideal use case, which would be a lot more difficult to implement, is to perform a long-running computation in a cell and start studying the results while it runs. For instance, if a cell fills a list with data, it could be useful to start plotting and reading the list elements in another cell while the computation is still running; see the sketch below for what I mean.
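To make this concrete, here is roughly what one can already do by hand with threading today (a minimal sketch; the computation is just a placeholder):

```python
import threading
import time

results = []

def long_computation():
    # placeholder for a long-running job that keeps producing data
    for i in range(100):
        results.append(i * i)
        time.sleep(0.5)

worker = threading.Thread(target=long_computation, daemon=True)
worker.start()

# meanwhile, another cell can already inspect the partial results:
# len(results), results[-3:]
```

But this requires boilerplate in every notebook, which is exactly what I would like to avoid.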
I did not find many references to this, except a Stack Overflow thread.
Thanks