Trigger spilling on MemoryError #3612
Comments
@mrocklin, do you have any thoughts on this idea? Does it seem reasonable? If so, where in the code should I be looking to implement something like this?
Maybe? I don't know. Have you seen cases where Python functions raise MemoryError when they're low on memory? In my anecdotal experience they tend to just ask the OS for more memory, and the OS spills out to virtual memory / disk.
I would expect whatever change to happen somewhere in worker.py, but I suspect you would have to dig around for a while to find the right place.
Yes, this is something we are seeing regularly. I guess I'm looking for where the worker runs the task and how to trigger spilling on the worker.
@jakirkham did you come up with anything here? I think that next steps here are to ...
I think the motivation is already here based on user feedback (#2602, #2297), and this should solve those issues (or at least let them proceed to the next blocker ;). The intervention is proposed. What's less clear is where this goes. If this is something you are able to advise on, that would be helpful. 🙂
cc @madsbk (for awareness)
I second the request for real life experience with MemoryError. My understanding is that you will get MemoryError exclusively if
In all other cases, the worker should either end up in the paused state or pass the terminate threshold and get killed by the nanny.
Honestly, we've been struggling with this issue on and off for a few years now. There are some general examples with Dask already linked in my comment above. Currently we are working on adding this functionality to RAPIDS. It would be great to discuss upstreaming a more general solution, but if that's not of interest we can continue with a RAPIDS-specific solution until that changes.
It would be useful to catch any MemoryErrors thrown by workers and use these as an opportunity to spill and then retry whatever task failed.
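For illustration, the catch, spill, retry idea could look roughly like the sketch below. This is not how the worker is implemented: `run_with_spill_retry` and the `spill` callback are hypothetical names invented for this example, and the callback stands in for whatever mechanism the worker would actually use to move data out of memory.

```python
import gc
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_spill_retry(
    task: Callable[[], T],
    spill: Callable[[], None],
    max_retries: int = 1,
) -> T:
    """Run ``task``; on MemoryError, call ``spill`` and try again.

    ``spill`` is a hypothetical hook (an assumption of this sketch) that
    should free memory, e.g. by writing cached results to disk.
    """
    attempt = 0
    while True:
        try:
            return task()
        except MemoryError:
            if attempt >= max_retries:
                # Spilling did not help enough; surface the original error.
                raise
            attempt += 1
            spill()       # free memory held by stored data
            gc.collect()  # release anything now unreferenced
```

Bounding the number of retries matters here: if spilling cannot free enough memory, the task should fail with the original MemoryError rather than loop forever.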