Parallel map function can hang during interruption or externally killed workers #212
Since this can happen in the middle of GPU work, it can be left in a state where the GPU doesn't get to free its memory. FWIW, this seems to be the best course of action: stopping X, calling
Haven't seen that yet... I'm assuming it happened to you?
Yes.
Welp, more reason to fix this thing again...
I'm seeing something similar here, @danlamanna and @Purg, when trying the SMQTK quickstart and Docker. I have 50 images and it just hangs building the network... sometimes it gets to batch 2, sometimes it stays in batch 1.
Any ideas?
BTW, I'm using the SMQTK and Image Space quickstart Docker images, the ones that reference one another.
FWIW, I was able to get this working, but only by repeatedly stopping and starting the smqtk-services Docker container, over and over. Sometimes it randomly works all the way through my 6 batches of ~50 images, but 90% of the time it just hangs.
When Ctrl-C'ing a parallel map in progress, a deadlock can occur.
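One way to keep a Ctrl-C from deadlocking the map is to never block without a timeout and to always tell the workers to exit on the way out. The sketch below is a hypothetical thread-based version, not SMQTK's actual `parallel_map`; `interruptible_parallel_map`, `_POLL_S`, and the `cancel` event are illustrative names. Every queue read uses a timeout, and a `finally` block signals and joins the workers whether the map finishes, a worker raises, the caller cancels, or a `KeyboardInterrupt` lands in the main thread.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")

_POLL_S = 0.1         # hypothetical polling interval for every blocking wait
_SENTINEL = object()  # tells a worker there is no more work


def interruptible_parallel_map(func: Callable[[T], R], items: Iterable[T],
                               num_workers: int = 4,
                               cancel: Optional[threading.Event] = None) -> List[R]:
    """Order-preserving thread map (hypothetical sketch, not SMQTK's parallel_map)."""
    work = list(items)
    halt = threading.Event()  # internal signal telling workers to stop early
    in_q: "queue.Queue[object]" = queue.Queue()
    out_q: "queue.Queue[tuple]" = queue.Queue()
    for pair in enumerate(work):
        in_q.put(pair)
    for _ in range(num_workers):
        in_q.put(_SENTINEL)

    def worker() -> None:
        while not halt.is_set():
            try:
                job = in_q.get(timeout=_POLL_S)
            except queue.Empty:
                continue
            if job is _SENTINEL:
                return
            idx, item = job
            try:
                out_q.put((idx, func(item), None))
            except Exception as exc:  # hand worker errors back to the caller
                out_q.put((idx, None, exc))

    # Daemon threads cannot keep the interpreter alive if something goes wrong.
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()

    results: List[R] = [None] * len(work)  # type: ignore[list-item]
    try:
        received = 0
        while received < len(work):
            if cancel is not None and cancel.is_set():
                raise InterruptedError("parallel map cancelled before completion")
            try:
                # Timed wait: the loop wakes up regularly, so cancellation is
                # re-checked and a pending Ctrl-C can be delivered.
                idx, value, exc = out_q.get(timeout=_POLL_S)
            except queue.Empty:
                continue
            if exc is not None:
                raise exc
            results[idx] = value
            received += 1
        return results
    finally:
        # Runs on success, worker error, cancellation, or KeyboardInterrupt:
        # tell workers to stop and give each a bounded time to exit.
        halt.set()
        for t in threads:
            t.join(timeout=1.0)
```

The polling interval trades a little latency for the guarantee that the main loop regularly regains control. A worker stuck inside `func` itself (for example, on a request with no timeout) still cannot be stopped from outside, so blocking calls inside `func` need their own timeouts.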
It has also been seen that if the workers are making web requests, they can lock up, possibly due to an infinite wait on the request. When the threads or processes are then killed externally, the function deadlocks and can't clean itself up properly.
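For workers that are killed externally, the parent must not wait on the result queue indefinitely. A minimal sketch, assuming a process-based map on a POSIX system with the `fork` start method; `liveness_checked_map`, `WorkerDiedError`, and `poll_s` are hypothetical names, not SMQTK's API. The parent waits with a timeout and, whenever no result arrives, checks each worker's `exitcode`, so a killed worker surfaces as an error instead of a hang.

```python
import multiprocessing as mp
import queue
from typing import Any, Callable, Iterable, List


class WorkerDiedError(RuntimeError):
    """Hypothetical error: a worker process exited before all results arrived."""


def _worker(func: Callable[[Any], Any], in_q: Any, out_q: Any) -> None:
    # Consume (index, item) pairs until the None sentinel arrives.
    for idx, item in iter(in_q.get, None):
        out_q.put((idx, func(item)))


def liveness_checked_map(func: Callable[[Any], Any], items: Iterable[Any],
                         num_workers: int = 2, poll_s: float = 0.1) -> List[Any]:
    """Process map that fails fast if a worker dies (hypothetical sketch)."""
    ctx = mp.get_context("fork")  # assumption: POSIX "fork" start method
    in_q, out_q = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=_worker, args=(func, in_q, out_q), daemon=True)
             for _ in range(num_workers)]
    for p in procs:  # fork before any queue feeder thread exists in the parent
        p.start()
    work = list(items)
    for pair in enumerate(work):
        in_q.put(pair)
    for _ in range(num_workers):
        in_q.put(None)

    results: List[Any] = [None] * len(work)
    received = 0
    try:
        while received < len(work):
            try:
                idx, value = out_q.get(timeout=poll_s)
            except queue.Empty:
                # No result yet: instead of waiting forever, check whether any
                # worker crashed (non-zero code) or was killed (negative code).
                dead = [p for p in procs if p.exitcode not in (None, 0)]
                if dead:
                    raise WorkerDiedError(
                        f"worker {dead[0].pid} exited with code {dead[0].exitcode}")
                continue
            results[idx] = value
            received += 1
        return results
    finally:
        # Don't let unsent input block interpreter exit, and reap every worker.
        in_q.cancel_join_thread()
        for p in procs:
            if p.is_alive():
                p.terminate()
            p.join(timeout=1.0)
```

The Python docs warn that a process killed while using a `multiprocessing.Queue` can leave the queue's data corrupted, so this sketch treats a dead worker as fatal for the whole map rather than trying to resume. The hung web request itself is better handled inside `func`, by giving the request an explicit timeout (for example, the `timeout` argument of `urllib.request.urlopen`).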