BoundsError() in flush_gc_msgs #6297
Comments
If your program is doing an The BoundsError() could be the result of the remotecall_fetch, no explanation for the printed stack showing line 140 though.
Unfortunately that's not a possible explanation, I'm not changing the number of processes dynamically.
If you still have the session open could you
The session is open but I don't have access to those variables since they were local to a function (I removed that from the backtrace).
OK. Anyways, just to be safe, your code should probably be changed to
No?
Yes, that is basically what it actually does. I simplified too much with respect to the actual code, sorry. I also didn't specify that that portion of the code runs thousands of times without issues before crashing.
So this bug is really killing me now. It happens randomly, but given enough time it shows up reliably, and since I'm running some long simulations they never reach the end and crash instead. I have some more data though (12 MB of data, to be precise), but I'm not sure it's useful. If given directions, I could produce something more detailed (given a few days of the simulation running). For the time being, I put a
The result is collected in this compressed file. More information: the job was running with 1 master and 25 worker processes (all local). I've seen it happening on 2 different machines. The stack trace was similar to the one reported before, however I've seen different traces in other cases, all of them ending with the same 2 calls,
versioninfo:
I'm now restarting, producing a different output file for each worker, and removing
Still an issue with the new GC?
Please reopen if this is still an issue.
I have occasionally seen this error in long-running jobs:
Line 140 of multi.jl is:
which seems a strange place to throw a BoundsError.
To give context, my code uses SharedArrays and the error comes from within a @sync'd for block with @spawnat, something like:
where ps is a list of processes, and out and shrd are SharedArrays (shared among all ps).
I wouldn't know how to reproduce it, though. Reporting for the record and just in case someone can guess what's going on. I still have the Julia session where it happened open, if that can be of any use.
Version which was running (with 5 workers):