[Distributed] error in running finalizer: ConcurrencyViolationError(msg="lock must be held") (any_gc_flag) #42126
Comments
So I assume this is more complicated than a quick-and-dirty fix that wraps this notify in a busy-wait loop on the lock.
Can you give #42240 a try?
Assuming this issue is reproducible on 1.7 (@krynju?), perhaps we should add the milestone-1.7 label to this issue. It would stand in for the Distributed.jl threading issues that have been reported in various places and touched on in the related series of now-reverted PRs, but that have fallen by the wayside.
Ref. the related earlier tracking issue JuliaLang/Distributed.jl#73, which should perhaps also receive the milestone-1.7 label.
So is this a regression vs 1.6? If that is the case, it would be helpful if someone could bisect the offending commit. |
Not a regression, just a side effect of people running things with multiple threads while Distributed.jl is not thread-safe.
Then I don't think this should be on the milestone, since that would block the 1.7 release until it is fixed.
Though perhaps not a regression in some sense, folks are hitting this issue on 1.7 where they were not hitting it on 1.6. My best guess is that flipping the task migration bit effectively elevated the severity of this issue. |
This should not block 1.7, but backporting the fix to 1.7.1 would be good.
This is a concurrency violation on the condition at julia/stdlib/Distributed/src/remotecall.jl, line 259 (commit 8812c5c).
It happens when running the tests at krynju/Dagger.jl@a5b5267 (the DTable groupby tests, with Dagger on processes).
Windows, Julia master
It seems the notify needs to be wrapped in a while/trylock loop because it runs in a finalizer context. The lock on that condition doesn't appear to be acquired anywhere, and the docs say the Condition type is not thread-safe.
I will look into this further and open a PR sometime next week.
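A minimal Julia sketch of the trylock idea described in the post above. It assumes the condition is a Threads.Condition, whose lock is a ReentrantLock. The function name notify_nonblocking and its max_spins bound are hypothetical and are not taken from the Distributed.jl code. The loop never blocks or yields, which is what code running in a finalizer needs. It also gives up after a bounded number of attempts instead of spinning forever.

```julia
# Hypothetical helper (not Distributed.jl code): deliver a notification on a
# Threads.Condition without ever blocking or yielding, as finalizer code must.
# Returns true if the condition was notified, false if it gave up.
function notify_nonblocking(c::Threads.Condition; max_spins::Int = 1000)
    for _ in 1:max_spins
        # trylock never blocks: it returns false immediately if another task
        # holds the condition's lock.
        if trylock(c)
            try
                notify(c)   # allowed now, because we hold the condition's lock
            finally
                unlock(c)
            end
            return true
        end
        # Lock was busy: loop and try again (a bounded busy-wait).
    end
    return false            # give up rather than spin forever
end

# Usage: an uncontended condition is notified on the first attempt.
flag = Threads.Condition()
delivered = notify_nonblocking(flag)
println(delivered)  # true
```

The bound is a deliberate choice. An unbounded busy-wait inside a finalizer could hang if whatever holds the lock cannot make progress while the finalizer runs. That is one reason a simple waiting loop may not be a complete fix. A caller would also need a fallback when the function returns false, for example retrying later. That part is left out of this sketch.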