Ruby locking is per-fiber, not per-thread. #962
Comments
This is since Ruby 3.0, isn't it? Not massively looking forward to working through the significance of this change... Would you be willing to start by working with me to redefine what you think the specs should be now? |
IIRC, different versions of Ruby did different things and 3.0 standardised the behaviour. On some platforms where fibers were backed by threads, Mutex was already per-fiber. I checked the implementation a little bit and it looks okay to me, you are using… Maybe it's a genuine usage error on the part of Rails, let me check their code. |
I've updated the original discussion of this issue, because I'm no longer sure that the issue is with this code. |
This seems to reproduce the original issue, at least the error is the same:

```ruby
require 'concurrent'

lock = Concurrent::ReentrantReadWriteLock.new

lock.acquire_read_lock
puts "Lockity lock!"

Thread.new do
  lock.release_read_lock
end.join
```

The question is, is this what's happening? |
That thread doesn't hold the lock so can't release it. |
Yes, I understand that, but why is it not happening on the same thread? |
Are you saying you're aware that the last code snippet you posted is broken - you're saying that something else in some other code is making the same mistake as this example code, and that's what you're trying to fix? Ok I think we're on the same page now. Yeah sounds like an application bug. You could probably add some error-handling code to say which thread is holding it and which is trying to release it (might take some extra book-keeping) - I'd try that. |
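A hedged sketch of the extra book-keeping suggested above (the `DiagnosticReadLock` wrapper name and structure are illustrative, not part of concurrent-ruby): it records which thread and fiber acquired the read lock so that a mismatched release can report both sides.

```ruby
require 'concurrent'
require 'fiber'

# Illustrative wrapper: tracks read-lock holders by [thread, fiber] so a bad
# release can say who holds the lock and who tried to release it.
class DiagnosticReadLock
  def initialize
    @lock = Concurrent::ReentrantReadWriteLock.new
    @holders = Hash.new(0)   # {[Thread, Fiber] => acquisition count}
    @holders_mutex = Mutex.new
  end

  def acquire_read_lock
    @lock.acquire_read_lock
    @holders_mutex.synchronize { @holders[[Thread.current, Fiber.current]] += 1 }
  end

  def release_read_lock
    key = [Thread.current, Fiber.current]
    @holders_mutex.synchronize do
      if @holders[key] <= 0
        raise "Read lock released by #{key.inspect}, but held by #{@holders.keys.inspect}"
      end
      @holders[key] -= 1
    end
    @lock.release_read_lock
  end
end
```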
Sorry about my lack of clarity, I don't really know what's going on fully yet.
Yes, I'm trying to reproduce possible scenarios which could create the behaviour I'm seeing so I know what to look for.
Yes, maybe in Rails or maybe the way Rails is being invoked by Falcon.
Yes, that's the plan. I don't have a full repro, but I might try hacking something together. |
Okay, I reviewed all the code and was able to make what appears to be a repro:

```ruby
#!/usr/bin/env ruby

require 'concurrent'
require 'async'

lock = Concurrent::ReentrantReadWriteLock.new

Q = 0.01

watcher = Thread.new do
  loop do
    lock.with_write_lock do
      sleep Q
    end
    sleep Q
  end
end

Async do |task|
  2.times do |i|
    task.async do
      loop do
        lock.with_read_lock do
          sleep Q
        end
        sleep Q
      end
    end
  end
end
```

Results in:
After that the lock is totally jammed up. |
Okay, I think I have an idea of the problem. The thread-local variable is per-thread, while the synchronisation is per-fiber. I think this must be leading to some kind of invalid state. |
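To make that mismatch concrete, here is a small illustration (not from the thread; it assumes Ruby 3.0+ Mutex semantics and the thread-local behaviour of `Concurrent::ThreadLocalVar`): two fibers on the same thread disagree about who owns a Mutex, yet they share the same thread-local counter.

```ruby
require 'concurrent'
require 'fiber'

m = Mutex.new
counter = Concurrent::ThreadLocalVar.new(0)   # per-thread, shared by fibers on that thread

m.lock
counter.value += 1

Fiber.new do
  puts m.owned?      # => false: this fiber does not own the mutex...
  puts counter.value # => 1:     ...but it still sees the other fiber's "per-lock" state
end.resume
```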
Okay, that appears to be the problem. The fix in this case is to use fiber-local variables:

```ruby
require 'thread'
require 'concurrent/atomic/abstract_thread_local_var'

module Concurrent
  # @!visibility private
  # @!macro internal_implementation_note
  class RubyThreadLocalVar < AbstractThreadLocalVar
    def initialize(...)
      super
      @key = :"concurrent-ruby-#{object_id}"
    end

    def value
      Thread.current[@key] || default
    end

    def value=(value)
      Thread.current[@key] = value
    end

    def allocate_storage
      # No-op.
    end
  end
end
```

I just monkey patched this into my local gem and it fixed the issue. The current implementation of thread locals looks a bit... over-engineered? Maybe it was that way for backwards compatibility? In any case, the locking is per-fiber, and the state used for tracking that locking is per-thread, so any time you mix the two, it's bound to eventually end in disaster. The solution is to use per-fiber state for the lock implementation. |
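Worth noting for readers (a documented CRuby behaviour, not something stated in the thread): the patch above works because `Thread#[]` is actually fiber-local storage despite its receiver, while `Thread#thread_variable_get` is the truly thread-local API.

```ruby
require 'fiber'

Thread.current[:a] = 1                     # fiber-local, despite the receiver's name
Thread.current.thread_variable_set(:b, 2)  # genuinely thread-local

Fiber.new do
  p Thread.current[:a]                     # => nil (not visible from a new fiber)
  p Thread.current.thread_variable_get(:b) # => 2   (shared across the thread's fibers)
end.resume
```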
@chrisseaton any thoughts on the next steps? |
I'm a bit worried about how deep this goes. Yes, the complexity is likely about backwards compatibility. I'd like to add a simple core set of primitives to MRI for the long term, but that's another discussion...
That's probably the central thing to think about. I'd like to be able to write a spec that we want to meet. Would you be able to help me with that? Even just a plain-English description of the behaviour you want. |
(It's not what I want, it's what's defined by CRuby). Mutex locking is per-fiber. Therefore, if you are storing state "per lock" you need to store it per fiber. That's what the above monkey patch does. |
@eregon, can I please pull your expertise into this discussion? |
I think for ReentrantReadWriteLock and others which use ThreadLocalVar, we should use FiberLocalVar (to be added, #376 (comment)) if Mutex is per Fiber.

```ruby
m = Mutex.new
mutex_owned_per_thread = m.synchronize { Fiber.new { m.owned? }.resume }
```

I think then the compatibility concerns are non-existent because using a ThreadLocalVar for per-Mutex-state on 3.0+ is just a bug. |
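As a sketch of how that check might be used (hedged: `Concurrent::FiberLocalVar` did not yet exist at the time of this comment, and the eventual implementation may differ), the result could select which storage class backs the lock's per-owner state:

```ruby
require 'concurrent'
require 'fiber'

m = Mutex.new
mutex_owned_per_thread = m.synchronize { Fiber.new { m.owned? }.resume }

# On CRuby 3.0+ the inner fiber does not own the mutex, so this picks the
# fiber-local implementation; under per-thread semantics, the thread-local one.
LockLocalVar =
  if mutex_owned_per_thread
    Concurrent::ThreadLocalVar
  else
    Concurrent::FiberLocalVar   # assumes a concurrent-ruby version that provides it
  end
```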
I don't have any objection to this fix if that's what you're thinking by the way. Just moving cautiously on a subject that is usually more complex than it appears. |
Thanks! |
Hey folks, we just came across an issue with v1.2.0 in JRuby that seems to be related to this. Our code uses ReentrantReadWriteLock; it worked fine under v1.1.10, but in v1.2.0 we're now seeing this error:
At first glance it looks like this could be mitigated by switching from
However, I have no idea how much deeper this goes, if this would need to be swapped out in multiple places, etc. |
The above PR should fix this issue. |
Somehow totally missed that, thank you! |
The missing require only affects Ruby <= 3.0, 3.1+ always have |
Ruby's Mutex implementation is per-fiber, and generally speaking, all locking mechanisms should be per-fiber for the purpose of mutual exclusion.
As such, it appears that at least some parts of concurrent-ruby might not be correct, according to this specification.
I've been looking at `concurrent-ruby/lib/concurrent-ruby/concurrent/atomic/reentrant_read_write_lock.rb` (line 51 at commit 3851561) specifically.
It appears that multi-fiber re-entrancy is either not supported, not handled correctly, or being used incorrectly.
I'm not sure what the correct solution is here, but at least we can start having a discussion about how it should work and what the solution is.
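A quick way to see the per-fiber behaviour described here (assuming CRuby 3.0+): a fiber that did not lock a Mutex cannot unlock it, even though it runs on the same thread.

```ruby
require 'fiber'

m = Mutex.new
m.lock

begin
  Fiber.new { m.unlock }.resume
rescue ThreadError => e
  puts e.message   # unlocking from a different fiber raises on CRuby 3.0+
end
```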