Ruby locking is per-fiber, not per-thread. #962

ioquatix · 2022-10-17T03:03:31Z

Ruby's Mutex implementation is per-fiber, and, generally speaking, all locking mechanisms should be per-fiber for the purpose of mutual exclusion.

As such, it appears that at least some parts of concurrent-ruby might not be correct according to this specification.
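The per-fiber claim can be checked with plain Ruby. The sketch below is an illustration that assumes CRuby 3.0 or later. It locks a Mutex on the main fiber and then tries to unlock it from another thread and from another fiber on the same thread. On 3.0+ both attempts raise ThreadError, because ownership belongs to the fiber and not to the thread.

```ruby
require 'fiber'

mutex = Mutex.new
mutex.lock # owned by the main fiber of the main thread

# Unlocking from a different thread raises ThreadError.
from_thread = Thread.new do
  begin
    mutex.unlock
    nil
  rescue ThreadError => e
    e
  end
end.value

# Unlocking from a different fiber on the *same* thread: on CRuby 3.0+
# this also raises ThreadError, because Mutex ownership is per fiber.
from_fiber = Fiber.new do
  begin
    mutex.unlock
    nil
  rescue ThreadError => e
    e
  end
end.resume

puts "other thread: #{from_thread.class}"
puts "other fiber:  #{from_fiber.class}"

# Only the owning fiber can release it (guarded for older Rubies, where
# the fiber above may already have unlocked it).
mutex.unlock if mutex.owned?
```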

Because of some reports in socketry/falcon#166, I've been looking at `ReentrantReadWriteLock`, defined at line 51 of concurrent-ruby/lib/concurrent-ruby/concurrent/atomic/reentrant_read_write_lock.rb (commit 3851561):

```ruby
class ReentrantReadWriteLock < Synchronization::Object
```

Specifically:

```
"kind":"Concurrent::IllegalOperationError",
"message":"Cannot release a read lock which is not held",
```

It appears that multi-fiber re-entrancy is either not supported, not handled correctly, or being used incorrectly.

I'm not sure what the correct solution is here, but at least we can start having a discussion about how it should work and what the solution is.


chrisseaton · 2022-10-17T21:00:39Z

This is since Ruby 3.0, isn't it?

I'm not massively looking forward to working through the significance of this change...

Would you be willing to start by working with me to redefine what you think the specs should be now?

ioquatix · 2022-10-17T22:43:18Z

IIRC, different versions of Ruby did different things and 3.0 standardised the behaviour. On some platforms where fibers were backed by threads, mutex was already per-fiber.

I checked the implementation a little bit and it looks okay to me, you are using Mutex correctly.

Maybe it's a genuine usage error on the part of Rails; let me check their code.

ioquatix · 2022-10-17T23:00:19Z

I've updated the original discussion of this issue, because I'm no longer sure that the issue is with this code.

ioquatix · 2022-10-17T23:12:13Z

This seems to reproduce the original issue, at least the error is the same:

```ruby
require 'concurrent'

lock = Concurrent::ReentrantReadWriteLock.new
lock.acquire_read_lock

puts "Lockity lock!"

Thread.new do
  lock.release_read_lock
end.join
```

The question is, is this what's happening?

chrisseaton · 2022-10-17T23:14:39Z

> Cannot release a read lock which is not held

That thread doesn't hold the lock, so it can't release it.

ioquatix · 2022-10-17T23:34:28Z

Yes, I understand that, but why is it not happening on the same thread?

chrisseaton · 2022-10-17T23:39:53Z

Are you saying you're aware that the last code snippet you posted is broken - you're saying that something else in some other code is making the same mistake as this example code, and that's what you're trying to fix?

Ok I think we're on the same page now. Yeah sounds like an application bug.

You could probably add some error-handling code to say which thread is holding it and which is trying to release it (might take some extra book-keeping) - I'd try that.

ioquatix · 2022-10-18T00:04:05Z

Sorry about my lack of clarity, I don't really know what's going on fully yet.

> Are you saying you're aware that the last code snippet you posted is broken - you're saying that something else in some other code is making the same mistake as this example code, and that's what you're trying to fix?

Yes, I'm trying to reproduce possible scenarios which could create the behaviour I'm seeing so I know what to look for.

> Ok I think we're on the same page now. Yeah sounds like an application bug.

Yes, maybe in Rails or maybe the way Rails is being invoked by Falcon.

> You could probably add some error-handling code to say which thread is holding it and which is trying to release it (might take some extra book-keeping) - I'd try that.

Yes, that's the plan. I don't have a full repro, but I might try hacking something together.
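A minimal sketch of that book-keeping, written as a hypothetical debugging wrapper. `TracingReadLock` and `MismatchedRelease` are illustrative names and are not part of concurrent-ruby. The wrapper records the owning fiber rather than the owning thread, since ownership is per fiber on Ruby 3.0+. It should work with any object that responds to `acquire_read_lock`/`release_read_lock`, such as `Concurrent::ReentrantReadWriteLock`.

```ruby
require 'fiber' # Fiber.current needs this on Ruby <= 3.0

# Hypothetical debugging decorator: tracks which fiber holds each read lock
# so that a mismatched release reports both the releaser and the holders.
class TracingReadLock
  class MismatchedRelease < StandardError; end

  def initialize(lock)
    @lock = lock
    @holders = Hash.new(0) # Fiber => number of read locks it holds
    @guard = Mutex.new     # protects @holders
  end

  def acquire_read_lock
    @lock.acquire_read_lock
    @guard.synchronize { @holders[Fiber.current] += 1 }
  end

  def release_read_lock
    fiber = Fiber.current
    @guard.synchronize do
      if @holders[fiber].zero?
        holders = @holders.keys.map(&:inspect).join(', ')
        # Raise before touching the wrapped lock, so its state is left intact.
        raise MismatchedRelease,
              "#{fiber.inspect} on #{Thread.current.inspect} released a read lock " \
              "it does not hold (holders: #{holders})"
      end
      @holders[fiber] -= 1
      @holders.delete(fiber) if @holders[fiber].zero?
    end
    @lock.release_read_lock
  end
end
```

For example, `TracingReadLock.new(Concurrent::ReentrantReadWriteLock.new)` would fail fast on a release from the wrong fiber, and the error message names the releasing fiber and thread together with the current holders.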

ioquatix · 2022-10-18T00:45:35Z

Okay, I reviewed all the code and was able to make what appears to be a repro:

```ruby
#!/usr/bin/env ruby

require 'concurrent'
require 'async'

lock = Concurrent::ReentrantReadWriteLock.new
Q = 0.01

watcher = Thread.new do
  loop do
    lock.with_write_lock do
      sleep Q
    end
    sleep Q
  end
end

Async do |task|
  2.times do |i|
    task.async do
      loop do
        lock.with_read_lock do
          sleep Q
        end
        sleep Q
      end
    end
  end
end
```

Results in:

```
0>1>0-1-  0.0s     warn: Async::Task [oid=0xa0] [ec=0xb4] [pid=857111] [2022-10-18 13:43:20 +1300]
               | Task may have ended with unhandled exception.
               |   Concurrent::IllegalOperationError: Cannot release a read lock which is not held
               |   → /home/samuel/.gem/ruby/3.1.2/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/atomic/reentrant_read_write_lock.rb:244 in `release_read_lock'
               |     /home/samuel/.gem/ruby/3.1.2/gems/concurrent-ruby-1.1.10/lib/concurrent-ruby/concurrent/atomic/reentrant_read_write_lock.rb:130 in `with_read_lock'
               |     bug.rb:23 in `block (4 levels) in <main>'
               |     bug.rb:21 in `loop'
               |     bug.rb:21 in `block (3 levels) in <main>'
               |     /home/samuel/.gem/ruby/3.1.2/gems/async-2.2.1/lib/async/task.rb:107 in `block in run'
               |     /home/samuel/.gem/ruby/3.1.2/gems/async-2.2.1/lib/async/task.rb:243 in `block in schedule'
```

After that the lock is totally jammed up.

ioquatix · 2022-10-18T00:47:15Z

Okay, I think I have an idea of the problem.

The thread local variable is per-thread, while the synchronisation is per fiber. I think this must be leading to some kind of invalid state.
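Here is a simplified model of that invalid state. It is an illustration, not concurrent-ruby's actual code, and it assumes the lock keeps a per-holder read count. In CRuby, `Thread#thread_variable_get`/`thread_variable_set` are shared by every fiber on a thread, while `Thread#[]`/`Thread#[]=` are fiber-local despite their name. `HoldCount` and `counts_seen_by_second_fiber` are hypothetical names. With per-thread storage, a fiber that never acquired the lock sees another fiber's count:

```ruby
# Simplified model of a reentrant lock's per-holder read count.
# per_fiber: false stores it per thread (shared by all fibers on the thread);
# per_fiber: true stores it per fiber.
class HoldCount
  def initialize(key, per_fiber:)
    @key = key
    @per_fiber = per_fiber
  end

  def value
    stored = @per_fiber ? Thread.current[@key] : Thread.current.thread_variable_get(@key)
    stored || 0
  end

  def value=(new_value)
    if @per_fiber
      Thread.current[@key] = new_value                    # fiber-local
    else
      Thread.current.thread_variable_set(@key, new_value) # thread-local
    end
  end
end

# Fiber A takes a read lock and is suspended while holding it (as happens when
# a task sleeps inside with_read_lock under a fiber scheduler); fiber B, on
# the same thread, then checks how many read locks it thinks it holds.
def counts_seen_by_second_fiber(per_fiber:)
  count = HoldCount.new(:"demo_hold_count_#{per_fiber}", per_fiber: per_fiber)

  fiber_a = Fiber.new do
    count.value += 1 # A "acquires" a read lock
    Fiber.yield      # A is suspended while holding it
    count.value -= 1 # A "releases" it
  end

  fiber_a.resume
  seen_by_b = Fiber.new { count.value }.resume # B never acquired anything
  fiber_a.resume
  seen_by_b
end

puts counts_seen_by_second_fiber(per_fiber: false) # prints 1: B appears to hold A's lock
puts counts_seen_by_second_fiber(per_fiber: true)  # prints 0
```

With a shared count of 1, fiber B could plausibly believe it already holds a read lock and release it. When fiber A later releases its own lock, the count is already 0, which is the kind of mismatch that surfaces as "Cannot release a read lock which is not held".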

ioquatix · 2022-10-18T00:58:34Z

Okay, that appears to be the problem. The fix in this case is to use fiber local variables:

```ruby
require 'thread'
require 'concurrent/atomic/abstract_thread_local_var'

module Concurrent

  # @!visibility private
  # @!macro internal_implementation_note
  class RubyThreadLocalVar < AbstractThreadLocalVar
    def initialize(...)
      super
      # One storage key per variable instance.
      @key = :"concurrent-ruby-#{object_id}"
    end

    def value
      # Thread#[] is fiber-local storage, despite its name.
      Thread.current[@key] || default
    end

    def value=(value)
      Thread.current[@key] = value
    end

    def allocate_storage
      # No-op: Thread#[] needs no pre-allocated storage.
    end
  end
end
```

I just monkey-patched this into my local gem, and it fixed the issue.

The current implementation of thread locals looks a bit... over-engineered? Maybe it was that way for backwards compatibility?

In any case, the locking is per fiber, and the state used for tracking that locking is per-thread, so any time you mix the two, it's bound to eventually end in disaster. The solution is to use per-fiber state for the lock implementation.

ioquatix · 2022-11-10T00:08:16Z

@chrisseaton any thoughts on the next steps?

chrisseaton · 2022-11-10T00:13:41Z

I'm a bit worried about how deep this goes.

Yes, the complexity is likely about backwards compatibility. I'd like to add a simple core set of primitives to MRI for the long term, but that's another discussion...

> the locking is per fiber, and the state used for tracking that locking is per-thread

That's probably the central thing to think about. I'd like to be able to write a spec that we want to meet. Would you be able to help me with that? Even just the plain-English description of the behaviour you want.

ioquatix · 2022-11-10T00:34:42Z

> Even just the plain-English description of the behaviour you want.

(It's not what I want, it's what's defined by CRuby).

Mutex locking is per-fiber.

Therefore, if you are storing state "per lock" you need to store it per fiber.

That's what the above monkey patch does.

ioquatix · 2022-11-10T01:19:17Z

@eregon can I please pull your expertise into this discussion.

eregon · 2022-11-10T14:35:45Z

I think for ReentrantReadWriteLock and others which use ThreadLocalVar, we should use FiberLocalVar (to be added, #376 (comment)) if Mutex is per Fiber.
That can easily be detected with:

```ruby
m = Mutex.new
# true if a Mutex held by this fiber is still owned inside a child fiber
mutex_owned_per_thread = m.synchronize { Fiber.new { m.owned? }.resume }
```

I think then the compatibility concerns are non-existent because using a ThreadLocalVar for per-Mutex-state on 3.0+ is just a bug.
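A standalone sketch of the idea described above: detect how Mutex ownership works, then keep per-lock bookkeeping at the same granularity. `LockLocal` is a hypothetical name and this is not the code that shipped in concurrent-ruby. Fiber-local storage uses `Thread#[]` and thread-local storage uses `Thread#thread_variable_get`/`thread_variable_set`.

```ruby
# true if a Mutex locked by this fiber still counts as owned inside a child
# fiber (ownership per thread); false when ownership is per fiber.
def mutex_owned_per_thread?
  mutex = Mutex.new
  mutex.synchronize { Fiber.new { mutex.owned? }.resume }
end

# Hypothetical per-lock storage whose scope (thread or fiber) matches the
# scope at which this Ruby considers a Mutex to be owned.
class LockLocal
  PER_THREAD = mutex_owned_per_thread?

  def initialize(key)
    @key = key
  end

  def value
    PER_THREAD ? Thread.current.thread_variable_get(@key) : Thread.current[@key]
  end

  def value=(new_value)
    if PER_THREAD
      Thread.current.thread_variable_set(@key, new_value)
    else
      Thread.current[@key] = new_value # fiber-local, despite the name
    end
  end
end
```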

chrisseaton · 2022-11-10T16:40:36Z

I don't have any objection to this fix if that's what you're thinking by the way. Just moving cautiously on a subject that is usually more complex than it appears.

eregon · 2023-01-23T18:32:02Z

https://github.com/ruby-concurrency/concurrent-ruby/releases/tag/v1.2.0

ioquatix · 2023-01-23T20:39:18Z

Thanks!

hoprocker · 2023-02-20T01:50:17Z

Hey folks, we just came across an issue with v1.2.0 in JRuby that seems to be related to this. Our code uses ReentrantReadWriteLock; it worked fine under v1.1.10, but in v1.2.0 we're now seeing this error:

```
undefined method `current' for Fiber:Class
Did you mean?  __current__

rubygems/concurrent-ruby-1.2.0/lib/concurrent-ruby/concurrent/atomic/locals.rb:180:in `locals!'"
rubygems/concurrent-ruby-1.2.0/lib/concurrent-ruby/concurrent/atomic/locals.rb:102:in `set',
rubygems/concurrent-ruby-1.2.0/lib/concurrent-ruby/concurrent/atomic/fiber_local_var.rb:77:in `value=',
rubygems/concurrent-ruby-1.2.0/lib/concurrent-ruby/concurrent/atomic/reentrant_read_write_lock.rb:205:in `acquire_read_lock',
rubygems/concurrent-ruby-1.2.0/lib/concurrent-ruby/concurrent/atomic/reentrant_read_write_lock.rb:128:in `with_read_lock',
...
```


At first glance, it looks like this could be mitigated by switching from Fiber.current to Thread.current with some sort of environment check, i.e.:

```ruby
current_ctx = (RUBY_PLATFORM == 'java' ? Thread.current : Fiber.current)
ObjectSpace.define_finalizer(current_ctx, thread_fiber_finalizer(locals.object_id))
```

However, I have no idea how much deeper this goes, if this would need to be swapped out in multiple places, etc.


ioquatix · 2023-02-20T02:40:23Z

The above PR should fix this issue.

hoprocker · 2023-02-20T06:38:42Z

Somehow totally missed that, thank you!


eregon · 2023-02-24T17:40:29Z

The missing require only affects Ruby <= 3.0; 3.1+ always has Fiber.current defined.
We missed this because rake spec:isolated currently only runs on 3.2 in CI. I'll also run it on the oldest Ruby we support to help catch this.
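The fix referenced in #990 ("Add require 'fiber' for Fiber.current usage.") amounts to loading the `fiber` extension before calling `Fiber.current`. Here is a minimal sketch of that pattern, assuming only standard Ruby, that works on both older and newer Rubies:

```ruby
# On Ruby <= 3.0, Fiber.current is only defined once the 'fiber' extension
# is loaded; on 3.1+ it is built in and this require is harmless.
require 'fiber'

current = Fiber.current
puts current.inspect
```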

eregon · 2023-02-24T17:59:07Z

Released in https://github.com/ruby-concurrency/concurrent-ruby/releases/tag/v1.2.1

ioquatix mentioned this issue Oct 17, 2022: Concurrent::ReentrantReadWriteLock might not be fiber safe. rails/rails#46257 (Closed)

ioquatix mentioned this issue Oct 17, 2022: Unexpectedly and often getting Concurrent::IllegalOperationError socketry/falcon#166 (Closed)

eregon self-assigned this Dec 12, 2022

eregon assigned ioquatix Jan 10, 2023

ioquatix mentioned this issue Jan 11, 2023: Fix ReentrantReadWriteLock implementation when Mutex is per-fiber. #983 (Merged)

eregon closed this as completed in #983 Jan 23, 2023

ioquatix added a commit that referenced this issue Feb 20, 2023: Add require 'fiber' for Fiber.current usage. (9acd356, "Fixes #962.")

ioquatix mentioned this issue Feb 20, 2023: Add require 'fiber' for Fiber.current usage. #990 (Merged)

eregon pushed a commit that referenced this issue Feb 22, 2023: Add require 'fiber' for Fiber.current usage. (0c6f5a8, "Fixes #962.")
