-
Notifications
You must be signed in to change notification settings - Fork 15.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ruby concurrent map decoding sporadically has empty results #6532
Comments
I've had the most luck reproducing this on Ruby 2.3, so that's what I've been using. This doesn't appear to be GC-related. I can reproduce the error even if the test is wrapped in I can get this modified version of the test to fail about 1 out of 15 times on the Ruby 2.3 docker image. def test_concurrent_decoding
GC.start
GC.disable
o = Outer.new
o.items[0] = Inner.new
raw = Outer.encode(o)
thds = 2.times.map do |t|
Thread.new do
assert_equal o, Outer.decode(raw), "thread #{t}, raw=#{raw.length}"
end
end
thds.map(&:join)
GC.enable
end Run in a loop with: while true; do bundle exec ruby tests/basic.rb -n test_concurrent_decoding ; done But oddly, adding more looping in Ruby itself doesn't help the bug reproduce more often. This suggests that there is some process state in play here, and we only get a chance to trigger the bug in a fresh Ruby process. Perhaps it has to do with starting threads. Maybe |
I've narrowed this down to a race condition in the lazy initialization of decoder state in Specifically, if I call It seems that some calls into the Ruby API from C must be capable of releasing the GVL, causing us to re-enter the protobuf C extension from a different thread. I am curious to know more about what operations can do this. Can even a hash table lookup do this (if so, this call is suspect: protobuf/ruby/ext/google/protobuf_c/defs.c Line 2220 in 63f324a
@samphippen can you comment? What calls into Ruby can trigger a thread switch and GVL release? Is it any call into Ruby, or just some calls? |
To answer my own question, it sounds like anything that triggers a GC, or possibly evaluates some Ruby, has the potential of switching Ruby threads and releasing the GVL. This is the same issue explained by @tenderlove in #3560 (comment) I have a fix in #6541 . Locally it seems to reliably fix the problem. |
The Ruby unit tests occasionally fail with:
Here is a sample failure in Kokoro.
This was first observed on the PR #5866, so is presumed to be related. However the bug is very sporadic and difficult to reproduce, so it's hard to say for sure when it was introduced.
Previously I tried the following strategies to make the bug more reproducible:
The text was updated successfully, but these errors were encountered: