-
Notifications
You must be signed in to change notification settings - Fork 93
Actor crash when reads and writes both block #132
Comments
I should add that my expected behavior for my repro script is for the writes to eventually block (since the server isn't consuming any of the data). This is coming up in production because of network latency or some other reason causing the |
It's a bug in Celluloid::IO. Here was the initial attempt at a fix: This was reverted because the way it's implemented, monitors accumulate in The correct implementation would be for Celluloid::IO objects to instead track the reactors they're attached to, and be able to look up the monitor they're currently using. That is, rather than reactors tracking their monitors (which is at best O(log(n)) complexity for n monitors), IO objects should track the reactors and their monitors. There's a missing API in nio4r that would help quite a bit as well: right now nio4r does not expose an API for changing interest ops. |
@tarcieri Do you think that commit is close enough to a solution that I should try to fix the leak issue? Or is there a reason why cleaning them up is not tractable? |
It needs to be completely redone in the manner I outlined in my last comment |
OK. I'd like to take a stab at it, then, even without the nio4r change. |
Go for it. |
Thanks for your time explaining this. Just to follow up, I don't think I will able to be successful making the change you describe. The surface area of code I'm unfamiliar with is I think too high. (I actually wonder if fixing this in nio4r and making it tolerant of 2 monitors for 1 io if one is read and the other is write is the cleanest way; I worried about losing the nice cleanup stuff of e.g. the I've figured out a workaround that unblocks me for my specific use of Krakow so I will have to be content with that. I won't try to make a PR to fix this in Celluloid-IO or nio4r. |
I might take the time soon to add the ability to change interest ops to nio4r. |
Can you explain briefly what needs to be done? |
In nio4r, or in Celluloid::IO? nio4r needs an API like this one from Java: http://docs.oracle.com/javase/7/docs/api/java/nio/channels/SelectionKey.html#interestOps(int) |
+1, I'm seeing this bug in my project as well. |
@tarcieri I saw that nio4r already has an API to change an interest on an IO object. Is this the missing link? |
@TiagoCardoso1983 yep |
awesome! should this/is this already handled transparently in that nio4r version? If not, what needs to be adapted? I saw that the code didn't change and registering already registered io objects fails. Or should the relationship IO/monitor/reactor change first? |
@TiagoCardoso1983 I think each IO object should keep a registry of which selectors it's been registered with, so it can look up the associated monitor and change the interests. Certainly not transparent, but interests are at least configurable now... |
Hi folks! I'm coming to Celluloid IO via the Krakow NSQ client gem. I have a reproduction of a bug that we've been seeing there, but I'm not sure if it's a bug in Celluloid IO or a bug in Krakow's assumptions about Celluloid IO.
The error we're seeing is “ArgumentError: this IO is already registered with selector” come from the
monitor = @selector.register(io, set)
line in reactor.rb’swait
method.The code opens a TCP socket and has a permanent read loop, run via
async
. It then receives writes (usually synchronously). The bug seems to occur when, during write, an:IO::WaitWritable
exception is thrown (see stream.rb'ssyswrite
method). This causes Celluloid IO towait_writeable
on the actor, leading to a call to reactor.rb'swait
method.At this point, the reactor's selector is already registered on the
io
due to therecv
in the read loop. When it registers again for the blocked write, the selector implementation raises an exception due to the double-registration of the same IO object.Here are two files that demonstrate the problem. It takes maybe 10 iterations on my machine to see the exception.
The error trace on my box is:
Is this considered a bug in Celluloid IO, or is this simultaneous read/write on the same TCPSocket that Krakow is doing not a good pattern?
Thanks for any help on this issue. I'm happy to give a fix a try if someone can sketch out what they think would be good; I'm wondering about having both a “read” and a “write” selector inside the reactor for the waits (since latches elsewhere in the code ensure that those are exclusive) but there are some details there that don’t quite fit.
The text was updated successfully, but these errors were encountered: