
Don't block when queuing methods on *Server. #225

Merged 2 commits from no-MaxConcurrentCalls into capnproto:main on Apr 18, 2022

Conversation

@zenhack (Contributor) commented Feb 20, 2022

This should fix #189, though right now the tests are deadlocking and
I don't have the energy to debug, so I'm putting it down.

This also obviates MaxConcurrentCalls; in the current version of the
patch it is still present, but unused. Once this is working, a
subsequent commit will remove the Policy struct.

This significantly restructures the internals of *Server, and I
think the result is easier to understand: the methods on *Server
that start calls just stick the calls in a queue, and there is a
goroutine that pulls them out and handles them. This even gets rid
of Server.mu entirely, since nothing actually needed the lock
anymore!

Hopefully it will stay simple when I actually get it working.

@zenhack zenhack marked this pull request as draft February 20, 2022 06:49
@zenhack (Contributor, Author) commented Feb 20, 2022

Got myself a stack trace, and several of the goroutines are blocked in Conn.lockSender() -- so I think I'm going to put this aside until #224 is merged, since it removes that function entirely and might fix things.

@zenhack force-pushed the no-MaxConcurrentCalls branch from 94b64d7 to c3601e8 on April 16, 2022 23:08
@zenhack (Contributor, Author) commented Apr 16, 2022

Rebased, but unfortunately it's still deadlocking. The test that deadlocks is server.TestServerShutdown. I don't really have time to debug it right now, though, so I will have to get back to it (maybe tomorrow).

See the comments documenting ordering requirements; previously, if the
method call was blocking on its own context, it would not be stopped in
response to server.Shutdown(), since the context for handleCalls is
entirely independent.
@zenhack zenhack marked this pull request as ready for review April 17, 2022 06:09
@zenhack (Contributor, Author) commented Apr 17, 2022

Alright, fixed it. The fix is fiddlier than I would like; see the comments about ordering in the last commit. Open to suggestions for simpler solutions.

I have let Policy live for now, mainly to keep the diff small; a follow-up PR will do the requisite slashing and burning.

@zenhack zenhack changed the title WIP: don't block when queuing methods on *Server. Don't block when queuing methods on *Server. Apr 17, 2022
@lthibault (Collaborator) left a comment

LGTM.

It took me a while to grok the exact behavior of handleCall(s) and Ack(), so I'm wondering whether it might be a good idea to document the high-level process somewhere. Perhaps as a docstring for handleCalls?

Below is my understanding of the process, which I want to sanity-check with you, and which might also serve as a starting point for such a docstring.

  1. The default behavior is to process incoming method calls synchronously.
  2. c.Ack() signals that the server should start processing the next queued method call, and that c can continue concurrently in the background.
  3. Ack() spawns another call to handleCalls in a separate goroutine.
  4. handleCalls blocks on srv.handleCall until the call completes; if it determines that Ack was called, it returns, since another instance of handleCalls is running in the background (via step 3).

@zenhack (Contributor, Author) commented Apr 18, 2022 via email

Quoting Louis Thibault (2022-04-18 12:52:14)

> Below is my understanding of the process, which I want to sanity-check with you, and which might also serve as a starting point for such a docstring. [...]

Correct. And yeah, it could use high-level docs; I'll do that soonish.

@lthibault (Collaborator) commented Apr 18, 2022

Ok perfect. Merging in the meantime, as this is just a detail anyway.

@lthibault lthibault merged commit 4f0a052 into capnproto:main Apr 18, 2022
@zenhack zenhack deleted the no-MaxConcurrentCalls branch April 18, 2022 18:25
Successfully merging this pull request may close: Non-deterministic deadlock on concurrent RPC calls (#189).