Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add connectionbroker package #1850

Merged
merged 1 commit into from
Jan 9, 2017

Conversation

aaronlehmann
Copy link
Collaborator

This package is a small abstraction on top of Remotes that provides a
gRPC connection to a manager. If running on a manager, it uses the unix
socket, otherwise it will pick a remote manager using Remotes. This
will allow things like agent and certificate renewal to work even on a
single node with no TCP port bound.

This is a piece of #1826. The next PR will convert the CA and agent to
use connectionbroker.

Copy link
Collaborator

@dperny dperny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Would writing unit tests for this package be worthwhile?
  • Are you wanting to merge this PR before the work to integrate it is done, or is the PR just opened to get review? I see the original PR now.


// Remotes returns the remotes interface used by the broker, so the caller
// can make observations or see weights directly.
func (b *Broker) Remotes() remotes.Remotes {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for this method? It seems to me that this breaks separation of concerns.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent needs access to Remotes to update the set of managers based on the list it gets from the dispatcher.

https://github.com/docker/swarmkit/pull/1826/files#diff-6ea70e0a4de93ebcb83000a028edc163L344

The alternative to exposing Remotes through this method is to implement Observe, Weights, and Remove as connectionbroker methods, and that feels like it would pollute the connectionbroker method set, instead of keeping that functionality specific to Remotes.

Or we could pass the underlying Remotes into the agent separately, but I don't like that much either.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

@dperny
Copy link
Collaborator

dperny commented Jan 9, 2017

Looks good to me.

// records a positive experience with the remote peer if success is true,
// otherwise it records a negative experience. If a local connection is in use,
// Close is a noop.
func (c *Conn) Close(success bool) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little counter-intuitive to close an "unsuccessful" connection. Would it be better to move if success logic to SelectRemote.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close won't be called when SelectRemote returns an error. The success parameter is determined later on. For example, if something tries an RPC and that RPC fails, success will be set to false when closing the connection so that a negative observation is made and future attempts will prefer a different remote.

I think Dial generally doesn't fail immediately (since Dial is an async call), but it's probably a good idea to automatically perform a negative observation if Dial returns an error. I'll add that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an observation if Dial fails.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can success be internal to Conn, like derived from operation failure on Conn?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately not, because Conn doesn't know whether RPCs that use it fail or not.

We could split it out into a separate Observe call on Conn. But I think it's nice to have it built into Close. That way, the use of a connection has to be either successful or unsuccessful. You can't forget to call Observe (we've had bugs like this before).

If it's weird for Close to take a parameter, we could rename this to something else like CloseAndObserve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because Conn doesn't know whether RPCs that use it fail or not.

Could this be moved to where failure is observed? If Conn cannot observe failures, maybe weight handling shouldn't be done here. How about doing it at RPC calls? When RPC fails, decrease the node's weight. On RPC success, adjust weight according to previous state (no need to increase weight on continuous success operations).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be moved to where failure is observed?

We can do that, but as I mentioned above, I think it's better this way. We had a several bugs before where certain code paths returned early and skipped adjusting the weight. If weight adjustment is part of closing the connection, it can't be skipped accidentally without leaking a connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. We have similar problem in class Swarm where operation failure should adjust node preference. It's hard to do that on all the calls.

This package is a small abstraction on top of Remotes that provides a
gRPC connection to a manager. If running on a manager, it uses the unix
socket, otherwise it will pick a remote manager using Remotes. This
will allow things like agent and certificate renewal to work even on a
single node with no TCP port bound.

Signed-off-by: Aaron Lehmann <[email protected]>
@dongluochen
Copy link
Contributor

LGTM

@aaronlehmann aaronlehmann merged commit 387bc93 into moby:master Jan 9, 2017
@aaronlehmann aaronlehmann deleted the connectionbroker branch January 9, 2017 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants