Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow swarm join with --availability=drain #24993

Merged
merged 2 commits into from
Jan 11, 2017

Conversation

yongtang
Copy link
Member

@yongtang yongtang commented Jul 25, 2016

- What I did

This fix tries to address the issue raised in #24596 where it was not possible to join as manager only (--availability=drain).

The related PR in swarmkit is:
moby/swarmkit#1271

- How I did it

This fix adds a new flag --availability to swarm join.

- How to verify it

An integration test has been added.

- Description for the changelog

Add a new flag --availability to swarm join.

- A picture of a cute animal (not mandatory but encouraged)

This fix fixes #24596.

The related PR in swarmkit is:

moby/swarmkit#1271

@vdemeester
Copy link
Member

@thaJeztah
Copy link
Member

I'm +1 on design, I think it makes sense to haven an option to set the initial availability of a node

@thaJeztah
Copy link
Member

Tentatively adding this to the milestone (because the related issue is), but it's no show-stopper

@thaJeztah thaJeztah added this to the 1.12.0 milestone Jul 25, 2016
@@ -43,6 +43,9 @@ message IssueNodeCertificateRequest {
// Token represents a user-provided string that is necessary for new
// nodes to join the cluster
string token = 3;

// Availability allows a user to control the current scheduling status of a node.
string availability = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be ignored for a renewal. I think that's okay, but we need to make it clear in the comment.

I'm not sure we want to add availability to the top level of IssueNodeCertificateRequest. Either there should be a nested messages with parameters for the new node object, or we should pass a NodeSpec and only honor certain fields from it (because not everything in a NodeSpec is under the node's control).

@aluzzardi @stevvooe any thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like much availability leaking to the NodeCA.

Also - if we go down this path, we should probably take an initial NodeSpec rather than a bunch of fields

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really isn't the right place for it.

Agree with the concept of providing a NodeSpec at join time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yongtang: It sounds like the consensus is that we should provide a NodeSpec instead of just an availability setting. But as I mentioned earlier, the node CA should only use certain fields from this spec when it creates the node. There are things like node labels that are controlled from elsewhere.

@stevvooe
Copy link
Contributor

In general, this is a good feature, but the implementation needs some design work.

Do we have something open in swarmkit?

@yongtang
Copy link
Member Author

@stevvooe @aaronlehmann @aluzzardi @vdemeester @thaJeztah Thanks a lot for the review and help!

I was looking for feedbacks on this PR before, so I haven't created the pull request on swarmkit side yet.

Let me rework on this pull request based on the comments. Will work on NodeSpec and update the pull request soon.

@icecrime
Copy link
Contributor

Thanks @yongtang! In the meantime, I'll remove the milestone as it won't make it in time for 1.12.0 (which is ok).

@icecrime icecrime removed this from the 1.12.0 milestone Jul 27, 2016
@yongtang yongtang force-pushed the 24596-swarm-join-with-drain branch 2 times, most recently from a48810c to 303fb5d Compare July 29, 2016 02:02
@cpuguy83
Copy link
Member

ping @stevvooe

@stevvooe
Copy link
Contributor

@cpuguy83 I am not necessarily on board with using NodeSpec here, since we don't honer the other aspects of the structure, such as node role.

@yongtang yongtang force-pushed the 24596-swarm-join-with-drain branch from 303fb5d to efc32f8 Compare August 16, 2016 18:47
@@ -235,7 +235,7 @@ func (c *Cluster) reconnectOnFailure(n *node) {
}
}

func (c *Cluster) startNewNode(forceNewCluster bool, localAddr, remoteAddr, listenAddr, advertiseAddr, joinAddr, joinToken string) (*node, error) {
func (c *Cluster) startNewNode(forceNewCluster bool, localAddr, remoteAddr, listenAddr, advertiseAddr, joinAddr, joinToken string, availability types.NodeAvailability) (*node, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is way too many string arguments in a row.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stevvooe. The pull request has been updated with localAddr, listenAddr, and advertiseAddr into one consolidated nodeAddr. Please let me know if there are any issues.

@cpuguy83
Copy link
Member

ping @yongtang

@yongtang yongtang force-pushed the 24596-swarm-join-with-drain branch 3 times, most recently from 82cac7d to 8f71074 Compare August 27, 2016 21:02
@yongtang
Copy link
Member Author

Thanks @cpuguy83. The pull request has been rebased and updated. Please let me know if there are any issues.

@thaJeztah
Copy link
Member

Sorry @yongtang this needs another rebase

ping @stevvooe is this ok to be moved to code review?

@stevvooe
Copy link
Contributor

This was my last comment here:

@cpuguy83 I am not necessarily on board with using NodeSpec here, since we don't honer the other aspects of the structure, such as node role.

If we can reconcile this, then we can probably move along.

@aaronlehmann
Copy link
Contributor

To unblock this, maybe we can go back to the approach of not using NodeSpec.

Using NodeSpec was my suggestion. I'm sorry it delayed this so much.

@stevvooe
Copy link
Contributor

@aaronlehmann I have no problem using NodeSpec, as long as we don't cherry pick parameters. I think the problem is that NodeSpec can set things like the name, which should be under control of the cluster administrator, or membership, which is the purview of the managers. NodeSpec is supposed to be the user input for a node.

Let's just make this a field value for "request availability", ultimately leaving the policy to the orchestrator as to whether it is honored or not.

Do we also incorporate this into the join token?

@yongtang yongtang force-pushed the 24596-swarm-join-with-drain branch from 5f626fa to 009f3e3 Compare November 23, 2016 19:05
@yongtang
Copy link
Member Author

Thanks all for the review. The PR has been updated with vendoring of swarmkit done. Now all tests passes. Please take a look and let me know if there are additional issues.

@aaronlehmann
Copy link
Contributor

LGTM after rebase

@yongtang
Copy link
Member Author

@aaronlehmann Rebased. Thanks.

@cpuguy83
Copy link
Member

What's the status here?

@aaronlehmann
Copy link
Contributor

The swarmkit-side PR was merged, so this is most of the way there. Sadly, it needs yet another rebase. It still looks good to me. Needs a second LGTM.

@yongtang yongtang force-pushed the 24596-swarm-join-with-drain branch from 1c13b15 to 663e1d1 Compare December 22, 2016 02:14
@yongtang
Copy link
Member Author

@cpuguy83 @aaronlehmann The PR has been rebased.

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aaronlehmann
Copy link
Contributor

@thaJeztah: What are the next steps to get this merged? It's not an urgent PR, but I feel bad about it being open so long, and asking for rebases every few weeks.

@thaJeztah
Copy link
Member

oh, my bad, lost track of this one 😞

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some suggestions

@@ -133,6 +134,16 @@ Snapshots compact the Raft log and allow for more efficient transfer of the
state to new managers. However, there is a performance cost to taking snapshots
frequently.

### `--availability string`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove string here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

### `--availability string`

This flag specifies the availability of the node at the time the node joins a master.
Possiblle availability value includes `active`, `pause`, or `drain`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo Possiblle -> Possible

Possible availability values are `active`, `pause`, or `drain`.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

This flag is useful in certain situations. For example, a cluster may want to have
dedicated manager nodes that are not served as worker nodes. This could be achieved
by passing `--availability=drain` to `docker swarm init`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying the availability when initialising a swarm allows you to
initialize a dedicated manager node that do not get tasks scheduled.

The following example initializes a swarm, and sets the availability
to `drain`;

    $ docker swarm init --availability=drain

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% satisfied with that one, so open to suggestions

@@ -94,6 +95,15 @@ This flag is generally not necessary when joining an existing swarm.

Secret value required for nodes to join the swarm

### `--availability string`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove string here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

### `--availability string`

This flag specifies the availability of the node at the time the node joins a master.
Possiblle availability value includes `active`, `pause`, or `drain`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible availability values are `active`, `pause`, or `drain`.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done.

@yongtang yongtang force-pushed the 24596-swarm-join-with-drain branch from 663e1d1 to 02b9334 Compare January 10, 2017 07:53
@yongtang
Copy link
Member Author

Thanks @aaronlehmann @thaJeztah the PR has been updated.

This fix tries to address the issue raised in 24596 where it was not
possible to join as manager only (`--availability=drain`).

This fix adds a new flag `--availability` to `swarm join`.

Related documentation has been updated.

An integration test has been added.

NOTE: Additional pull request for swarmkit and engine-api will
be created separately.

This fix fixes 24596.

Signed-off-by: Yong Tang <[email protected]>
This fix adds a new flag `--availability` to `swarm join`.

Related documentation has been updated.

An integration test has been added.

Signed-off-by: Yong Tang <[email protected]>
@yongtang yongtang force-pushed the 24596-swarm-join-with-drain branch from 02b9334 to 0f30c64 Compare January 11, 2017 00:32
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot join Swarm as Manager only
10 participants