Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: the init process does not propagate through one-way --join connectivity #61621

Closed
knz opened this issue Mar 8, 2021 · 10 comments
Closed
Labels
A-cli-admin CLI commands that pertain to controlling and configuring nodes A-kv-gossip A-kv-server Relating to the KV-level RPC server C-question A question rather than an issue. No code/spec/doc change needed.

Comments

@knz
Copy link
Contributor

knz commented Mar 8, 2021

NB: This is not a new issue (it existed in 20.1) however it just caused me to spend a lot of time staring at my screen not understanding what was going on.

What I did:

  1. on node n1, cockroach start without --join (or a self-join)
  2. on node n2, cockroach start --join=n1

At this point I verified in logs that n2 was establishing a conn to n1, and n1 was receiving an init request / join request from n2.

  1. then I ran cockroach init --host=n2

That caused n2 to initialize a cluster centered on n2 (with node ID 1). However n1 does not participate and does not join, and remains uninitialized (no node ID).

This surprised me -- I thought that the join flag need not create a full connectivity graph for the init command to propagate throughout the established connections.

Was that the intended behavior?

cc @tbg @bdarnell @irfansharif

@knz knz added C-question A question rather than an issue. No code/spec/doc change needed. A-cli A-kv-gossip A-kv-server Relating to the KV-level RPC server labels Mar 8, 2021
@irfansharif
Copy link
Contributor

Yes, this is intended behavior. The expectation is that all nodes are started off with the same flags, usually pointing to a "seed set" of nodes to join to, and the cockroach init applied to nodes in that seed set. Typically that seed set is just n1.

TestClusterConnectivity demonstrates this expected behaviour: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/connectivity_test.go#L40-L44

For the code that actually does this, see: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/init.go#L231-L232

Basically each node either waits to get init-ed, or waits for a successful join request sent out to the nodes it was informed about. We don't do anything fancier like making sure a join request triggers the server to start sending out reciprocal join requests (and promulgate the "connectivity" I think you're getting at).

@knz
Copy link
Contributor Author

knz commented Mar 8, 2021

I guess my question wasn't clear.
I'm pretty sure I agree that this is what the code does (obviously) and so the person who wrote the code did not see this use case as a potential edge case.

However I wonder if this is intended - do we really want to let the user wonder for ages that they forgot to include some node in their --join flag?

We have explained in docs before that it's ok when --join is not exhaustive, for example it can point to a load balancer so that it only hits one node at a time the first time around, and then gossip takes care of sharing additional node addresses.

However, because of the situation explained above, having a non-exhaustive --join actually can break cluster initialization. I don't know that this is what was intended wrt UX.

@irfansharif
Copy link
Contributor

We knew about this potential loss of bidirectionality as of #32574 (comment) and chose not to do anything about it. We certainly could.

@knz
Copy link
Contributor Author

knz commented Mar 8, 2021

I'm vaguely OK with the current behavior but it irks me that we are not consistent about it in docs tutorials etc.

I also think that in an orchestration setting, the current semantics are a minefield if the user is trying to initialize a cluster across multiple regions (where it's unusual to have a discrete list of peers in other regions).

I think we can only preserve the current behavior if:

  1. we also discourage the creation of multi-node clusters (i.e. run init after multiple nodes have started) across different networks in docs

  2. we extend the CLI interface to force the admin to indicate the expected number of peers pre-init, so that the nodes can properly complain in their log if they receive an init request before they have full bidirectional connectivity with the number of expected peers.

NB: the 2nd point is congruent with the expected semantics of the new connect command so it's not a stretch.

@knz
Copy link
Contributor Author

knz commented Mar 8, 2021

@itsbilal can you bring this issue for discussion on this week's server meeting notes, and try to ping ben to get some feedback on it?

@bdarnell
Copy link
Contributor

bdarnell commented Mar 8, 2021

on node n1, cockroach start without --join (or a self-join)

This must be a self-join: if it's started without --join it will self-init.

We have explained in docs before that it's ok when --join is not exhaustive, for example it can point to a load balancer so that it only hits one node at a time the first time around, and then gossip takes care of sharing additional node addresses.

I'm not sure if the load balancer scenario is really supported - it was something we had explicit support for in the early days (you had to tell the system it was a load balancer instead of a specific node), but that was removed. It's now expected that you name one or more nodes directly.

It has been known for a long time that the best practice is to pick 3-5 nodes as join targets and list those same nodes in the join flags on all nodes. I think it's reasonable to require that the init command be sent to one of the join targets, although I'm not sure whether that's been explicitly noticed or documented before.

@irfansharif
Copy link
Contributor

if it's started without --join it will self-init.

That behaviour was deprecated + removed. #51245

@knz
Copy link
Contributor Author

knz commented Mar 9, 2021

I think it's reasonable to require that the init command be sent to one of the join targets, although I'm not sure whether that's been explicitly noticed or documented before.

Ah that is a very good and succinct summary of the new requirement. Maybe that's what we need to explain in docs tutorials etc.

@taroface what is best here? Should I file an issue on the docs repo?

@taroface
Copy link
Collaborator

taroface commented Mar 9, 2021

@knz A docs issue would be great. Feel free to cc me and I can route it to the right person (possibly myself).

@knz knz added A-cli-admin CLI commands that pertain to controlling and configuring nodes and removed A-cli labels Mar 20, 2021
@knz
Copy link
Contributor Author

knz commented Apr 15, 2021

Closing in favor of cockroachdb/docs#10342

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cli-admin CLI commands that pertain to controlling and configuring nodes A-kv-gossip A-kv-server Relating to the KV-level RPC server C-question A question rather than an issue. No code/spec/doc change needed.
Projects
None yet
Development

No branches or pull requests

4 participants