cli: the init process does not propagate through one-way --join connectivity #61621
Comments
Yes, this is intended behavior. The expectation is that all nodes are started off with the same flags, usually pointing to a "seed set" of nodes to join to, and the TestClusterConnectivity demonstrates this expected behaviour: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/connectivity_test.go#L40-L44 For the code that actually does this, see: https://github.com/cockroachdb/cockroach/blob/master/pkg/server/init.go#L231-L232 Basically each node either waits to get init-ed, or waits for a successful join request sent out to the nodes it was informed about. We don't do anything fancier like making sure a join request triggers the server to start sending out reciprocal join requests (and promulgate the "connectivity" I think you're getting at).
I guess my question wasn't clear. However I wonder if this is intended - do we really want to let the user wonder for ages that they forgot to include some node in their We have explained in docs before that it's ok when However, because of the situation explained above, having a non-exhaustive
We knew about this potential loss of bidirectionality as of #32574 (comment) and chose not to do anything about it. We certainly could.
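The "either init-ed or successfully joined" rule in the comment above can be sketched as a two-way wait. This is only an illustration, not the code in pkg/server/init.go: the function name `startupPath` and both channels are hypothetical, and the sketch assumes that an incoming join request from a peer produces no event at all on the receiving node.

```go
package main

import "fmt"

// startupPath reports how a hypothetical node leaves its uninitialized state.
// initCh fires when an operator runs `cockroach init` against this node.
// joinCh fires when one of the node's own outgoing join requests succeeds.
// Incoming join requests are deliberately not an input: in this model they
// never cause the receiving node to start.
func startupPath(initCh, joinCh <-chan struct{}) string {
	select {
	case <-initCh:
		return "bootstrapped a new cluster"
	case <-joinCh:
		return "joined an existing cluster"
	}
}

func main() {
	initCh := make(chan struct{}, 1)
	joinCh := make(chan struct{}, 1)

	// Simulate an operator sending init to this node.
	initCh <- struct{}{}
	fmt.Println(startupPath(initCh, joinCh))
}
```

A node started without any join targets never sees anything on `joinCh` in this model, so it can only make progress by being init-ed directly.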
I'm vaguely OK with the current behavior but it irks me that we are not consistent about it in docs tutorials etc. I also think that in an orchestration setting, the current semantics are a minefield if the user is trying to initialize a cluster across multiple regions (where it's unusual to have a discrete list of peers in other regions). I think we can only preserve the current behavior if:
NB: the 2nd point is congruent with the expected semantics of the new
@itsbilal can you bring this issue for discussion on this week's server meeting notes, and try to ping ben to get some feedback on it?
This must be a self-join: if it's started without
I'm not sure if the load balancer scenario is really supported - it was something we had explicit support for in the early days (you had to tell the system it was a load balancer instead of a specific node), but that was removed. It's now expected that you name one or more nodes directly. It has been known for a long time that the best practice is to pick 3-5 nodes as join targets and list those same nodes in the join flags on all nodes. I think it's reasonable to require that the init command be sent to one of the join targets, although I'm not sure whether that's been explicitly noticed or documented before.
That behaviour was deprecated + removed. #51245
Ah that is a very good and succinct summary of the new requirement. Maybe that's what we need to explain in docs tutorials etc. @taroface what is best here? Should I file an issue on the docs repo?
@knz A docs issue would be great. Feel free to cc me and I can route it to the right person (possibly myself).
Closing in favor of cockroachdb/docs#10342
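The two rules in the comment above (every node lists the same join targets, and init goes to one of those targets) can be checked mechanically before starting a cluster. The following is a minimal sketch under that reading; `checkTopology` and its inputs are hypothetical helpers for illustration and are not part of the cockroach CLI.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// checkTopology takes each node's --join list and the node that will receive
// `cockroach init`, and returns a list of human-readable problems.
// An empty result means both rules from the comment above hold.
func checkTopology(joinFlags map[string][]string, initHost string) []string {
	// Sort node names so the report is deterministic.
	nodes := make([]string, 0, len(joinFlags))
	for n := range joinFlags {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes)
	if len(nodes) == 0 {
		return nil
	}

	// canon renders a join list in an order-independent form.
	canon := func(list []string) string {
		c := append([]string(nil), list...)
		sort.Strings(c)
		return strings.Join(c, ",")
	}

	var problems []string
	ref := nodes[0]

	// Rule 1: all nodes carry the same join list.
	for _, n := range nodes[1:] {
		if canon(joinFlags[n]) != canon(joinFlags[ref]) {
			problems = append(problems,
				fmt.Sprintf("%s and %s have different --join lists", ref, n))
		}
	}

	// Rule 2: the init command is sent to one of the join targets.
	found := false
	for _, t := range joinFlags[ref] {
		if t == initHost {
			found = true
		}
	}
	if !found {
		problems = append(problems,
			fmt.Sprintf("init host %s is not a join target", initHost))
	}
	return problems
}

func main() {
	seeds := []string{"n1", "n2", "n3"}
	cluster := map[string][]string{"n1": seeds, "n2": seeds, "n3": seeds, "n4": seeds}
	fmt.Println(len(checkTopology(cluster, "n1"))) // no problems expected
	fmt.Println(checkTopology(cluster, "n4"))      // n4 is not a join target
}
```

Comparing every list against the first node's list is enough here because the check only needs to detect any deviation, not to decide which list is the intended one.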
NB: This is not a new issue (it existed in 20.1); however, it just caused me to spend a lot of time staring at my screen not understanding what was going on.
What I did:
cockroach start without --join (or a self-join)
cockroach start --join=n1
At this point I verified in logs that n2 was establishing a conn to n1, and n1 was receiving an init request / join request from n2.
cockroach init --host=n2
That caused n2 to initialize a cluster centered on n2 (with node ID 1). However n1 does not participate and does not join, and remains uninitialized (no node ID).
This surprised me -- I thought that the join flag need not create a full connectivity graph for the init command to propagate throughout the established connections.
Was that the intended behavior?
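The outcome described above can be reproduced with a small model. This is a sketch under a simplifying assumption taken from the discussion in this thread: a node becomes initialized only if it receives init directly, or if one of its own join targets is already initialized. The function `initialized` is hypothetical and does not correspond to any CockroachDB API.

```go
package main

import (
	"fmt"
	"sort"
)

// initialized returns, in sorted order, the nodes that end up in the cluster
// when `cockroach init` is sent to initHost. joins maps each node to the
// targets in its own --join flag. Join edges are followed in one direction
// only: from the joining node to its target.
func initialized(joins map[string][]string, initHost string) []string {
	done := map[string]bool{initHost: true}

	// Iterate to a fixed point: keep scanning until no new node can join.
	for changed := true; changed; {
		changed = false
		for node, targets := range joins {
			if done[node] {
				continue
			}
			for _, t := range targets {
				if done[t] {
					done[node] = true
					changed = true
					break
				}
			}
		}
	}

	out := make([]string, 0, len(done))
	for n := range done {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	// n1 was started without --join; n2 was started with --join=n1.
	joins := map[string][]string{"n1": nil, "n2": {"n1"}}

	fmt.Println(initialized(joins, "n2")) // [n2]: n1 stays uninitialized
	fmt.Println(initialized(joins, "n1")) // [n1 n2]: n2 joins through n1
}
```

In this model, sending init to n1 instead of n2 brings both nodes into the cluster, because n2's outgoing join request to n1 can then succeed.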
cc @tbg @bdarnell @irfansharif