kv,server: nodes started with different default replication factors report inconsistent default zone configs in RPCs #39382
Comments
cc @tbg @jeffrey-xiao can you have a look? Something tells me you've looked at this last.
Maybe it's a bit more than that though. It appears that the in-memory zone cfg is also used in at least two other places.

@tbg: can you explain to me what the rationale was for using the global-scope, in-memory zone config as input to various things, as opposed to storing it in a system config range and fetching it from there on demand? A side question: does the current design in turn mandate that the default config be consistent across all nodes? What bad things would happen if they are not?
Discussed offline: the two adminServer methods identified above are not going through the regular lookup logic, so they need to be corrected. It remains unclear (to me) what happens in other places when the cfg vars are updated. I haven't yet found a common implementation for this fallback logic. It seems that different places access zone configs through their own methods and fall back in diverse ways. There's no clarity (to me) that all the variants of the fall-back logic go through zone ID 0 and look it up from KV.
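For reference, the stored entry such fallbacks would ideally converge on is the default zone config, which is keyed by the pseudo-ID 0 in `system.zones`. A minimal way to look at that row from SQL, assuming an admin session (the `config` column is a serialized protobuf, so it shows up as raw bytes):

```
-- The default zone config row; ID 0 is the root/default pseudo-ID.
SELECT id, config FROM system.zones WHERE id = 0;
```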
I'll side-step the issue (and prevent it from becoming a real problem) by using the new SQL syntax from #39404 to initialize the replication factor via SQL, without changing the default zone config. However, the current (mis)design will remain and this issue will remain relevant.
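For illustration (the exact statement used by the fix isn't quoted here), initializing the replication factor through a zone config change rather than through the in-memory default looks roughly like this:

```
-- Writes the replication factor into the stored default zone config,
-- so every node reads the same value instead of its own in-memory default.
ALTER RANGE default CONFIGURE ZONE USING num_replicas = 1;
```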
I think it really all just boils down to poor design (when I say that, usually I was involved, but I don't think I was this time, so I'm not too sure). It was possibly a mistake to have an in-memory fallback instead of "just" handling the case in which no zone information is available yet wherever zones are involved. I don't think you need to know which zone you're in to start serving commands, so I don't think starting the cluster needs the fallback.
39492: cli: use zone configs to disable replication r=knz a=knz

Fixes #39379.

As established in #39382 it is not safe to modify the global default zone config objects to non-standard values. This commit changes `start-single-node` and `demo` to use zone configs via SQL instead.

Also, as requested by @piyush-singh, it now informs the user that replication was disabled, for example:

```
*
* INFO: Replication was disabled for this cluster.
* When/if adding nodes in the future, update zone configurations to increase the replication factor.
*
```

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@irfansharif if you're trying to make something that works well with multitenant, you may want to take this issue into account as well.
Found while investigating #39379: it's possible to start two or more nodes that have different default zone configs from the perspective of the admin RPC.
Specifically `(s *adminServer) TableDetails()` and `(s *adminServer) DatabaseDetails()`.

To see this in action I can do this:
Then if I create a table/range on n1 it will get replication factor 1, and if I create one on n2 it will get replication factor 3 when inspecting the table details.
That's no good!
The correct behavior is to scan `system.zones` and grab the default zone config from there, instead of grabbing it from the in-memory `serverConfig` object.

Jira issue: CRDB-5580
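For comparison, the stored default (as opposed to each node's in-memory copy) can be inspected from SQL; something along these lines should return the same answer regardless of which node serves the query:

```
-- Reads the default zone config from the stored system configuration,
-- which is consistent cluster-wide (unlike the per-node in-memory default).
SHOW ZONE CONFIGURATION FOR RANGE default;
```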