Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: the flag --sql-advertise-addr is incompletely designed #52266

Open
knz opened this issue Aug 3, 2020 · 8 comments
Open

cli: the flag --sql-advertise-addr is incompletely designed #52266

knz opened this issue Aug 3, 2020 · 8 comments
Labels
A-docs A-kv-server Relating to the KV-level RPC server A-server-networking Pertains to network addressing,routing,initialization C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-server-and-security DB Server & Security

Comments

@knz
Copy link
Contributor

knz commented Aug 3, 2020

Today the flag --sql-advertise-addr is offered for symmetry with --advertise-addr.
We're not too happy about the status quo. There is more work needed.

Background

The reason why this flag is at all necessary is because of the following specific combination:

  • if the cluster is multi-DC, the private address used to start up each node will be inadequate to establish the connection from another DC
  • for this reason, we also store a separate "public" address, informed by the operator via flag --advertise-addr for the RPC port; this is used internally e.g. by node-node connections
  • for symmetry with the RPC port we also implement --sql-advertise-addr for when the SQL listener is split from RPC, which we'd like to make security best practice.
  • (and we're also implementing a separate tenant advertise addr when the tenant server is split off in a multi-tenant config)

Pros of the feature

  • in the node descriptor (roachpb.NodeDescriptor): all advertised addresses are stored there.

  • the RPC adv addr (not SQL) is pulled from node descs for node-node connections. This is necessary for normal cluster operation.

  • inside SQL, the SQL adv addr is used to populate the vtable crdb_internal.node_build_info.

    This might be used by 3rd party automation to discover the addresses of SQL servers, e.g. to automate setting up load balancers. We're not sure (it's also not documented).

  • cockroach debug zip needs to establish SQL connections to every node in the cluster, in addition to RPC connections; so even if debug zip is ran on one of the node's VMs, it may still need to reach out to other DCs using the nodes' public address.

    Currently debug zip discovers the RPC and SQL public address by looking at the node descriptor on the 1st node it connects to. That's necessary for proper operation of debug zip but then will drop after Ability to obtain debug.zip directly via http endpoint #51008 is addressed

So in summary while a separate RPC adv address is necessary for proper operation, the utility of a separate SQL adv address is relatively small.

Cons of the feature

  • users/customers come along and ask "How do I configure this properly?" and it's creating support burden.
  • the feature is only relevant for clusters that split the RPC and SQL listeners onto separate TCP addresses/ports. This is not the default config in CockroachDB.
  • SQL client apps do not need an "advertised address" to be configured server-side. SQL apps learn the address of the CockroachDB nodes via the postgres connection URL, and/or the load balancer.
  • we don't even use the advertise SQL addr in cockroach gen haproxy and therefore our own recommended configuration generator for load balancers don't use this feature properly

Ways forward

There are two possible ways forward, with different action items:

  • remove the concept:

    1. address Ability to obtain debug.zip directly via http endpoint #51008 and ensure that all the debug zip logic (that becomes server-side) operates using RPCs, without the need for node-to-node SQL connections.
    2. explain in docs that SQL load balancers must be configured using the public addresses, and add a check in the cockroach gen haproxy command that all nodes are reachable using the selected address
    3. deprecate the flag --sql-advertise-addr and eventually remove it.

    (Note: in this approach, it would still be possible to split RPC and SQL listeners, it's just that we wouldn't need to configure or store an advertised SQL address server-side)

OR

  • double down on the feature:

    1. make cockroach gen haproxy use the advertised SQL address
    2. document how to use separate SQL and RPC listeners by default (and maybe make this default config)
    3. explain in docs how the "advertised SQL address" is important and why, and how to configure it

Note that in this second approach, until we have more comprehensive docs it is confusing to mention the option to users upfront; as it raises more questions than it answers. Therefore it may be a good idea to hide the flag until we have more comprehensive deployment docs.

Jira issue: CRDB-3971

@blathers-crl
Copy link

blathers-crl bot commented Aug 3, 2020

Hi @knz, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Aug 3, 2020
@knz knz changed the title cli: the flag --sql-advertise-addr is misdesigned cli: the flag --sql-advertise-addr is incompletely designed Aug 3, 2020
@knz knz added A-docs A-kv-server Relating to the KV-level RPC server C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Aug 3, 2020
@knz
Copy link
Contributor Author

knz commented Aug 3, 2020

cc @bdarnell : do you think we should polish this further given the functional utility? Or would you rather drop this logic entirely? Or perhaps we'd be satisfied with an intermediate situation where we hide the flag from command-line --help until our docs explain how to use this more thoroughly?

cc @Amruta-Ranade, this is what we were discussing last week

knz added a commit to knz/cockroach that referenced this issue Aug 3, 2020
@bdarnell
Copy link
Contributor

bdarnell commented Aug 3, 2020

It's very low priority, but yes, I think we should expand on this functionality instead of removing it. There should also be an "advertise http addr" (which would also be useful to gen haproxy).

As for hiding it from --help in the meantime, I'm not a fan of hidden options unless the support burden is really high. This one is straightforward enough - if you're using both --sql-addr and --advertise-addr, you should probably also use --sql-advertise-addr, and set it using the same logic you used for --advertise-addr.

@jlinder jlinder added the T-server-and-security DB Server & Security label Jun 16, 2021
@knz knz added the A-server-networking Pertains to network addressing,routing,initialization label Jul 29, 2021
@a-robinson
Copy link
Contributor

I get that this is super low priority, but FWIW I ran into a surprising quirk with --sql-advertise-addr that I wanted to mention: --sql-advertise-addr is completely ignored if you don't also set --sql-addr alongside it.

I tried setting --sql-advertise-addr on a cluster purely in order to improve the output of cockroach node status and related debug endpoints so that they would include a useful identifying hostname that makes reading their output easier (the cluster is using IP addresses for its normal listen/advertise addr for reasons that are unimportant here). But found that it had zero effect, not even in the initial "CockroachDB node starting at ..." message that gets printed out when a node starts up. The non-sql --advertise-addr was instead used as the advertised SQL address.

And you can't cheekily work around it by setting --sql-addr to the same thing as --listen-addr, because cockroach reasonably assumes that you want it to bind to the two addresses separately, so it fails to bind both during startup.

Would y'all be open to a patch that allows --sql-advertise-addr to work even when --sql-addr isn't set or would you rather keep things as they are?

@knz
Copy link
Contributor Author

knz commented Dec 13, 2021

What you describe looks like a bug (and probably a separate one, deserving a separate issue). Thanks for finding it.

Would y'all be open to a patch that allows --sql-advertise-addr to work even when --sql-addr isn't set

Yes that seems like a good idea. If you plan to make a PR, send it my way.

@jtsiros @mwang1026 we will want to revive this, as a general item to make it easier to deploy multi-region. We'll need an improvement in this area if we want to expand our k8s operator capabilities towards multi-region (and possibly ditto in the CC infra).

@a-robinson
Copy link
Contributor

a-robinson commented Dec 13, 2021

Yes that seems like a good idea. If you plan to make a PR, send it my way.

I have no immediate plans to, but I may come back to it in a while if I get bored and it doesn't otherwise get fixed.

craig bot pushed a commit that referenced this issue Sep 8, 2022
87027: streamingccl: reduce server count in multinode tests r=samiskin a=samiskin

While these tests would pass under stress locally they would fail CI
stress, which may be because we were starting more server processes than
ever before with 4 source nodes, 4 source tenant pods, and 4 destination
nodes.

This PR reduces the node count to 3 (any lower and scatter doesn't
correctly distribute ranges) and only starts a single tenant pod for the
source cluster.

Release justification: test-only change
Release note: None

87412: cli,server: fix --sql-advertise-addr when --sql-addr is not specified r=a-robinson,ajwerner a=knz

Fixes #87040.
Informs #52266.

cc `@a-robinson` 

Release justification: bug fix

Release note (bug fix): The flag `--sql-advertise-addr` now properly
works even when the SQL and RPC ports are shared (because `--sql-addr`
was not specified). Note that this port sharing is a deprecated
feature in v22.2.

87440: ui: update txn contention insights to use waiting txns as event r=ericharmeling a=ericharmeling

This commit updates the transaction workload insights pages to use the waiting
contended transaction as the primary contention event, rather than the blocking
transaction.

Fixes #87284.

https://www.loom.com/share/383fec4297a74ec79d90e46f11def792

Release justification: bug fixes and low-risk updates to new functionality
Release note: None

87462: upgrade/upgrades: allow CreatedAtNanos to be set when validating migration r=ajwerner a=ajwerner

Schema change upgrade migrations to system tables are made idempotent by checking that the descriptor reaches some expected state. In order to ensure that it is in that expected state, some volatile fields need to be masked. We forgot to mask CreatedAtNanos. We also lost the testing which came with these helper functions we use.

The vast majority of this PR is reviving testing from #66889.

Fixes #85228.

Release justification: Import bug fix for backport

Release note (bug fix): Some upgrade migrations perform schema changes on system tables. Those upgrades which added indexes could, in some cases, get caught retrying because they failed to detect that the migration had already occurred due to the existence of a populated field. When that happens, the finalization of the new version can hang indefinitely and require manual intervention. This bug has been fixed.

Co-authored-by: Shiranka Miskin <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Eric Harmeling <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@sergeymavrin
Copy link

sergeymavrin commented Mar 12, 2024

Same here. Have next configuration

--listen-addr=172.24.128.41:26257
--sql-addr=172.24.128.41:26258
--http-addr=172.24.128.41:8080

have no advertise options and on command cockroach gen haproxy got wrong config:

server cockroach1 172.24.128.41:26257 check port 8080
server cockroach2 172.24.128.43:26257 check port 8080
server cockroach3 172.24.128.42:26257 check port 8080

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs A-kv-server Relating to the KV-level RPC server A-server-networking Pertains to network addressing,routing,initialization C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-server-and-security DB Server & Security
Projects
None yet
Development

No branches or pull requests

5 participants