
locality-advertise-addr is not working #42741

Open
leomkkwan opened this issue Nov 25, 2019 · 16 comments
Labels
A-server-architecture: Relates to the internal APIs and src org for server code
A-server-networking: Pertains to network addressing, routing, initialization
C-investigation: Further steps needed to qualify. C-label will change.
O-community: Originated from the community
T-server-and-security: DB Server & Security
X-nostale: Marks an issue/pr that should be ignored by the stale bot

Comments

@leomkkwan

leomkkwan commented Nov 25, 2019

Running a 9-node cluster on GCP with 19.2.0.

./cockroach start --cache=25% --max-sql-memory=35% --background --locality=cloud=gcp,region=us-east1,datacenter=us-east1-c --store=path=/mnt/d1,attrs=ssd,size=90% --log-dir=log --certs-dir=certs --max-disk-temp-storage=100GB --locality-advertise-addr=cloud=gcp@{Private IP},region=us-east1@{Private IP},datacenter=us-east1-c@{Private IP} --join={N1 Private IP},{N2 Private IP},{Nx Private IP} --advertise-addr={Public IP}

After starting all the nodes, everything looks healthy:
[screenshot]

However, the network diagnostics page shows:
[screenshot]

Confirmed that all nodes are in the same region:
[screenshot]

If I shut down the cluster and restart it, the network diagnostics page becomes:
[screenshot]

On the problematic node, the log is spammed with entries like these:

W191125 17:58:47.883009 19657 vendor/google.golang.org/grpc/clientconn.go:1206 grpc: addrConn.createTransport failed to connect to {{Public IP N3}:26257 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp {Public IP N3}:26257: i/o timeout". Reconnecting...
I191125 17:58:48.663426 20622 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 [n4] circuitbreaker: gossip [::]:26257->{Public IP N9}:26257 tripped: initial connection heartbeat failed: operation "rpc heartbeat" timed out after 6s: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I191125 17:58:48.663437 20622 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 [n4] circuitbreaker: gossip [::]:26257->{Public IP N9}:26257 event: BreakerTripped
W191125 17:58:48.883192 19657 vendor/google.golang.org/grpc/clientconn.go:1206 grpc: addrConn.createTransport failed to connect to {{Public IP N3}:26257 0 <nil>}. Err :connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...
I191125 17:58:52.045207 187 server/status/runtime.go:498 [n4] runtime stats: 5.0 GiB RSS, 363 goroutines, 174 MiB/60 MiB/271 MiB GO alloc/idle/total, 4.1 GiB/4.8 GiB CGO alloc/total, 91.6 CGO/sec, 14.8/0.8 %(u/s)time, 0.0 %gc (1x), 606 KiB/456 KiB (r/w)net
W191125 17:58:52.057512 182 server/node.go:745 [n4] [n4,s4]: unable to compute metrics: [n4,s4]: system config not yet available
W191125 17:58:52.217886 161 storage/replica_range_lease.go:554 can't determine lease status due to node liveness error: node not in the liveness table
  github.com/cockroachdb/cockroach/pkg/storage.init.ializers
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/node_liveness.go:44
  runtime.main
      /usr/local/go/src/runtime/proc.go:188
  runtime.goexit
      /usr/local/go/src/runtime/asm_amd64.s:1337
W191125 17:58:57.217893 162 storage/replica_range_lease.go:554 can't determine lease status due to node liveness error: node not in the liveness table
  github.com/cockroachdb/cockroach/pkg/storage.init.ializers
      /go/src/github.com/cockroachdb/cockroach/pkg/storage/node_liveness.go:44
  runtime.main
      /usr/local/go/src/runtime/proc.go:188
  runtime.goexit
      /usr/local/go/src/runtime/asm_amd64.s:1337
W191125 17:58:58.008692 20241 vendor/google.golang.org/grpc/clientconn.go:1206 grpc: addrConn.createTransport failed to connect to {{Public IP N2}:26257 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp {Public IP N2}:26257: i/o timeout". Reconnecting...
I191125 17:58:58.361063 19445 storage/store_snapshot.go:978 [n4,raftsnapshot,s4,r262/3:/Table/60/2/"5{9aaca…-b073e…}] sending LEARNER snapshot fcabe123 at applied index 2404159
I191125 17:58:58.517305 155 storage/store_remove_replica.go:129 [n4,s4,r262/3:/Table/60/2/"5{9aaca…-b073e…}] removing replica r262/3
W191125 17:58:59.008852 20241 vendor/google.golang.org/grpc/clientconn.go:1206 grpc: addrConn.createTransport failed to connect to {{Public IP N2}:26257 0 <nil>}. Err :connection error: desc = "transport: Error while dialing cannot reuse client connection". Reconnecting...
W191125 17:58:59.008859 20391 vendor/google.golang.org/grpc/clientconn.go:1206 grpc: addrConn.createTransport failed to connect to {{Public IP N8}:26257 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp {Public IP N8}:26257: i/o timeout". Reconnecting...
I191125 17:58:59.010597 21293 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:322 [n4] circuitbreaker: gossip [::]:26257->{Public IP N3}:26257 tripped: initial connection heartbeat failed: operation "rpc heartbeat" timed out after 6s: rpc error: code = DeadlineExceeded desc = context deadline exceeded
I191125 17:58:59.010610 21293 vendor/github.com/cockroachdb/circuitbreaker/circuitbreaker.go:447 [n4] circuitbreaker: gossip [::]:26257->{Public IP N3}:26257 event: BreakerTripped

If I start all the nodes with --advertise-addr={Private IP}, everything goes back to normal.

Jira issue: CRDB-5327

@bdarnell
Contributor

I suspect that this may be related to the gossip bootstrap persistence feature:

added := g.maybeAddBootstrapAddressLocked(desc.Address, desc.NodeID)

This uses desc.Address, which is the primary address for the node, without considering localities. But the persisted info is merged with the --join flag, so it should be able to self-heal, and I'm not sure why it's not. Maybe there's something else that's missing the locality-aware lookup.

This feature was built with the assumption that the primary/public address for the node would be reachable from anywhere; the locality-specific address would just be an optimization. We haven't done much testing in cases where the primary address is sometimes unreachable and so it's possible that we're relying on the primary address in some bootstrapping cases (or maybe it's not just bootstrapping, which would be a more significant bug in this feature).

It looks like you're (intentionally) in a single region/AZ for now, but what is your plan when you go to multiple regions? Will there be a public IP that works across regions or will you be using multiple private IPs? Assuming the former is your goal (which is what we usually see), you'll need to adjust your firewall to get there, and making that adjustment now should get things working (although we'll need to confirm that after bootstrapping it is transitioning onto the more efficient private IPs).

--locality-advertise-addr=cloud=gcp@{Private IP},region=us-east1@{Private IP},datacenter=us-east1-c@{Private IP}

This is redundant: it's a list of rules with first-match-wins semantics, so you only want to specify the level that determines access to the private IP (typically region=us-east1). This single-match limitation also means that you may want to label it region=gcp-us-east1 to guard against region name collisions if you ever span multiple clouds.
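For illustration, a minimal sketch of the trimmed-down flag this suggests, reusing the placeholder IPs from the original report (this is an assumption about intent, not a verified invocation; the rest of the start command would stay as posted above):

--locality=cloud=gcp,region=us-east1,datacenter=us-east1-c --locality-advertise-addr=region=us-east1@{Private IP} --advertise-addr={Public IP}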

@steeling

+1 we're experiencing this issue as well. Similar setup where our default advertise-addr is used for external clusters, and we set a locality-advertise-addr for in-cluster use.

We're attempting to connect with Istio + ingress/egress for multi-cluster connections, so not having internal traffic loop back through them would be nice.

We may also add/connect clusters at will, so it's nice to default to a globally available address for all localities except the local one.

@rleiwang

+1
I am experiencing this issue too. I deployed a 3-node cluster on GKE through the helm stable/cockroachdb chart. This cluster is for internal use only, no ingress.

The deployment YAML is as follows:

      - args:
        - shell
        - -ecx
        - exec /cockroach/cockroach start --join=${STATEFULSET_NAME}-0.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-1.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-2.${STATEFULSET_FQDN}:26257
          --advertise-host=$(hostname).${STATEFULSET_FQDN} --logtostderr=INFO --insecure
          --http-port=8080 --port=26257 --cache=25% --max-disk-temp-storage=0 --max-offset=500ms
          --max-sql-memory=25%
        env:
        - name: STATEFULSET_NAME
          value: bw-cockroachdb
        - name: STATEFULSET_FQDN
          value: bw-cockroachdb.demo.svc.cluster.local
        - name: COCKROACH_CHANNEL
          value: kubernetes-helm
        image: cockroachdb/cockroach:v19.2.2

@steeling

Looked at the code briefly, and I'm wondering if it's the gossip protocol itself that is the issue here.

If I have nodes A, B, C with flags

--advertise-addr="$(hostname -f)" --locality=abc --locality-advertise-addr="xyz@${ORDINAL}.mydomain.com"

Then after bootstrapping, nodes A, B, and C all share the local hostname.

Then I turn up 3 new nodes X, Y, Z, with a join flag pointing to node C. Wouldn't node C share the addresses of A, B, and C that it has, which are supposed to be unique to A, B, and C?

Do you know where in the code the addresses of other nodes are shared, specifically? Perusing the gossip package I didn't see it, unless it's treated as data and requested from the node's database directly via SQL?

@mattcrdb

Hi Steeling,

Is there a reason you're using --locality-advertise-addr? Are you able to use --advertise-addr=<public address>?

Thanks,
Matt

@steeling

We run a multi-cloud setup over the public internet, so we give internal nodes the k8s service name, and external ones get a full hostname.

@tbg
Member

tbg commented Feb 20, 2020

Sorry about the radio silence. I looked into this, and while I haven't been able to reproduce the problem yet, I wanted to share what I've tried as I might be missing a vital ingredient. Locally, I am starting a three-node cluster, following in spirit what @steeling described here:

./cockroach start --insecure --logtostderr=INFO --background --advertise-addr=doesnotexist --locality region=abc --locality-advertise-addr region=abc@127.0.0.1:26257

./cockroach start --insecure --logtostderr=INFO --background --advertise-addr=doesnotexist --locality region=abc --locality-advertise-addr region=abc@127.0.0.1:26258 --join 127.0.0.1:26257 --store cockroach-data2 --http-addr :8081 --listen-addr :26258

./cockroach start --insecure --logtostderr=INFO --background --advertise-addr=doesnotexist --locality region=abc --locality-advertise-addr region=abc@127.0.0.1:26259 --join 127.0.0.1:26257 --store cockroach-data3 --http-addr :8082 --listen-addr :26259

Note how the nodes all advertise an "unreachable" address, but advertise a usable one via --locality-advertise-addr. Note also how the latter two nodes join only to n1, so for n3 to be able to connect to n2, n1 necessarily has to share n2's locality-advertise-addr (as opposed to the bogus unreachable one).

The cluster will show up as green in the UI, and the latency page will work (note that this is on the 20.1-alpha, so the UI looks different, but it works just the same under the hood):

[screenshot]

I restarted the cluster and brought it up with the same command line invocation, and it recovered just fine. I'm not even seeing any connection attempts to the bogus address.

One thing that's maybe silly - but I do want to point it out - is that the network latency page has some bug where it sometimes won't show complete results when it is first used right after a cluster comes up. It's unclear to me why that is, but just refreshing the page once does fix it for me. I think it has to do with the addresses taking a little bit of time to percolate between all of the nodes, and the latency page not working properly until that has happened. This doesn't however explain the log messages folks have been posting higher up in this thread.

@tbg
Member

tbg commented Feb 20, 2020

I did, however, confirm that gossip bootstrap persistence is useless in this case, as all it does is write the bogus addresses down. This confirms @bdarnell's comment here and basically disables gossip bootstrap persistence in this example. However, we also see that, at least in my setup, the --join flags are enough (as they ought to be; I've argued elsewhere that gossip bootstrap persistence should be removed).

I also echo @bdarnell's comment that this feature was built around the expectation that --advertise-addr is reachable by all nodes in the cluster (and that using the locality-advertised address is just an optimization). We see this throughout the Gossip code; for example,

args := Request{
    NodeID:          g.NodeID.Get(),
    Addr:            g.mu.is.NodeAddr,
    Delta:           delta,
    HighWaterStamps: g.mu.is.getHighWaterStamps(),
    ClusterID:       g.clusterID.Get(),
}

indicates that when a node gossips to another node, it will claim that the request originated from --advertise-addr, meaning that the recipient will be unaware of the "real" address at which the origin node can be reached, at least at the level of Gossip. It is not trivial to think of examples where this causes a concrete problem. The node descriptor (which contains the locality-aware addresses) is gossiped at an interval, so as long as gossip stabilizes (as it should using the join flags, if they're set up correctly), the locality addresses should become available to the code that uses them quickly (i.e., within seconds).

In summary, I would love for someone to tweak my example to highlight the problem others here are experiencing.

@RoachietheSupportRoach
Collaborator

Zendesk ticket #4692 has been linked to this issue.

@steeling

steeling commented Mar 3, 2020

OK, I was able to reproduce the issue. It surfaces when we run the above configuration (which is typical when trying to join multiple clusters) in a more test-like environment with only a single cluster and a non-existent hostname.

For example, if we run with --advertise-addr=my-fake-addr.com, the whole thing fails to bootstrap itself, because it attempts to resolve these addresses (only on initial bootstrap, it seems) even when they are not in the right locality.
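To make that concrete, here's a rough sketch of the failing setup, adapting @tbg's local three-node example above with the unresolvable hostname swapped in (the exact flag set is an assumption for illustration, not our production configuration):

./cockroach start --insecure --background --advertise-addr=my-fake-addr.com --locality region=abc --locality-advertise-addr region=abc@127.0.0.1:26257

./cockroach start --insecure --background --advertise-addr=my-fake-addr.com --locality region=abc --locality-advertise-addr region=abc@127.0.0.1:26258 --join 127.0.0.1:26257 --store cockroach-data2 --http-addr :8081 --listen-addr :26258

./cockroach start --insecure --background --advertise-addr=my-fake-addr.com --locality region=abc --locality-advertise-addr region=abc@127.0.0.1:26259 --join 127.0.0.1:26257 --store cockroach-data3 --http-addr :8082 --listen-addr :26259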

@stickenhoffen

I just stumbled upon this. The node needs to be able to communicate with itself on whatever is specified with --advertise-addr. I added firewall rules in GCP to allow the node to connect to its own external IP address.
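For anyone else hitting this, a minimal sketch of the kind of GCP rule I mean (the rule name, network, target tag, and source range are placeholders for your own values, not the exact rule I used):

gcloud compute firewall-rules create allow-cockroach-self --network=default --allow=tcp:26257 --source-ranges={node external IP}/32 --target-tags=cockroachdb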

@morgangallant

What's the status of this issue? Is this still an error in the current release version? In order to cut down on some bandwidth costs, it would be great to use an internal address for nodes in the same locality group, and fall back to a public address for communication between nodes in different locality groups.

Would appreciate any update here!

@knz added the C-investigation label (Further steps needed to qualify. C-label will change.) May 4, 2020
@knz
Contributor

knz commented May 7, 2020

Hi Morgan, thanks for the request.
We understand the use case; unfortunately, the behavior of --locality-advertise-addr is not going to help much here (given the restriction on --advertise-addr).

We will possibly look into this for v20.2. In the meantime, I would recommend you apply either of the following solutions, separately or in combination:

  • use VPC peering to make the private IP addresses of each node available from every other node. This way, your cloud routing layer will automatically select whether to bill traffic as local or cross-DC bandwidth.

  • use IP routing/firewalling rules in your OS to set up reverse-NAT: when the OS detects that a node is attempting to connect to the internal address of a node that's in a different DC, it redirects the request to that other node's public address. There would need to be an inverse rule (port forwarding) on the other side; see the sketch below.
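A minimal sketch of the second option with iptables, purely for illustration (the addresses are placeholders: 10.0.1.5 stands in for the remote node's private IP, 203.0.113.7 for its public IP, and 26257 for the RPC port):

# On the connecting node: rewrite outbound connections aimed at the remote
# node's private address so they are sent to its public address instead.
iptables -t nat -A OUTPUT -p tcp -d 10.0.1.5 --dport 26257 -j DNAT --to-destination 203.0.113.7:26257

# The inverse (port-forwarding) rule on the other side depends on how the
# public address reaches that node (cloud NAT vs. an address on the host),
# so it is not shown here.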

Could you check if any of this is applicable in your environment?

I appreciate that these methods are slightly more complex to set up. That is why we are not losing sight of this limitation and still plan to improve CockroachDB accordingly.

@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz added the X-nostale label (Marks an issue/pr that should be ignored by the stale bot) and removed the no-issue-activity label Sep 19, 2023
@knz
Contributor

knz commented Sep 19, 2023

still relevant

@daniel-crlabs
Contributor

This came up again today; I created a new docs issue since the one from 4 years ago has not been touched: https://cockroachlabs.atlassian.net/browse/DOC-9161
