-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client: use RPC address and not serf after initial Consul discovery #16217
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
tgross
force-pushed
the
client-discover-by-members-rpc
branch
from
February 17, 2023 20:32
c1b0c79
to
3c094d3
Compare
This was referenced Feb 17, 2023
tgross
added a commit
that referenced
this pull request
Feb 17, 2023
Add an Elastic Network Interface (ENI) to each Linux host, on a secondary subnet we have provisioned in each AZ. Revise security groups as follows: * Split out client security groups from servers so that we can't have clients accidentally accessing serf addresses or other unexpected cross-talk. * Add new security groups for the secondary subnet that only allows communication within the security group so we can exercise behaviors with multiple IPs. This changeset doesn't include any Nomad configuration changes needed to take advantage of the extra network interface. I'll include those with testing for PR #16217.
jrasell
pushed a commit
that referenced
this pull request
Feb 20, 2023
Add an Elastic Network Interface (ENI) to each Linux host, on a secondary subnet we have provisioned in each AZ. Revise security groups as follows: * Split out client security groups from servers so that we can't have clients accidentally accessing serf addresses or other unexpected cross-talk. * Add new security groups for the secondary subnet that only allows communication within the security group so we can exercise behaviors with multiple IPs. This changeset doesn't include any Nomad configuration changes needed to take advantage of the extra network interface. I'll include those with testing for PR #16217.
Nomad servers can advertise independent IP addresses for `serf` and `rpc`. Somewhat unexpectedly, the `serf` address is also used for both Serf and server-to-server RPC communication (including Raft RPC). The address advertised for `rpc` is only used for client-to-server RPC. This split was introduced intentionally in Nomad 0.8. When clients are using Consul discovery for connecting to servers, they get an initial discovery set from Consul and use the correct `rpc` tag in Consul to get a list of adddresses for servers. The client then makes a `Status.Peers` RPC to get the list of those servers that are raft peers. But this endpoint is shared between servers and clients, and provides the address used for Raft. Most of the time this is harmless because servers will bind on 0.0.0.0 anyways., But in topologies where servers are on a private network and clients are on separate subnets (or even public subnets), clients will make initial contact with the server to get the list of peers but then populate their local server set with unreachable addresses. Cluster administrators can work around this problem by using `server_join` with specific IP addresses (or DNS names), because the `Node.UpdateStatus` endpoint returns the correct set of RPC addresses when updating the node. So once a client has registered, it will get the correct set of RPC addresses. This changeset updates the client logic to query `Status.Members` instead of `Status.Peers`, and then extract the correctly advertised address and port from the response body.
tgross
added
backport/1.3.x
backport to 1.3.x release line
backport/1.4.x
backport to 1.4.x release line
labels
Feb 28, 2023
tgross
force-pushed
the
client-discover-by-members-rpc
branch
from
February 28, 2023 21:55
3c094d3
to
0bf6e65
Compare
lgfa29
approved these changes
Mar 1, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This was referenced Mar 2, 2023
tgross
added a commit
that referenced
this pull request
Mar 14, 2023
In #16217 we switched clients using Consul discovery to the `Status.Members` endpoint for getting the list of servers so that we're using the correct address. This endpoint has an authorization gate, so this fails if the anonymous policy doesn't have `node:read`. We also can't check the `AuthToken` for the request for the client secret, because the client hasn't yet registered so the server doesn't have anything to compare against. Create a new `Status.RPCServers` endpoint that mirrors the `Status.Peers` endpoint but provides the RPC server addresses instead of the Raft addresses. This fixes the authentication bug but also ensures we're only registering with servers in the client's region and not in any other servers that might have registered with Consul. This changeset also expands the test coverage of the RPC endpoint and closes up potential holes in the `ResolveACL` method that aren't currently bugs but easily could become bugs if we called the method without ensuring its invariants are upheld. Co-authored-by: tantra35 <[email protected]>
2 tasks
tgross
added a commit
that referenced
this pull request
Mar 16, 2023
In #16217 we switched clients using Consul discovery to the `Status.Members` endpoint for getting the list of servers so that we're using the correct address. This endpoint has an authorization gate, so this fails if the anonymous policy doesn't have `node:read`. We also can't check the `AuthToken` for the request for the client secret, because the client hasn't yet registered so the server doesn't have anything to compare against. Instead of hitting the `Status.Peers` or `Status.Members` RPC endpoint, use the Consul response directly. Update the `registerNode` method to handle the list of servers we get back in the response; if we get a "no servers" or "no path to region" response we'll kick off discovery again and retry immediately rather than waiting 15s.
This was referenced Mar 16, 2023
Merged
tgross
added a commit
that referenced
this pull request
Mar 20, 2023
In #16217 we switched clients using Consul discovery to the `Status.Members` endpoint for getting the list of servers so that we're using the correct address. This endpoint has an authorization gate, so this fails if the anonymous policy doesn't have `node:read`. We also can't check the `AuthToken` for the request for the client secret, because the client hasn't yet registered so the server doesn't have anything to compare against. Instead of hitting the `Status.Peers` or `Status.Members` RPC endpoint, use the Consul response directly. Update the `registerNode` method to handle the list of servers we get back in the response; if we get a "no servers" or "no path to region" response we'll kick off discovery again and retry immediately rather than waiting 15s.
tgross
added a commit
that referenced
this pull request
Mar 20, 2023
In #16217 we switched clients using Consul discovery to the `Status.Members` endpoint for getting the list of servers so that we're using the correct address. This endpoint has an authorization gate, so this fails if the anonymous policy doesn't have `node:read`. We also can't check the `AuthToken` for the request for the client secret, because the client hasn't yet registered so the server doesn't have anything to compare against. Instead of hitting the `Status.Peers` or `Status.Members` RPC endpoint, use the Consul response directly. Update the `registerNode` method to handle the list of servers we get back in the response; if we get a "no servers" or "no path to region" response we'll kick off discovery again and retry immediately rather than waiting 15s.
tgross
added a commit
that referenced
this pull request
Mar 20, 2023
In #16217 we switched clients using Consul discovery to the `Status.Members` endpoint for getting the list of servers so that we're using the correct address. This endpoint has an authorization gate, so this fails if the anonymous policy doesn't have `node:read`. We also can't check the `AuthToken` for the request for the client secret, because the client hasn't yet registered so the server doesn't have anything to compare against. Instead of hitting the `Status.Peers` or `Status.Members` RPC endpoint, use the Consul response directly. Update the `registerNode` method to handle the list of servers we get back in the response; if we get a "no servers" or "no path to region" response we'll kick off discovery again and retry immediately rather than waiting 15s. Co-authored-by: Tim Gross <[email protected]>
tgross
added a commit
that referenced
this pull request
Mar 20, 2023
In #16217 we switched clients using Consul discovery to the `Status.Members` endpoint for getting the list of servers so that we're using the correct address. This endpoint has an authorization gate, so this fails if the anonymous policy doesn't have `node:read`. We also can't check the `AuthToken` for the request for the client secret, because the client hasn't yet registered so the server doesn't have anything to compare against. Instead of hitting the `Status.Peers` or `Status.Members` RPC endpoint, use the Consul response directly. Update the `registerNode` method to handle the list of servers we get back in the response; if we get a "no servers" or "no path to region" response we'll kick off discovery again and retry immediately rather than waiting 15s. Co-authored-by: Tim Gross <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.3.x
backport to 1.3.x release line
backport/1.4.x
backport to 1.4.x release line
backport/1.5.x
backport to 1.5.x release line
theme/client
type/bug
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #16211
Nomad servers can advertise independent IP addresses for
serf
andrpc
. Somewhat unexpectedly, theserf
address is also used for both Serf and server-to-server RPC communication (including Raft RPC). The address advertised forrpc
is only used for client-to-server RPC. This split was introduced intentionally in Nomad 0.8.When clients are using Consul discovery for connecting to servers, they get an initial discovery set from Consul and use the correct
rpc
tag in Consul to get a list of adddresses for servers. The client then makes aStatus.Peers
RPC to get the list of those servers that are raft peers. But this endpoint is shared between servers and clients, and provides the address used for Raft.Most of the time this is harmless because servers will bind on 0.0.0.0 anyways., But in topologies where servers are on a private network and clients are on separate subnets (or even public subnets), clients will make initial contact with the server to get the list of peers but then populate their local server set with unreachable addresses.
Cluster administrators can work around this problem by using
server_join
with specific IP addresses (or DNS names), because theNode.UpdateStatus
endpoint returns the correct set of RPC addresses when updating the node. So once a client has registered, it will get the correct set of RPC addresses.This changeset updates the client logic to query
Status.Members
instead ofStatus.Peers
, and then extract the correctly advertised address and port from the response body.Notes:
server_join
workaround described above.consul
config block) and used AWS EC2 auto-join via tags and that works just fine as well.