Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api] Add Unix domain socket support #16872

Merged
merged 1 commit into from
Oct 11, 2023
Merged

[api] Add Unix domain socket support #16872

merged 1 commit into from
Oct 11, 2023

Conversation

angrycub
Copy link
Contributor

@angrycub angrycub commented Apr 12, 2023

This PR is designed to pair with #16884 so that API clients can connect to the Nomad agent's HTTP api via a unix domain socket. It also includes support for the nomad operator api command and unix sockets.

To test this PR manually, you will need to use socat as a proxy to the Nomad API socket

nomad agent -dev
socat -d -d UNIX-LISTEN:/tmp/run/nomad.sock,reuseaddr,fork TCP4:127.0.0.1:4646

In another terminal

export NOMAD_ADDR=unix:///tmp/run/nomad.sock
nomad namespace list

Fixes: #18099
Fixes: #4637

@gudmundur
Copy link

I'm hoping that this change will work nicely with the Task HTTP API that recently got exposed. I nudged @mikenomitch on #4637 a little while back.

My use case is that we've both got shell invocations of nomad happening within an allocation, and also via the Go API client that you all provide. For the latter we can override the HTTP client and work around this that way, for the CLI use case we need to expose a series of environment variables along with mTLS certificates.

@angrycub angrycub marked this pull request as ready for review April 25, 2023 14:21
@angrycub
Copy link
Contributor Author

angrycub commented Apr 25, 2023

@gudmundur That's definitely a goal with this addition. I appreciate the extra information about your use case as well. The nice thing about this code is that it doesn't require PR #16884 to be useful and could be built and tested in your environment. If you do, feedback would be greatly appreciated.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

api/api.go Outdated Show resolved Hide resolved
command/operator_api_test.go Show resolved Hide resolved
command/operator_api_test.go Outdated Show resolved Hide resolved
api/api.go Show resolved Hide resolved
api/api.go Outdated Show resolved Hide resolved
api/api.go Outdated Show resolved Hide resolved
command/operator_api.go Show resolved Hide resolved
command/operator_api.go Outdated Show resolved Hide resolved
command/operator_api.go Outdated Show resolved Hide resolved
@SamMousa
Copy link
Contributor

@angrycub are you willing to push this one over the finish line? Or should we look for someone else to do that?

@tgross
Copy link
Member

tgross commented Sep 26, 2023

@SamMousa the PR has been approved and just needs some mop-up work. Opening a new PR wouldn't be productive at this point @angrycub just needs to get some free time to wrap it up.

@SamMousa
Copy link
Contributor

Well, that is assuming he has the time to finish it.. Also, this PR makes a lot more changes and has a different approach.
Instead of creating a lot of custom code for handling a unix socket my PR just does 2 simple things:

  • Clean up the address parsing code to not prefix a scheme when a scheme is already present
  • Register the unix protocol inside the default http client

No need for custom properties like SocketPath, which then introduces the question of priority or increases the likelyhood of invalid configuration (what if both Address and Socketpath are present).

Also it doesn't alter the struct or require a new environment variable.

Anyway, either fix is fine for me, my only goal is to get access to this feature...

Comment on lines +209 to 218
// URL returns a copy of the initial parsed address and is not modified in the
// case of a `unix://` URL, as opposed to Address.
func (c *Config) URL() *url.URL {
return c.url
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to other reviewers: @angrycub and I paired on this for quite a while. The code here in the api package handles the unix:// scheme correctly now without doing a bunch of weird string manipulation. But the nomad operator api package has some special case logic around emitting curl-alike outputs. Rather than exposing flags or fields special-casing the UDS, we landed on this method, which is potentially more generally useful for callers and lets the nomad operator api check the scheme directly.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the merge conflict is cleaned up.

- Expose internal HTTP client's Do() via Raw
- Use URL parser to identify scheme
- Align more with curl output
- Add changelog
- Fix test failure; add tests for socket envvars
- Apply review feedback for tests
- Consolidate address parsing
- Address feedback from code reviews

Co-authored-by: Tim Gross <[email protected]>
@angrycub angrycub merged commit 7266d26 into main Oct 11, 2023
21 of 22 checks passed
@angrycub angrycub deleted the f-api-uds branch October 11, 2023 15:04
tgross added a commit that referenced this pull request Aug 9, 2024
In #16872 we added support for configuring the API client with a unix domain
socket. In order to set the host correctly, we parse the address before mutating
the Address field in the configuration. But this prevents the configuration from
being reused across multiple clients, as the next time we parse the address it
will no longer be pointing to the socket. This breaks consumers like the
autoscaler, which reuse the API config between plugins.

Update the `NewClient` constructor to only override the `url` field if it hasn't
already been parsed. Include a test demonstrating safe reuse with a unix domain
socket.

Ref: hashicorp/nomad-autoscaler#944
Ref: hashicorp/nomad-autoscaler#945
tgross added a commit that referenced this pull request Aug 9, 2024
In #16872 we added support for configuring the API client with a unix domain
socket. In order to set the host correctly, we parse the address before mutating
the Address field in the configuration. But this prevents the configuration from
being reused across multiple clients, as the next time we parse the address it
will no longer be pointing to the socket. This breaks consumers like the
autoscaler, which reuse the API config between plugins.

Update the `NewClient` constructor to only override the `url` field if it hasn't
already been parsed. Include a test demonstrating safe reuse with a unix domain
socket.

Ref: hashicorp/nomad-autoscaler#944
Ref: hashicorp/nomad-autoscaler#945
tgross added a commit that referenced this pull request Dec 10, 2024
In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address. In #23785 we fixed a bug where if the
configuration was used across multiple clients that mutation would happen
multiple times and the address would be incorrectly parsed.

When making `alloc log` or `alloc exec` calls to a region where the region is
not "global", we create a new client from the same configuration and then set
the address. But in this case we copy the private `url` field and that causes
the URL parsing to be skipped for the new client. This results in the region
always being set to the string literal `global` (because of mTLS handling code
introduced all the way back in 4d3b75d), which fails with an error "no path
to region" when the cluster isn't non-global and requests are sent to a
non-leader.

The "right" way of fixing this would be for `ClientConfig` not to change the
region to global in the first place, but as this is a public API and extremely
longstanding behavior, it could potentially be a breaking change for some
downstream consumers. Instead, we'll avoid copying the private `url` field so
that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
tgross added a commit that referenced this pull request Dec 10, 2024
In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address so as to remove the port number. In #23785
we fixed a bug where if the configuration was used across multiple clients that
mutation would happen multiple times and the address would be incorrectly
parsed.

When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have
line-of-sight to the client, we attempt to make a HTTP API call directly to the
client node. So we create a new API client from the same configuration and then
set the address. But in this case we copy the private `url` field and that
causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal
`"global"` (because of mTLS handling code introduced all the way back in
4d3b75d), unless the user has set the region specifically. This fails with
an error "no path to region" when the cluster isn't non-global and requests are
sent to a non-leader.

Arguably the "right" way of fixing this would be for `ClientConfig` not to
change the API client's region to `"global"` in the first place, but as this is
a public API and extremely longstanding behavior, it could potentially be a
breaking change for some downstream consumers. Instead, we'll avoid copying the
private `url` field so that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
tgross added a commit that referenced this pull request Dec 10, 2024
In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address so as to remove the port number. In #23785
we fixed a bug where if the configuration was used across multiple clients that
mutation would happen multiple times and the address would be incorrectly
parsed.

When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have
line-of-sight to the client, we attempt to make a HTTP API call directly to the
client node. So we create a new API client from the same configuration and then
set the address. But in this case we copy the private `url` field and that
causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal
`"global"` (because of mTLS handling code introduced all the way back in
4d3b75d), unless the user has set the region specifically. This fails with
an error "no path to region" when the cluster isn't non-global and requests are
sent to a non-leader.

Arguably the "right" way of fixing this would be for `ClientConfig` not to
change the API client's region to `"global"` in the first place, but as this is
a public API and extremely longstanding behavior, it could potentially be a
breaking change for some downstream consumers. Instead, we'll avoid copying the
private `url` field so that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
tgross added a commit that referenced this pull request Dec 10, 2024
In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address so as to remove the port number. In #23785
we fixed a bug where if the configuration was used across multiple clients that
mutation would happen multiple times and the address would be incorrectly
parsed.

When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have
line-of-sight to the client, we attempt to make a HTTP API call directly to the
client node. So we create a new API client from the same configuration and then
set the address. But in this case we copy the private `url` field and that
causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal
`"global"` (because of mTLS handling code introduced all the way back in
4d3b75d), unless the user has set the region specifically. This fails with
an error "no path to region" when the cluster isn't non-global and requests are
sent to a non-leader.

Arguably the "right" way of fixing this would be for `ClientConfig` not to
change the API client's region to `"global"` in the first place, but as this is
a public API and extremely longstanding behavior, it could potentially be a
breaking change for some downstream consumers. Instead, we'll avoid copying the
private `url` field so that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
tgross added a commit that referenced this pull request Dec 16, 2024
In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address so as to remove the port number. In #23785
we fixed a bug where if the configuration was used across multiple clients that
mutation would happen multiple times and the address would be incorrectly
parsed.

When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have
line-of-sight to the client, we attempt to make a HTTP API call directly to the
client node. So we create a new API client from the same configuration and then
set the address. But in this case we copy the private `url` field and that
causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal
`"global"` (because of mTLS handling code introduced all the way back in
4d3b75d), unless the user has set the region specifically. This fails with
an error "no path to region" when the cluster isn't non-global and requests are
sent to a non-leader.

Arguably the "right" way of fixing this would be for `ClientConfig` not to
change the API client's region to `"global"` in the first place, but as this is
a public API and extremely longstanding behavior, it could potentially be a
breaking change for some downstream consumers. Instead, we'll avoid copying the
private `url` field so that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
tgross added a commit that referenced this pull request Dec 16, 2024
In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address so as to remove the port number. In #23785
we fixed a bug where if the configuration was used across multiple clients that
mutation would happen multiple times and the address would be incorrectly
parsed.

When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have
line-of-sight to the client, we attempt to make a HTTP API call directly to the
client node. So we create a new API client from the same configuration and then
set the address. But in this case we copy the private `url` field and that
causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal
`"global"` (because of mTLS handling code introduced all the way back in
4d3b75d), unless the user has set the region specifically. This fails with
an error "no path to region" when the cluster isn't non-global and requests are
sent to a non-leader.

Arguably the "right" way of fixing this would be for `ClientConfig` not to
change the API client's region to `"global"` in the first place, but as this is
a public API and extremely longstanding behavior, it could potentially be a
breaking change for some downstream consumers. Instead, we'll avoid copying the
private `url` field so that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
tgross added a commit that referenced this pull request Dec 16, 2024
In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address so as to remove the port number. In #23785
we fixed a bug where if the configuration was used across multiple clients that
mutation would happen multiple times and the address would be incorrectly
parsed.

When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have
line-of-sight to the client, we attempt to make a HTTP API call directly to the
client node. So we create a new API client from the same configuration and then
set the address. But in this case we copy the private `url` field and that
causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal
`"global"` (because of mTLS handling code introduced all the way back in
4d3b75d), unless the user has set the region specifically. This fails with
an error "no path to region" when the cluster isn't non-global and requests are
sent to a non-leader.

Arguably the "right" way of fixing this would be for `ClientConfig` not to
change the API client's region to `"global"` in the first place, but as this is
a public API and extremely longstanding behavior, it could potentially be a
breaking change for some downstream consumers. Instead, we'll avoid copying the
private `url` field so that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d
tgross added a commit that referenced this pull request Dec 16, 2024
…ddress (#24644) (#24682)

In #16872 we added support for unix domain sockets, but this required mutating
the `Config` when parsing the address so as to remove the port number. In #23785
we fixed a bug where if the configuration was used across multiple clients that
mutation would happen multiple times and the address would be incorrectly
parsed.

When making `alloc log`, `alloc fs`, or `alloc exec` calls where we have
line-of-sight to the client, we attempt to make a HTTP API call directly to the
client node. So we create a new API client from the same configuration and then
set the address. But in this case we copy the private `url` field and that
causes the URL parsing to be skipped for the new client.

This results in the region always being set to the string literal
`"global"` (because of mTLS handling code introduced all the way back in
4d3b75d), unless the user has set the region specifically. This fails with
an error "no path to region" when the cluster isn't non-global and requests are
sent to a non-leader.

Arguably the "right" way of fixing this would be for `ClientConfig` not to
change the API client's region to `"global"` in the first place, but as this is
a public API and extremely longstanding behavior, it could potentially be a
breaking change for some downstream consumers. Instead, we'll avoid copying the
private `url` field so that the new address is re-parsed.

Fixes: #24635
Fixes: #24609
Ref: #16872
Ref: #23785
Ref: 4d3b75d

Co-authored-by: Tim Gross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support unix socket in client Client: support unix scheme in NOMAD_ADDR
5 participants