Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[agent] Serve API over Unix domain sockets #16884

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

[agent] Serve API over Unix domain sockets #16884

wants to merge 4 commits into from

Conversation

angrycub
Copy link
Contributor

@angrycub angrycub commented Apr 13, 2023

This PR pairs with #16872 to provide the server side configuration and implementation for serving the Nomad API over unix domain sockets.

For local testing, you can use socat to provide a suitable proxy

nomad agent -dev -api-socket-path /tmp/run/nomad.sock &
socat -d -d TCP4-LISTEN:14646,reuseaddr,fork UNIX:/tmp/run/nomad.sock

In another terminal

export NOMAD_ADDR=http://127.0.0.1:14646
nomad namespace list

Contents:

  • Add api_socket config
  • Update tests for new config
  • Enable socket listener
  • Add flags to agent for socket config

Fixes: #1639

@angrycub angrycub marked this pull request as ready for review April 17, 2023 22:04
@angrycub angrycub changed the title [server] Serve API over Unix domain sockets [agent] Serve API over Unix domain sockets Apr 17, 2023
Copy link
Member

@schmichael schmichael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't align directly with Consul or Vault, but I think that's ok:

  • Consul makes you choose between TCP or Unix, and I don't think we want to do that. They also have a distinct unix_socket{} block anyway, so we're not far off.
  • We match Vault in all the ways that matter. Vault just has that neat listener{} block to reuse which we lack. api_listener{} blocks probably would have been a better choice for Nomad than addresses{}, but that ship sailed ages ago.

}

const (
MaxSocketPathLength = 104
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This varies depending on OS. So I think we should just try to create the socket and return the awful OS error if we fail.

If Go can find the sun_path length for the OS its on, that would be amazing. I think even then we need to be careful to subtract 1 to leave space for a null terminator: otherwise the man page implies no one may be able to connect to it?! What a mess. https://man7.org/linux/man-pages/man7/unix.7.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like syscall exposes it: https://pkg.go.dev/syscall#RawSockaddrUnix

Still might need to subtract 1 to leave room for a null terminator?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh actually maybe you already peeked because it looks like 104 is the floor!

Peeking at what Go sets across platforms:

go/src/syscall$ rg -A3 RawSockaddrUnix | rg Path.*int | awk '{print $3}' | sort -u
[1023]uint8
[104]int8
[108]int8
[UNIX_PATH_MAX]int8

Somehow AIX manages to be the nice one here with 1023. What a world. UNIX_PATH_MAX is Windows only and 108.

Although I'm not sure if that takes the null terminator weirdness into account? So maybe we should set this to 103?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd peeked, but I think you're right about the null terminator. I think we can go for something more OS-specific by changing that line to:

			MaxSocketPathLength = len(syscall.RawSockaddrUnix{}.Path) - 1 // remove 1 byte for null terminator

@@ -193,6 +194,58 @@ func NewHTTPServers(agent *Agent, config *Config) ([]*HTTPServer, error) {
srvs = append(srvs, srv)
}

// Initialize the API socket if configured
startAPISocket := func() {
Copy link
Member

@schmichael schmichael Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a top level func that takes a config and returns an error just because this func is already so huge and only readable because it's the same code over and over with slight variations. nbd though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll drop a TODO here so that I can come back to is but it doesn't stall the world.

command/agent/http.go Show resolved Hide resolved
@@ -92,6 +92,10 @@ type Config struct {
// Use normalizedAddrs if you need the host+port to bind to.
Addresses *Addresses `hcl:"addresses"`

// APISocket is used to configure a unix domain socket listener for the
// HTTP API
APISocket *SocketConfig `hcl:"api_socket"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pedant in me wants to name this api_unix_socket because it's the "unix" bit that makes this interesting. All addresses{} are all sockets. Using unix in the name would match Consul and Vault better as well.

That being said if you don't think anyone wants to type that many characters I will shutup. It is nbd. 😅

Comment on lines +127 to +130
flags.StringVar(&cmdConfig.APISocket.Path, "api-socket-path", "", "")
flags.StringVar(&cmdConfig.APISocket.User, "api-socket-user", "", "")
flags.StringVar(&cmdConfig.APISocket.Group, "api-socket-group", "", "")
flags.StringVar(&cmdConfig.APISocket.Mode, "api-socket-mode", "", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think api-unix-* makes more sense here (see below for discussion).

Comment on lines +1100 to +1102
User string `hcl:"user"`
Mode string `hcl:"mode"`
Group string `hcl:"group"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Letting the administrator set user/group/mode is really interesting, in that I'd have expected us to use Nomad's own user/group and a restrictive mode. But this lets the administrator run a client as root and use a non-root user to talk to it (so long as you've got ACLs in place, I suppose).

This should definitely come with some notes about recommended usage in the docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some prior art:

  • Vault has the bare minimum docs here.
  • Consul is probably a good one to copy (except for the lack of Windows support... I wonder if it just-started-working for them when Go+Windows added support).
  • Boundary is clearly an iteration on Vault's approach, and I don't think has anything relevant for us. It's interesting they appear to enable tls by default while Vault totally lacks tls for their unix socket. I think we can follow Vault here until somebody gives us a reason to do otherwise.

command/agent/http.go Show resolved Hide resolved
fmt.Errorf("%s path must be %v characters or less", msgPrefix, MaxSocketPathLength))
}

sl, err := listenerutil.UnixSocketListener(config.APISocket.Path, config.APISocket.UnixSocketsConfig())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to use this library we should make sure we've fully documented the expected behavior for users. You can set user by name or ID, but group only by ID (well, that's how it's documented but not how it behaves if you look at the source). If you set path without setting user, group, and mode, the socket is owned by the Nomad agent's user and the mode will be determined by the umask (which is usually different for login shells vs services, etc.)

Maybe we should have an explicit default configuration here rather than relying on the default from the library? i feel like that's likely to get changed out from under us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on observing the differences in implementation and documentation on that function. What are your thoughts on a reasonable default configuration, or are you saying that we should just codify the current defaults in Nomad so that if they do change in the library that we won't be surprised/broken by it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current defaults from the library seem reasonable, so let's codify those defaults in Nomad so that we control them going forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can ensure the behavior around user and group by making the defaults concrete, but the behavior around mode is a moving target since it's solely determined by how the library function handles that argument. I did add a test to validate whether or not the behavior of the library call has changed from we're expecting. Seem okay?

Comment on lines +1122 to +1133
if b.Path != "" {
result.Path = b.Path
}
if b.User != "" {
result.User = b.User
}
if b.Group != "" {
result.Group = b.Group
}
if b.Mode != "" {
result.Mode = b.Mode
}
Copy link
Member

@shoenig shoenig Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1119 to +1121
if s != nil {
result = *s
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just declared Copy() above, let's use that instead of implementing it twice

@tgross tgross added the stage/needs-rebase This PR needs to be rebased on main before it can be backported to pick up new BPA workflows label May 17, 2024
@tgross
Copy link
Member

tgross commented Jul 8, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stage/needs-rebase This PR needs to be rebased on main before it can be backported to pick up new BPA workflows theme/api HTTP API and SDK issues type/enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable HTTP to bind to a unix socket.
4 participants