networking: option to ignore static port collisions #23956
Conversation
There are a couple of advantages:
`nomad/structs/structs.go` (Outdated)

```diff
@@ -7264,6 +7273,12 @@ func (tg *TaskGroup) validateNetworks() error {
 		err := fmt.Errorf("Port %q cannot be mapped to a port (%d) greater than %d", port.Label, port.To, math.MaxUint16)
 		mErr.Errors = append(mErr.Errors, err)
 	}
+
+	// TODO: is "" always "host" ?
```
Weirdly it looks like we don't canonicalize `NetworkResource.Mode`, but it's the default for the zero value.
`nomad/structs/network.go` (Outdated)

```go
if !port.AllowReuse {
	collide = true
	reason := fmt.Sprintf("port %d already in use", port.Value)
	reasons = append(reasons, reason)
}
```
I think we might need to call `used.Set` here for when we're ranking a node and adding that node's existing allocations to the `NetworkIndex`. If those allocations have `port.allow_reuse`, we want them to collide with the new allocation if that allocation doesn't also have `port.allow_reuse` for that port.
(This will probably pop out once you've got test coverage in the scheduler code.)
We only get into this condition if `used.Check()` reports a collision (line 438 just above here), i.e. if `used.Set()` (line 447) has already been called by some other prior entry here.

The full context: https://github.com/hashicorp/nomad/blob/126abd84e5c6f069b897aa09e209a385227475d4/nomad/structs/network.go#L437-L448

(Those "todo: yes. #." comments are the order in which I encountered collisions while stepping through breakpoints.)

In other words, `Set()` always happens if it hasn't already, regardless of `AllowReuse`; subsequent `Check()`s then may or may not report collisions.
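To make the ordering concrete, here is a hedged toy model in Python (not Nomad's actual Go code; the `PortIndex`, `check`, and `set` names are illustrative only): `set()` always records the port, and later `check()`s report a collision unless both the existing entry and the new one opted into reuse.

```python
class PortIndex:
    """Toy stand-in for the used-ports index: port -> allow_reuse of first claimant."""

    def __init__(self):
        self.used = {}

    def check(self, port, allow_reuse):
        """Return a collision reason, or None if the port can be (re)used."""
        if port in self.used and not (allow_reuse and self.used[port]):
            return f"port {port} already in use"
        return None

    def set(self, port, allow_reuse):
        # Set always happens if it hasn't already, regardless of allow_reuse.
        self.used.setdefault(port, allow_reuse)


idx = PortIndex()
idx.set(8080, True)               # first claimant opted into reuse
print(idx.check(8080, True))      # both opted in: no collision reported
print(idx.check(8080, False))     # new entry did not opt in: collision reported
```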
LGTM!
```diff
@@ -105,6 +105,12 @@ All other operating systems use the `host` networking mode.
 - `host_network` `(string:nil)` - Designates the host network name to use when allocating
   the port. When port mapping the host port will only forward traffic to the matched host
   network address.
+- `ignore_collision` `(bool: false)` - Allows the group to be placed on a node
+  where the port may already be reserved. Intended for programs that support
```
Do we want to use the word "reserved" here? There's a specific client configuration option for "reserved ports" that may be confusing. Maybe "allocated" or even just "in use" (maybe the port isn't being used by a Nomad job at all)?
I actually imagine that this may be used in conjunction with the `reserved_ports` client config, so a cluster operator may dedicate ports for applications like these, and run them with Nomad instead of systemd or such.

On the flip side, it also allows job authors to bypass operator intent... I suppose Sentinel may need to come into play here to retain current guarantees. 🤔
I thought some more about bypassing operator intent around `reserved_ports`: a job author can already do that today, if they merely do not specify a `port{}`, so this does not make that situation any worse. (I have had to remind myself of this more than once.)
Excellent point!
so more than one copy of a program can run at a time on the same port with SO_REUSEPORT. requires host network mode, and some drivers (like docker) may also need config { network_mode = "host" } but this is not validated prior to placement.
(force-pushed from 6441cff to 6e07d84)
tl;dr - Add an option to ignore port collisions for `static` ports during scheduling, if the `group.network.mode` is `"host"` (the default). This allows more than one copy of a program that uses `SO_REUSEPORT` to bind to the same port on a single node.

Note: some task drivers (like docker) may require additional config to set `network_mode = "host"`, or you may encounter runtime issues after placement.

Closes #15487
Problem
Currently, static ports may be used only once on a single address on a network interface.
This is good, almost all of the time. Most programs bind exclusively to an addr:port, so other copies of the same program (or other programs) cannot multi-bind to it. The Nomad scheduler prevents placing allocations that would cause such port collisions to avoid a predictable program error at runtime.
Some programs, however, do allow addr:port reuse, via the `SO_REUSEPORT` socket option (on Linux). For example, this little toy web server:
reuseport.py
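The attached `reuseport.py` isn't reproduced here; as a stand-in, a minimal sketch of the same idea (my own illustration, Linux-only, since both sockets must set `SO_REUSEPORT` before binding):

```python
import socket


def reuseport_listener(host="127.0.0.1", port=0):
    """TCP listener with SO_REUSEPORT, so several sockets can share one addr:port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen()
    return s


if __name__ == "__main__":
    a = reuseport_listener()            # kernel picks a free port
    port = a.getsockname()[1]
    b = reuseport_listener(port=port)   # second bind to the SAME port succeeds
    print(f"two listeners bound to :{port}")
    a.close()
    b.close()
```

Without the `setsockopt` call, the second `bind()` would raise `EADDRINUSE`, which is exactly the collision the scheduler normally guards against.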
A more practical example is traefik, which also (optionally) supports it, and while I haven't taken a full inventory, I suspect other proxies do, too.
Nomad's `network.port` prevents this use case. To work around it, one may merely neglect to specify the `port`, and then still bind to it anyway, but this completely removes the guardrails that prevent other allocs from being mistakenly placed with no hope of success.

Solution
This PR adds a new `ignore_collision` field to `job.group.network.port` (only this scope, i.e. not task networks).

Full example: reuseport.nomad.hcl (uses `reuseport.py` from above)

A job run like that can have multiple copies placed on the same host, then another job that requests the same static port without the new option will be placed elsewhere (or block pending the port becoming available).
This just gets the scheduler out of the way. It's up to the user (job author and/or cluster operator) to ensure placement on a node that does not already have some other thing listening on the port.
For example, I can imagine a client config like this: `reserved_ports` would prevent placement of any group that does not allow port reuse, ahead of the desired job, then `node_pool` (or some other constraint method) ensures the desired job gets placed on such a node.

I do wonder, though, if `port.allow_reuse` actually adds anything beyond merely excluding the port, if `reserved_ports` already prevents collisions from other jobs... Perhaps a `system` job that runs everywhere, all the time, so could grab the port without needing to reserve it in client config? I am a little hazy on the real value-add here.

Security-wise, this does not increase risk of some other program trying to squat on ports any more than a host-network'd job currently can, if it simply does not declare that it will use a `port` at all.

Questions:
- Instead of `allow_reuse`, perhaps `ignore_collisions` is more precise? (and feels like a good warning)
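For the "client config like this" idea mentioned earlier, a hypothetical sketch (the port number and pool name are invented, not from this PR):

```hcl
# Hypothetical Nomad client config: reserve the port so ordinary jobs
# can't land on it, and fence the node into a dedicated pool that the
# SO_REUSEPORT-capable job targets.
client {
  node_pool = "reuseport"

  reserved {
    reserved_ports = "8080"
  }
}
```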