-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
numa: enable numa topology detection #18146
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great @shoenig. I really like how this API turned out. It feels like there are a lot of layers but when you start digging you can see exactly how each one has its own responsibilities (ex. splitting out setting up the cgroups vs writing the resources into them)
|
||
package lang | ||
|
||
// A Stack is a simple LIFO datastructure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe note that it's not concurrency safe?
@@ -72,6 +72,7 @@ func (tr *TaskRunner) initHooks() { | |||
newStatsHook(tr, tr.clientConfig.StatsCollectionInterval, hookLogger), | |||
newDeviceHook(tr.devicemanager, hookLogger), | |||
newAPIHook(tr.shutdownCtx, tr.clientConfig.APIListenerRegistrar, hookLogger), | |||
newWranglerHook(tr.wranglers, task.Name, alloc.ID, hookLogger), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hooks run in the order listed here. If we moved this earlier in the list (say right after the task dir hook or identity hook), would we be able to take advantage of having the cgroup setup for other child processes associated with the task (like the artifact
hook or logmon
hook or some future hypothetical sandboxed-template
hook)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea! Created #18208 to track it.
) | ||
|
||
const ( | ||
wranglerHookName = "wrangler" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nitpick, feel free to ignore: I like the whimsy of calling this wrangler
but it will show up in logs for this hook. Does this word carry unintended meaning outside of specific cultural context? (Dictionary suggests it's "argumentative", a quick search brings up mostly American brand names, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah good point - I can rename this to something like "procisolation"
for now, unless someone has something better
@@ -795,7 +794,7 @@ func DefaultConfig() *Config { | |||
CNIConfigDir: "/opt/cni/config", | |||
CNIInterfacePrefix: "eth", | |||
HostNetworks: map[string]*structs.ClientHostNetworkConfig{}, | |||
CgroupParent: cgutil.GetCgroupParent(""), | |||
CgroupParent: "nomad.slice", // SETH todo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another TODO associated with this: the systemd service unit we package doesn't delegate management of our cgroup slice (ref https://systemd.io/CGROUP_DELEGATION/). We should probably add that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great callout - filed #18211 to cover that
// restore process wrangler for task | ||
ar.wranglers.Setup(proclib.Task{AllocID: tr.Alloc().ID, Task: tr.Task().Name}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the task handle restore fails, we'll never hit this block, so it doesn't seem like this is here to handle that case. Should this belong in the tr.Restore
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the ar.wranglers
field belongs to the alloc runner which the individual task runner wouldn't have access to. Which, I think is technically unnecessary for the current implementation but in the long term it could also manage child processes associated with the alloc (and not any task).
@@ -1,688 +0,0 @@ | |||
// Code generated from hashicorp/nomad/tools/ec2info; DO NOT EDIT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
func (hz KHz) MHz() MHz { | ||
return MHz(hz / 1000) | ||
} | ||
|
||
func (hz KHz) String() string { | ||
return strconv.FormatUint(uint64(hz.MHz()), 10) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling this pointer receiver hz
feels a little gross 😀
func (hz KHz) MHz() MHz { | |
return MHz(hz / 1000) | |
} | |
func (hz KHz) String() string { | |
return strconv.FormatUint(uint64(hz.MHz()), 10) | |
} | |
func (khz KHz) MHz() MHz { | |
return MHz(khz / 1000) | |
} | |
func (khz KHz) String() string { | |
return strconv.FormatUint(uint64(khz.MHz()), 10) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops! 😱
} | ||
|
||
func (f *CPUFingerprint) setReservableCores(request *FingerprintRequest, response *FingerprintResponse) { | ||
reservable := request.Config.ReservableCores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we've dropped handling of user-provided configuration here, but we specifically refer to it in the new numa package. Should this still be part of the fingerprint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #18213 - will think about this more
func (m *MacOS) scanLegacyX86(top *Topology) { | ||
coreCount, _ := unix.SysctlUint32("machdep.cpu.core_count") | ||
hz, _ := unix.SysctlUint64("hw.cpufrequency") | ||
coreSpeed := KHz(hz / 1_000_000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This smells funny... what's the unit of hw.cpufrequency
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #18212 for myself to followup
Reviewers: see NMD-183 (internal).
This change looks larger than it is, just due to the amount of deletions and fallout in plumbing things through. It's crudely broken up into 9 commits (by filepath):
b17d418 - refactor how the client handles cgroups (moving responsibility from the client cpuset manager + task drivers into mostly just task hooks).
2121b44 - fingerprint numa topology (and eliminate aws cpu fingerprinter)
199dcb2 - plumbing numa and cgroups changes into drivers
1cd5264 - cleaning up task resource usage accounting
200a275 - plumbing for client config
33b958c - a small stack implementation (used by the refactoring around pid tracking / process resource utilization)
6d7401e - remove the ec2info tool
44c33ee - fixup testing packages in plugins/drivers
1ea6efc - build step changes
Documentation and probably some e2e testing will be in a followup PR.
Not sure what's going on with the
test-e2e
workflow (vault compat matrix) but I'll look into it.