[sled-agent] Monitor for Tofino driver as factor in 'are we scrimlet' decision #1918
sled-agent/src/sled_agent.rs
// Scan the existing system for noteworthy events
// that may have happened before we started monitoring
if self.inner.hardware.is_scrimlet() {
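The comment in this excerpt implies an ordering: monitoring is already running, and only then is the current state scanned, so nothing that happened before monitoring started is missed. A minimal sketch of that subscribe-then-scan ordering using `std::sync::mpsc`, assuming a push-style monitor. `HardwareMonitor`, `HardwareUpdate`, `set_tofino_attached`, and `start_monitoring` are hypothetical names, not the types in this PR.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical hardware event; not the actual sled-agent type.
#[derive(Debug, Clone, PartialEq)]
enum HardwareUpdate {
    TofinoDeviceChange,
}

/// Hypothetical monitor that tracks whether the Tofino driver is attached
/// and notifies every subscriber when that changes.
struct HardwareMonitor {
    tofino_attached: Mutex<bool>,
    subscribers: Mutex<Vec<Sender<HardwareUpdate>>>,
}

impl HardwareMonitor {
    fn new(tofino_attached: bool) -> Self {
        HardwareMonitor {
            tofino_attached: Mutex::new(tofino_attached),
            subscribers: Mutex::new(Vec::new()),
        }
    }

    fn subscribe(&self) -> Receiver<HardwareUpdate> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    fn is_scrimlet(&self) -> bool {
        *self.tofino_attached.lock().unwrap()
    }

    /// Simulates the driver attaching or detaching.
    fn set_tofino_attached(&self, attached: bool) {
        *self.tofino_attached.lock().unwrap() = attached;
        for tx in self.subscribers.lock().unwrap().iter() {
            // A dropped receiver just means nobody is listening any more.
            let _ = tx.send(HardwareUpdate::TofinoDeviceChange);
        }
    }
}

/// Subscribe first, then scan. A change that lands between the two steps
/// shows up on the channel instead of being lost.
fn start_monitoring(hw: &HardwareMonitor) -> (Receiver<HardwareUpdate>, bool) {
    let rx = hw.subscribe();
    // Scan the existing system for noteworthy events
    // that may have happened before we started monitoring.
    let scrimlet_now = hw.is_scrimlet();
    (rx, scrimlet_now)
}
```

Scanning before subscribing would open a window where a driver attach could go unnoticed. The cost of this ordering is that a handler may observe the same state twice, so handling should be idempotent.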
Finding an attached tofino driver tells us that we're running on a scrimlet, but it doesn't mean we have all the resources needed to launch a switch zone. Specifically, we need the `tfpkt0` link to exist. Without that, `tfportd` can't create `tfport`s, so `mgs` and maghemite can't run. In practice, I would expect that link to exist long before `sled-agent` runs, but it's at least a theoretical concern.
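The distinction here is two checks at different levels: the driver answers "are we a scrimlet?", while launching the switch zone additionally requires `tfpkt0`. A sketch of that split, assuming the observed links are available as a list of names. `ObservedHardware`, `can_launch_switch_zone`, and the list-of-names representation are hypothetical; how the real code discovers links is not modeled.

```rust
/// Hypothetical snapshot of what sled-agent has observed.
struct ObservedHardware {
    tofino_driver_attached: bool,
    links: Vec<String>,
}

/// The link that `tfportd` needs in order to create tfports.
const TFPKT_LINK: &str = "tfpkt0";

/// An attached Tofino driver is enough to say we're on a scrimlet.
fn is_scrimlet(hw: &ObservedHardware) -> bool {
    hw.tofino_driver_attached
}

/// Launching the switch zone also needs `tfpkt0`; without it, tfportd,
/// mgs, and maghemite can't do their jobs.
fn can_launch_switch_zone(hw: &ObservedHardware) -> bool {
    is_scrimlet(hw) && hw.links.iter().any(|l| l.as_str() == TFPKT_LINK)
}
```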
I fixed this in #1933, when actually launching the switch zone.
Sean, this looks good! Mostly minor comments from me. Based on @davepacheco's comments and my rethinking of the nexus queue in #1917, I'm not sure we want to stick with that strategy. However, I'm fine merging this as is so we can start using it, and we can revisit that strategy in follow-ups once we decide on an approach.
/// as illumos systems.
pub struct Hardware {
    log: Logger,
    inner: Mutex<HardwareInner>,
Is this Mutex actually needed? It looks like the scrimlet config is only set during initialization.
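If the scrimlet flag really is fixed at initialization, it can live in a plain field: `Arc<T>` can be shared across threads when `T` is `Send + Sync`, and a struct of plain `bool`s is both, so readers need no lock. A sketch under that assumption; this `Hardware` deliberately omits the `log` field and `HardwareInner`, and `count_scrimlet_readers` is a hypothetical helper.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical variant of `Hardware` whose scrimlet flag is set once, at
/// construction, and never mutated afterwards.
struct Hardware {
    is_scrimlet: bool,
}

impl Hardware {
    fn new(is_scrimlet: bool) -> Self {
        Hardware { is_scrimlet }
    }

    /// Plain `&self` read; no lock needed because nothing writes the field.
    fn is_scrimlet(&self) -> bool {
        self.is_scrimlet
    }
}

/// Spawns `readers` threads that each read the shared flag, and returns how
/// many of them saw `true`.
fn count_scrimlet_readers(hw: Arc<Hardware>, readers: usize) -> usize {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let hw = Arc::clone(&hw);
            thread::spawn(move || hw.is_scrimlet())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&seen| seen)
        .count()
}
```

A `Mutex` only earns its keep once something updates the state after startup, for example if the Tofino driver could attach later and the flag had to change.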
Admittedly, this isn't really needed. I was trying to emulate the OPTE usage here: https://github.com/oxidecomputer/omicron/tree/main/sled-agent/src/opte
But in reality, the Sled Agent only supports the "sim" version on non-illumos platforms. If we try running the "real" sled agent outside illumos, tons of stuff will break (access to zones, dladm, ipadm, etc.).
I've decided to keep most of this module as stubs, so that editor support (like rust-analyzer) will still work on platforms like Linux, but it'll hopefully be more obvious that this code should not be callable when we start approaching the "real illumos" interfaces. (done in 4863667)
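One way to express "stubs off illumos" is `#[cfg(target_os = ...)]`: the same module path resolves on every platform, so callers type-check everywhere, but only illumos gets a working body. A minimal sketch; the module layout, `Hardware::new`, and the placeholder illumos body are hypothetical and are not the contents of 4863667.

```rust
#[cfg(target_os = "illumos")]
mod hardware {
    pub struct Hardware {
        tofino_attached: bool,
    }

    impl Hardware {
        pub fn new() -> Self {
            // Placeholder: a real implementation would inspect devinfo here.
            Hardware { tofino_attached: false }
        }

        pub fn is_scrimlet(&self) -> bool {
            self.tofino_attached
        }
    }
}

#[cfg(not(target_os = "illumos"))]
mod hardware {
    /// Stub with the same public shape as the illumos type, so code that
    /// uses it still compiles (and rust-analyzer still works) on Linux/macOS.
    pub struct Hardware {
        _private: (),
    }

    impl Hardware {
        pub fn new() -> Self {
            Hardware { _private: () }
        }

        /// Reaching this off illumos is a bug: the real sled agent also
        /// depends on zones, dladm, ipadm, and so on.
        pub fn is_scrimlet(&self) -> bool {
            unimplemented!("hardware monitoring is only supported on illumos")
        }
    }
}
```

Panicking loudly in the stub makes an accidental call on the wrong platform obvious, instead of silently reporting "not a scrimlet".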
Looks great! Thanks for the cleanup and comments.
`hardware` module to Sled Agent for minimally monitoring `devinfo` output. This will certainly evolve, but this PR includes a "bare minimum" real example of tracking the tofino driver.
`rss-config.toml` to decide if we're running on a scrimlet. Instead...
`force-scrimlet` option to make the sled agent assume that it is a scrimlet.

Fixes https://github.com/oxidecomputer/minimum-upgradable-product/issues/19
Part of #1917
Part of #823
Pre-requisite for https://github.com/oxidecomputer/minimum-upgradable-product/issues/16
Pre-requisite for https://github.com/oxidecomputer/minimum-upgradable-product/issues/18
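Read together, the description suggests the scrimlet decision now comes from hardware detection (the tofino driver in `devinfo`) rather than from `rss-config.toml`, with `force-scrimlet` as an override. A sketch of that rule, assuming the option maps to a boolean; `Config`, `DevinfoSnapshot`, and their fields are hypothetical names, not the PR's types.

```rust
/// Hypothetical config holding the `force-scrimlet` option as a boolean.
struct Config {
    force_scrimlet: bool,
}

/// Hypothetical snapshot of what `devinfo` monitoring reported.
struct DevinfoSnapshot {
    tofino_driver_attached: bool,
}

/// Detection says "scrimlet" when the tofino driver is attached;
/// `force-scrimlet` makes the sled agent assume it is a scrimlet regardless.
fn is_scrimlet(config: &Config, devinfo: &DevinfoSnapshot) -> bool {
    config.force_scrimlet || devinfo.tofino_driver_attached
}
```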