Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sled-agent] Monitor for Tofino driver as factor in 'are we scrimlet' decision #1918

Merged
merged 5 commits into from
Nov 10, 2022

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Nov 4, 2022

  • Adds hardware module to Sled Agent for minimally monitoring devinfo output. This will certainly evolve, but this PR includes a "bare minimum" real example of tracking the tofino driver.
  • Stop relying on rss-config.toml to decide if we're running on a scrimlet. Instead...
    • (General case) Rely on monitoring hardware
    • (Testing, manual case) Provide a force-scrimlet option to make the sled agent assume that it is a scrimlet

Fixes https://github.com/oxidecomputer/minimum-upgradable-product/issues/19
Part of #1917
Part of #823
Pre-requisite for https://github.com/oxidecomputer/minimum-upgradable-product/issues/16
Pre-requisite for https://github.com/oxidecomputer/minimum-upgradable-product/issues/18

@smklein smklein added the Sled Agent Related to the Per-Sled Configuration and Management label Nov 4, 2022
sled-agent/src/hardware/illumos/mod.rs Outdated Show resolved Hide resolved

// Scan the existing system for noteworthy events
// that may have happened before we started monitoring
if self.inner.hardware.is_scrimlet() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding an attached tofino driver tells us that we're running on a scrimlet, but it doesn't mean we have all the resources needed to launch a switch zone. Specifically, we need the tfpkt0 link to exist. Without that, tfportd can't create tfports, so mgs and maghemite can't run. In practice, I would expect that link to exist long before sled-agent runs, but it's at least a theoretical concern.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this in #1933 , when actually launching the switch zone

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sean this looks good! Mostly minor comments from me. Based on @davepacheco's comments and my rethinking about the nexus queue in #1917, I'm not sure we want to stick with that strategy. However, I'm fine merging this as is so we can start using it and revisit that strategy in follow ups with whatever we decide upon.

sled-agent/src/bootstrap/agent.rs Show resolved Hide resolved
sled-agent/src/hardware/illumos/mod.rs Show resolved Hide resolved
sled-agent/src/hardware/illumos/mod.rs Show resolved Hide resolved
sled-agent/src/hardware/illumos/mod.rs Outdated Show resolved Hide resolved
sled-agent/src/hardware/illumos/mod.rs Show resolved Hide resolved
sled-agent/src/hardware/illumos/mod.rs Show resolved Hide resolved
/// as illumos systems.
pub struct Hardware {
log: Logger,
inner: Mutex<HardwareInner>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Mutex actually needed? It looks like scrimlet config is only set during initialization.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, this isn't really needed. I was trying to emulate the OPTE usage here: https://github.com/oxidecomputer/omicron/tree/main/sled-agent/src/opte

... But in reality, the Sled Agent only support the "sim" version on non-illumos platforms. If we try running the "real" sled agent outside illumos, tons of stuff will break (access to zones, dladm, ipadm, etc).

I've decided to keep most of this module as stubs, so that editor support (like rust analyzer) will still work on platforms like Linux, but it'll hopefully be more obvious that this code should not be callable when we start approaching the "real illumos" interfaces. (done in 4863667)

sled-agent/src/nexus.rs Show resolved Hide resolved
Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for the cleanup and comments.

@smklein smklein merged commit 6fa27a3 into main Nov 10, 2022
@smklein smklein deleted the scrimlet-autodetect branch November 10, 2022 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Sled Agent Related to the Per-Sled Configuration and Management
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants