Fixes and improves OPTE installation and improves errors #1052
- Fixes mismerge in `tools/install_opte.sh` that prevented installing the OPTE package on a system that didn't previously have it
- Improves error messages when either the xde driver fails or the expected virtual networking devices don't exist
Should resolve #1049.
If we try to run the sled agent without the expected xde driver config file at
If we've not run
Looks great, thanks for the quick fixes!
Something actually occurred to me after posting this. This expands the set of VNICs returned by Any thoughts here @smklein?
Fortunately, that's the only spot where we actually query for all sled-agent managed VNICs, so this change should be easy. The implementation (on main) is a little redundant - omicron/sled-agent/src/illumos/dladm.rs Lines 167 to 179 in c52656d
... but then we filter again in sled_agent.rs, after making that call: omicron/sled-agent/src/sled_agent.rs Lines 168 to 175 in c52656d
Before OPTE integration, "get_vnics" was fairly unambiguous: we only managed guest VNICs. Now that we have more complex categories, you're right, it's probably worthwhile disambiguating. I'd be fine with either making a
Cool, will add shortly. Thanks!
- Adds a `VnicKind` enum for tracking the flavor of each VNIC the sled agent is responsible for.
- Adds a parameter to the `Dladm::get_vnics()` call which filters the returned list to a particular kind. The goal here is to be more explicit about which VNICs we're looking for and ultimately operating on in the sled agent.
- The sled agent now cleans up guest VNICs and the underlying xde devices (OPTE ports) when it starts up, similar to how it clears out any extant control VNICs, to ensure things are in a reliable state before accepting any requests from Nexus.
- Improves the `tools/install_opte.sh` script, trying to be less intrusive and only modifying the state we need to change when adding the OPTE / xde package repositories.
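For readers following along, here is a minimal sketch of what filtering VNICs by kind could look like. It is not the code from this PR: the name prefixes, the `from_name` helper, and the free function `filter_vnics` (standing in for the kind-filtered `Dladm::get_vnics()` call) are all assumptions made for illustration, and the sketch works on a plain list of link names instead of querying `dladm`.

```rust
// Hypothetical name prefixes; the real sled agent may use different ones.
const CONTROL_PREFIX: &str = "oxControl";
const GUEST_PREFIX: &str = "oxGuest";

/// The flavor of a VNIC managed by the sled agent (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VnicKind {
    OxideControl,
    Guest,
}

impl VnicKind {
    /// Classify a link by its name. Returns `None` for links that match
    /// neither prefix, i.e. links this sketch treats as unmanaged.
    pub fn from_name(name: &str) -> Option<Self> {
        if name.starts_with(CONTROL_PREFIX) {
            Some(VnicKind::OxideControl)
        } else if name.starts_with(GUEST_PREFIX) {
            Some(VnicKind::Guest)
        } else {
            None
        }
    }
}

/// Illustrative stand-in for a kind-filtered VNIC query: keeps only the
/// link names whose kind matches `kind`, preserving input order.
pub fn filter_vnics<'a>(names: &[&'a str], kind: VnicKind) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| VnicKind::from_name(n) == Some(kind))
        .collect()
}
```

Passing the kind explicitly means a caller that wants to delete only guest VNICs at startup cannot accidentally pick up control VNICs, which is the ambiguity discussed above.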
@smklein This could use another once-over, if you don't mind. I've added an enum to represent the kind of each VNIC, and use that explicitly when clearing out either Oxide control VNICs or guest VNICs when the sled agent starts up. I'm also clearing out the OPTE ports / xde devices at startup, which should resolve #1048 as well.
Ok, a few more straggler changes required. I opted to return an As a sidenote, here's a snippet from the sled agent logs, when restarting and cleaning up the old state:
Mod my last (nit) comment, this still LGTM!
Propolis changes since the last update:
- Gripe when using non-raw block device
- Update zerocopy dependency
- nvme: Wire up GetFeatures command
- Make Viona more robust in the face of errors
- bump softnpu (#577)
- Modernize 16550 UART

Crucible changes since the last update:
- Don't check ROP if the scrub is done (#1093)
- Allow crutest cli to be quiet on generic test (#1070)
- Offload write encryption (#1066)
- Simplify handling of BlockReq at program exit (#1085)
- Update Rust crate byte-unit to v5 (#1054)
- Remove unused fields in match statements, downstairs edition (#1084)
- Remove unused fields in match statements and consolidate (#1083)
- Add logger to Guest (#1082)
- Drive hash / decrypt tests from Upstairs::apply
- Wait to reconnect if auto_promote is false
- Change guest work id from u64 -> GuestWorkId
- remove BlockOp::Commit (#1072)
- Various clippy fixes (#1071)
- Don't panic if tasks are destroyed out of order
- Update Rust crate reedline to 0.27.1 (#1074)
- Update Rust crate async-trait to 0.1.75 (#1073)
- Buffer should destructure to Vec when single-referenced
- Don't fail to make unencrypted regions (#1067)
- Fix shadowing in downstairs (#1063)
- Single-task refactoring (#1058)
- Update Rust crate tokio to 1.35 (#1052)
- Update Rust crate openapiv3 to 2.0.0 (#1050)
- Update Rust crate libc to 0.2.151 (#1049)
- Update Rust crate rusqlite to 0.30 (#1035)

---------
Co-authored-by: Alan Hanson <[email protected]>