-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate SoftNPU as virtual hardware #2089
Conversation
Just dropping a note that I was able to successfully test this end-to-end with the CI builds brought in via the setup-virtual-hardware scripts. From zero to instances talking to the Internet. |
Looks like destroy virtual hardware needs a bit of love.
|
06f4354
to
6b27f76
Compare
4bd4926
to
9fa6e34
Compare
This is now working as expected for me. |
I sorted out the networking issues we were seeing today with softnpu in the following And pushed an Omicron commit you can cherry-pick into this branch with the updates. |
Noting that we'll need to update to the current Dendrite API as we are now a few commits behind. This is causing saga failure for me due to API mismatch. |
I've put together a pile of pull requests to fix softnpu/p4 range keys.
These updates collectively solve the issue I was seeing with proxy arp replying to a much larger range than assigned. I think they may also address the NAT range issues @internet-diglett has been observing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all a complete review. Just a few notes that came up during chat today, wanted to get them persisted.
let params = sagactx.saga_params::<Params>()?; | ||
let opctx = OpContext::for_saga_action(&sagactx, ¶ms.serialized_authn); | ||
let osagactx = sagactx.user_data(); | ||
let dpd_client: Arc<dpd_client::Client> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to clone, I think, the dpd-client
methods we're calling take &self
.
.await | ||
} | ||
} | ||
.map_err(|e| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the fact that each request can fail means these need to be separate saga actions. Look at the NIC and disk-creation for examples. We basically create N nodes, for N NICs. Each of those creates (or destroys) exactly one item. Otherwise, the action is not really idempotent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, I'm not certain this is the case. If any of the requests fail, the saga is supposed to unwind, right? Even if the node is re-attempted, any requests that were successful would not be re-sent because of the logic above this that checks for existing nat entries, so unless I'm missing something, I think this is actually idempotent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Persisting our chat discussion here:
I had a misunderstanding of Steno's behavior. Since we don't run undo
actions for failed nodes, the logic in sic_remove_network_config()
would never be called in the event of a partial failure, which is why the overall behavior would not be idempotent.
} | ||
} | ||
|
||
for entry in entries_to_remove { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here about idempotency. We need a different forward and undo action for each external IP address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need a separate saga node for each ip, or do we need to make sure that we attempt to delete the nat entry for every existing ip at least once?
Implicitly handled based on above discussion
|
||
let mut entries_to_remove: Vec<ExternalIp> = vec![]; | ||
|
||
for entry in external_ips { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here about idempotency. We need a different saga action for each external IP I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Persisting convo from chat:
It doesn't seem that any of the other saga nodes in instance_delete.rs
are using subsagas, so here we're going to ensure that we have attempted to nat entries for all external ips before checking errors.
e5563dc
to
b5eea6c
Compare
Update softnpu & dendrite dependencies Update commands in helper scripts to use `swadm` Update sled-agent config.toml to use softnpu override feature Update a-to-z docs to reflect new process
127635a
to
e013651
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tons of great work @internet-diglett! This has been a long slog, your persistence is admirable. I've got a few questions and small comments, but this generally seems pretty solid to me!
sled-hardware/src/illumos/mod.rs
Outdated
// TODO: correctness | ||
// I'm not sure whether or not we should be treating softnpu | ||
// as a stub or treating it as a real ASIC here, so this | ||
// might change. | ||
ScrimletMode::Softnpu => HardwareView::new_stub_tofino(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refers to how we look for the driver and devices files. This can't be a "real" Tofino, though we may want to add a variant to TofinoView
for the SoftNPU-specific stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Thanks for taking this work on, really appreciate you doing it. Herculean effort, which rockets the effectiveness of the product forward.
- I have a smattering of comments about minor changes before we check in, but when in doubt, pretend I added a "do what he said" comment under everything @bnaecker said.
- Adding a stub implementation of dpd so
cargo test
can work without custom environment variables should be a pretty high priority for us to fix -- lmk if I can help get this work done!
sled-agent/src/config.rs
Outdated
/// TODO: @internet-diglett | ||
/// this is to preserve the old behavior of `scrimlet_override = false`, | ||
/// but I haven't found where that logic has actually been leveraged... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the old case, this was:
stub_scrimlet = Some(true) -> Please treat the sled as a scrimlet
stub_scrimlet = Some(false) -> Please DO NOT treat the sled as a scrimlet
stub_scrimlet = None -> Use whatever you automatically detect by scanning hardware
Basically, without this option, all "not-on-a-real-rack" deployments of Omicron look like Scrimlets.
Not sure if this case actually matters all too much, but that was the intent, at least.
.cargo/config
Outdated
# We need to enable this to access private repos | ||
[net] | ||
git-fetch-with-cli = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Echoing @bnaecker 's comment here: https://github.com/oxidecomputer/omicron/pull/2089/files#r1143596296 , is this still necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. Removing it and seeing what happens in CI.
|
||
debug!(log, "checking for existing nat mapping for {target_ip:#?}"); | ||
|
||
// TODO: currently if we have this environment variable set, we want to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we file an issue to track this, and label these comments with TODO(issue number)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
---- | ||
SKIP_ASIC_CONFIG=1 cargo test | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why this environment variable was added, and I empathize with the tradeoff being made here, but IMO the disposition is hostile to developers.
The property that "you should expect cargo build
and cargo test
to work" is pretty important - for example, a lot of effort was expended by @davepacheco to ensure that CockroachDB can be automatically started and used by our test suite, without requiring some environment variable setup.
You have comments down below with "TODOs" that we should be using a stub dpd, depending on how the platform was built, and I totally agree with that - we have support in our integration tests for launching an out-of-band CockroachDB for database testing, and we could totally do something similar for the stub dpd - it should be easy to have an in-Omicron implementation that can act as a fake.
That doesn't need to block this PR, fwiw. LMK if you want to pair on getting that stub in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on all points. I would like to work on getting that done ASAP.
/// Configuration for the dataplane daemon. | ||
#[derive(Clone, Debug, Deserialize, PartialEq, Serialize)] | ||
pub struct DpdConfig { | ||
pub address: SocketAddr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a real system, where should this address come from? Is it on the underlay network? Would we expect it to have an entry in the internal DNS system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a real system there will be a dpd
instance per scrimlet, each with a unique underlay address. DpdConfig
is currently a single-valued member dpd_api
in PackageConfig
. This works for single-sidecar setups, but we'll need to evolve toward being able to handle multiple dpd
instances.
I've noted this in the external networking plumbing I'm currently working on. I'd be happy to take this issue on in that work if we think that makes sense. There is additional infrastructure in the database in that branch that helps to identify the sidecar of interest through rack_id
and switch_location
when operations come through the API that require dpd
interaction.
docs/boundary-services-a-to-z.adoc
Outdated
|
||
Follow the | ||
https://github.com/oxidecomputer/meta/blob/master/engineering/remote-access-preview-demo-setup.adoc#setting-up-the-cli[RAP document] | ||
to set up IPs, images, disks, instances etc. Things to pay particular attention | ||
to here are the following. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an internal document; IMO we should not be referencing it in public-facing docs (not necessarily for secrecy, but just because it won't make sense to anyone working with this codebase who isn't an Oxide employee).
// TODO-remove | ||
// | ||
// See https://github.com/oxidecomputer/omicron/issues/1337 | ||
// | ||
// An additional part of the workaround to connect into instances. This is | ||
// required to tell OPTE to actually act as a 1-1 NAT when an instance is | ||
// provided with an external IP address, rather than do its normal job of | ||
// encapsulating the traffic onto the underlay (such as for delivery to | ||
// boundary services). | ||
use_external_ip_workaround(&log, &xde_conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hell yes, love to see this being deleted
tools/quickstart.sh
Outdated
@@ -0,0 +1,87 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kinda opposed to checking this script in - it's very environment-specific, may not make sense for all devs, but has a name which implies "you should run me first!"
I think this sorta thing makes sense for developers to have / use, but it's a bit of a landmine for someone to run without a lot of context about their environment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense for it to be something like example.sh
instead of quickstart.sh
? If not, I'm not opposed to removing it altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an alternative would be to include the script or substantial portions of it in the documentation, such as how-to-run.adoc
.
docs/boundary-services-a-to-z.adoc
Outdated
Once your host has been populated with images, you can use the script at | ||
`tools/quickstart.sh` to quickly create a VM and set up the `proxy-arp`. Please | ||
be sure to edit the `ip_pool_start` and `ip_pool_end` variables to match your | ||
desired address ranges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below about this script; IMO we should not check it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all my questions, @internet-diglett! This all looks OK to me!
softnpu
zonesoftnpu
softnpu
Configure link addresses and gateway parameters viascadm
/softnpuadm
uponsoftnpu
startupdendrite
andsoftnpu
softnpu
socketdendrite
withsoftnpu
featuredendrite
asic type isSoftnpu
, havedendrite
configure switch viasoftnpu
socketfiledendrite
upon guest VM creationdendrite-softnpu
for programming nat entriesCloses #1465
Closes oxidecomputer/opte#236