Instances' sled assignments won't change if an instance is stopped and restarted #2315
Labels: nexus
Comments
I suspect fixing this problem will more or less require instance start to become a saga, because once it's done, starting an instance will require a lot of attendant work (reserving space on a sled, setting up V2P mappings) that we need to be able to retry if interrupted and that may need to be undone if the entire attempt to start the instance fails.
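To make the "undo if the entire attempt fails" part concrete, here is a minimal sketch of the saga pattern: each step pairs a forward action with an undo, and a failure unwinds the completed steps in reverse order. Everything in it (`Step`, `run_saga`, the step functions, the log strings) is hypothetical and invented for illustration; it is not Nexus code or the API of any real saga library, and it leaves out the retry-after-interruption half, which would need persisted saga state.

```rust
/// One saga step: a forward action plus the undo that reverses it.
/// (Hypothetical type, for illustration only.)
pub struct Step {
    pub name: &'static str,
    pub action: fn(&mut Vec<String>) -> Result<(), String>,
    pub undo: fn(&mut Vec<String>),
}

/// Run the steps in order. If one fails, undo the steps that already
/// completed, newest first, and report which step failed.
pub fn run_saga(steps: &[Step], log: &mut Vec<String>) -> Result<(), String> {
    let mut done: Vec<&Step> = Vec::new();
    for step in steps {
        match (step.action)(log) {
            Ok(()) => done.push(step),
            Err(e) => {
                for s in done.iter().rev() {
                    (s.undo)(log);
                }
                return Err(format!("{} failed: {}", step.name, e));
            }
        }
    }
    Ok(())
}

// Hypothetical step bodies; they only record what they would have done.
pub fn reserve_sled(log: &mut Vec<String>) -> Result<(), String> {
    log.push("reserve sled".to_string());
    Ok(())
}
pub fn release_sled(log: &mut Vec<String>) {
    log.push("release sled".to_string());
}
pub fn setup_v2p(log: &mut Vec<String>) -> Result<(), String> {
    log.push("set up v2p".to_string());
    Ok(())
}
pub fn teardown_v2p(log: &mut Vec<String>) {
    log.push("tear down v2p".to_string());
}
/// Simulates the final step failing, so the earlier steps must be undone.
pub fn ensure_instance(_log: &mut Vec<String>) -> Result<(), String> {
    Err("sled unreachable".to_string())
}
pub fn noop_undo(_log: &mut Vec<String>) {}

/// The hypothetical "instance start" saga: reserve, map, then start.
pub fn instance_start_steps() -> Vec<Step> {
    vec![
        Step { name: "reserve_sled", action: reserve_sled, undo: release_sled },
        Step { name: "setup_v2p", action: setup_v2p, undo: teardown_v2p },
        Step { name: "ensure_instance", action: ensure_instance, undo: noop_undo },
    ]
}
```

Calling `run_saga(&instance_start_steps(), &mut log)` fails at the last step, after which the V2P mappings are torn down and the sled reservation is released, in that order.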
This was referenced Apr 12, 2023
The Nexus external API's "instance start" command passes through to Nexus::instance_start_runtime:
omicron/nexus/src/external_api/http_entrypoints.rs, line 3088 in 9d1bd55
omicron/nexus/src/app/instance.rs, lines 358 to 377 in 9d1bd55
If I'm reading things right, this function selects the sled to which to send the runtime state update by looking at the instance record in CRDB without regard for the instance's current state:
omicron/nexus/src/app/instance.rs, lines 608 to 619 in 9d1bd55
This seems like the right thing to do if the instance is already incarnated on a sled somewhere. But if the instance is stopped and doesn't exist on any sled, this will try to create the instance on the sled on which it most recently ran, which might not have capacity for it (even though some other sled might). This function should distinguish the "instance already incarnated" and "instance stopped" cases and select a new sled in the latter case.
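A rough sketch of the distinction proposed here, using simplified stand-in types rather than the real Nexus ones: `InstanceState`, `Sled`, `Instance`, `select_sled_for_start`, and the CPU-only capacity check are all assumptions made for illustration. The placement policy shown (the sled with the most free CPUs) is likewise just a placeholder for whatever policy Nexus would actually use.

```rust
/// Hypothetical, simplified instance states (the real set is larger).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceState {
    Running,
    Stopped,
}

/// Hypothetical sled record; capacity is reduced to free CPUs only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Sled {
    pub id: u32,
    pub free_cpus: u32,
}

/// Hypothetical instance record as it might be read from the database.
#[derive(Debug, Clone)]
pub struct Instance {
    pub state: InstanceState,
    /// Sled the instance is on now, or most recently ran on.
    pub last_sled_id: u32,
    pub ncpus: u32,
}

/// Choose the sled that should receive the "start" request.
/// Returns None when the instance is stopped and no sled has room.
pub fn select_sled_for_start(instance: &Instance, sleds: &[Sled]) -> Option<u32> {
    match instance.state {
        // Already incarnated: the update must go to the sled that hosts it.
        InstanceState::Running => Some(instance.last_sled_id),
        // Stopped: the recorded sled is stale, so place the instance afresh
        // on the sled with the most free CPUs among those that can fit it.
        InstanceState::Stopped => sleds
            .iter()
            .filter(|s| s.free_cpus >= instance.ncpus)
            .max_by_key(|s| s.free_cpus)
            .map(|s| s.id),
    }
}
```

With this shape, a stopped instance whose previous sled is full can still land on another sled, while a running instance keeps being routed to the sled it already occupies.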