[sled-agent] Store service configuration data in duplicate in M.2s #2972

smklein · 2023-05-01T16:17:48Z

Creates a Ledger structure which makes it easy to write toml-serializable data to and from M.2s.
Uses this Ledger structure to store all service configuration information in duplicate on the M.2s.
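For readers who have not opened `sled-agent/src/ledger.rs`, the following is a minimal std-only sketch of the idea, not the PR's actual code. The real `Ledger` serializes TOML; this version assumes a hypothetical two-part text format (generation on the first line, payload after it), and the names `SimpleLedger`, `load`, `commit` and `ledger.txt` are all invented for illustration. Treating a commit as successful when at least one copy is written is also an assumption.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Assumed file name; not taken from the PR.
const FILE_NAME: &str = "ledger.txt";

/// Hypothetical stand-in for the ledger: a payload plus a generation number,
/// mirrored into every directory in `dirs` (one directory per M.2).
pub struct SimpleLedger {
    pub generation: u64,
    pub data: String,
    dirs: Vec<PathBuf>,
}

impl SimpleLedger {
    /// Reads every copy that can be read and keeps the one with the highest
    /// generation. Starts empty at generation 0 if no copy is readable.
    pub fn load(dirs: Vec<PathBuf>) -> SimpleLedger {
        let mut best: Option<(u64, String)> = None;
        for dir in &dirs {
            let Ok(text) = fs::read_to_string(dir.join(FILE_NAME)) else { continue };
            let Some((first, rest)) = text.split_once('\n') else { continue };
            let Ok(generation) = first.trim().parse::<u64>() else { continue };
            if best.as_ref().map_or(true, |(g, _)| generation > *g) {
                best = Some((generation, rest.to_string()));
            }
        }
        let (generation, data) = best.unwrap_or((0, String::new()));
        SimpleLedger { generation, data, dirs }
    }

    /// Bumps the generation and writes the new contents to every directory.
    /// Returns Ok if at least one copy was written (an assumption of this sketch).
    pub fn commit(&mut self, data: &str) -> io::Result<()> {
        self.generation += 1;
        self.data = data.to_string();
        let body = format!("{}\n{}", self.generation, self.data);
        let mut written = 0;
        let mut last_err = None;
        for dir in &self.dirs {
            let result = fs::create_dir_all(dir)
                .and_then(|_| fs::write(dir.join(FILE_NAME), &body));
            match result {
                Ok(()) => written += 1,
                Err(e) => last_err = Some(e),
            }
        }
        if written > 0 {
            Ok(())
        } else {
            Err(last_err.unwrap_or_else(|| {
                io::Error::new(io::ErrorKind::Other, "no ledger directories")
            }))
        }
    }
}
```

Calling `SimpleLedger::load` with the two M.2 config directories and then `commit` on each change gives the "in duplicate" behavior described above: either copy alone is enough to recover the latest generation that reached it.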

## Before this PR

Running on rack2 and calling `omicron-package uninstall` would involve a fatal termination of the connection, as it would delete the `cxgbe0/ll` and `cxgbe1/ll` IP addresses necessary for contacting the sled.

## After this PR

Those addresses are left alone. This is pretty useful for development, as it allows us to run `uninstall` to cleanly wipe a Gimlet, preparing it for future "clean installs".

smklein · 2023-05-01T17:30:22Z

FWIW, I've tested this on BRM42220026 in rack2, and the configs aren't in /var/oxide anymore.

I am seeing them in /pool/int/0d8f680f-0907-4170-822b-5c49d43a7660/config/ and /pool/int/2acb2cf2-ed09-4009-b4ce-3b651552e166/config/ now.

andrewjstone

@smklein Nice work! I like the approach taken very much.

Unfortunately, I have one somewhat major concern. I'm not sure how practical of a concern it is as it depends on hardware behavior. There is the possibility of data loss for a ledger in the following scenario:

1. Generation 1 is written to both M.2s (A and B).
2. Generation 2 is written only to A; B fails to write.
3. Sled-agent reboots.
4. `Ledger::new` reads from B, but fails to read from A.
5. The ledger now points, incorrectly, to Generation 1.

At this point we have data loss, but things can get even more confusing, depending upon the failure modes of the M.2s. Let's say we continue with the following steps:

1. Generation 2 (with different data) is written to A.
2. Sled-agent reboots.
3. `Ledger::new` reads both versions at generation 2 and picks A given its current logic, as long as A is first in the path list:

```rust
// Return the ledger with the highest generation number.
let ledger = ledgers.into_iter().reduce(|prior, ledger| {
    if ledger.is_newer_than(&prior) {
        ledger
    } else {
        prior
    }
});
```
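To make the tie-break concrete, here is a small in-memory replay of the scenario. It is a sketch under stated assumptions: `LedgerCopy`, `pick` and `replay` are hypothetical names, `is_newer_than` is assumed to compare generation numbers only, and the second generation 2 is assumed to end up only on B while A still holds the first one, so that both copies sit at generation 2 as the comment describes.

```rust
/// One M.2's copy of the ledger, reduced to what the scenario needs.
#[derive(Clone, Debug, PartialEq)]
pub struct LedgerCopy {
    pub generation: u64,
    pub data: String,
}

impl LedgerCopy {
    /// Assumption: "newer" means a strictly higher generation number.
    pub fn is_newer_than(&self, other: &LedgerCopy) -> bool {
        self.generation > other.generation
    }
}

/// Same selection logic as the snippet above: the highest generation wins,
/// and on a tie the earlier entry in the list is kept.
pub fn pick(ledgers: Vec<LedgerCopy>) -> Option<LedgerCopy> {
    ledgers.into_iter().reduce(|prior, ledger| {
        if ledger.is_newer_than(&prior) {
            ledger
        } else {
            prior
        }
    })
}

/// Replays the scenario and returns what the ledger holds after each reboot.
/// Only successfully read copies are passed to `pick`.
pub fn replay() -> (LedgerCopy, LedgerCopy) {
    let copy = |generation: u64, data: &str| LedgerCopy {
        generation,
        data: data.to_string(),
    };

    // Generation 1 reaches both M.2s; generation 2 reaches only A.
    let a = copy(2, "gen 2, first write");
    let mut b = copy(1, "gen 1");

    // Reboot 1: the read of A fails, so only B's copy is considered.
    let after_reboot_1 = pick(vec![b.clone()]).expect("B was readable");

    // Assumption: the next update bumps generation 1 -> 2 and lands on B,
    // while A still holds the earlier generation 2.
    b = copy(after_reboot_1.generation + 1, "gen 2, second write");

    // Reboot 2: both reads succeed, the generations tie, and A (first) wins.
    let after_reboot_2 = pick(vec![a, b]).expect("both were readable");
    (after_reboot_1, after_reboot_2)
}
```

Under these assumptions `replay()` first yields generation 1 (the lost update), then yields A's older generation 2 even though a different generation 2 was written afterwards. Nothing in the generation numbers alone distinguishes the two copies.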

This problem is baked into the fact that you can't do consensus with only 2 nodes if those nodes can fail in arbitrary ways. Whether the M.2s can fail in arbitrary ways is unknown to me, but I'd really like to preclude the possibility of doing the wrong thing without relying on hardware behavior if at all possible. I can see two possible ways of going about resolving this issue:

1. If we ever fail to read or write from an M.2, we refuse to ever bring that M.2 back online. In short, we tolerate the failure by making it permanent.
2. We do not bump ledger generation numbers in sled-agent, but instead bump them in Nexus, along with saving the latest ledger configuration in Nexus. We then would know if we read back stale data, which we could go ahead and rewrite based on what was put in CockroachDB via Nexus.

It's unclear to me if either of these is actually feasible, as presumably the zones must come up in order to be able to talk to Nexus in the first place. However, maybe updates can go to Nexus as in option 2.
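The first option can be sketched as a fence that only ever grows. This is illustrative only: `Fence`, `observe` and `usable` are hypothetical names, and the sketch keeps the fenced set in memory, whereas a real implementation would presumably need to persist it somewhere that survives a reboot.

```rust
use std::collections::BTreeSet;

/// Tracks which M.2s have ever failed an I/O operation.
/// Assumption: held in memory only; persistence is out of scope here.
#[derive(Default)]
pub struct Fence {
    failed: BTreeSet<String>,
}

impl Fence {
    /// Records the outcome of a read or write against `disk`. A single
    /// failure fences the disk for good; later successes do not clear it.
    pub fn observe(&mut self, disk: &str, ok: bool) {
        if !ok {
            self.failed.insert(disk.to_string());
        }
    }

    /// Returns the disks that may still be used, preserving input order.
    pub fn usable<'a>(&self, disks: &[&'a str]) -> Vec<&'a str> {
        disks
            .iter()
            .copied()
            .filter(|d| !self.failed.contains(*d))
            .collect()
    }
}
```

With this rule, a disk that missed a write can never later present an older copy as current. The cost is that a second failure on the other M.2 leaves no usable copy at all.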

CC @rmustacc

sled-agent/src/ledger.rs

smklein · 2023-05-01T22:31:37Z

> This problem is baked into the fact that you can't do consensus with only 2 nodes if those nodes can fail in arbitrary ways. Whether the M.2s can fail in arbitrary ways is unknown to me, but I'd really like to preclude the possibility of doing the wrong thing without relying on hardware behavior if at all possible. I can see two possible ways of going about resolving this issue:

So, first of all, I totally agree with you about this mismatch being possible. In a longer-term plan, I would like for Nexus to be able to send the request for services / datasets to the sled as:

"Here are all the services you should run, with a generation number"

(I've updated #732 to include this implementation detail)

Such an API means that this ledger becomes a cache of data that's stored in CRDB, and which can get updated when Nexus comes online.
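A rough sketch of what such a generation-carrying request could look like on the sled side follows. Every name here (`ServicesRequest`, `SledCache`, `ensure`, `Outcome`) is hypothetical and not taken from the omicron API; the point is only that, with Nexus owning the generation number, the sled can tell a newer request from a replayed or outdated one.

```rust
/// Hypothetical request: the full set of services plus the generation
/// number that Nexus assigned to that set.
#[derive(Clone, Debug, PartialEq)]
pub struct ServicesRequest {
    pub generation: u64,
    pub services: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The request was newer than the cached ledger and replaced it.
    Applied,
    /// The request matched the cached generation; nothing to do.
    AlreadyCurrent,
    /// The request was older than what the sled already holds.
    RejectedStale { current: u64 },
}

/// The sled's cached view of what it should run. `ledger` is whatever was
/// read back from the M.2s at startup, possibly stale or absent.
pub struct SledCache {
    ledger: Option<ServicesRequest>,
}

impl SledCache {
    pub fn new(from_disk: Option<ServicesRequest>) -> Self {
        SledCache { ledger: from_disk }
    }

    /// Applies a request from Nexus if, and only if, it is newer than the cache.
    pub fn ensure(&mut self, req: ServicesRequest) -> Outcome {
        match self.ledger.as_ref().map(|l| l.generation) {
            Some(current) if req.generation < current => {
                Outcome::RejectedStale { current }
            }
            Some(current) if req.generation == current => Outcome::AlreadyCurrent,
            _ => {
                // Newer than the cache, or no cache at all: take Nexus's word.
                self.ledger = Some(req);
                Outcome::Applied
            }
        }
    }

    pub fn generation(&self) -> Option<u64> {
        self.ledger.as_ref().map(|l| l.generation)
    }
}
```

If the sled booted from a stale copy, the next request from Nexus carries a higher generation and overwrites it, which is the "cache that gets updated when Nexus comes online" behavior described above.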

> We do not bump ledger generation numbers in sled-agent, but instead bump them in Nexus, along with saving the latest ledger configuration in Nexus. We then would know if we read back stale data, which we could go ahead and rewrite based on what was put in CockroachDB via Nexus.

I 100% think this is feasible, and is something we should do. It's easier for "non-dataset services" than "dataset services" due to the existing shape of the internal API, but I think both can be vectorized + generation'd.

I've created #2977 and #2978 -- sub-issues of #732 -- for us to track.

andrewjstone · 2023-05-01T22:48:11Z

> I 100% think this is feasible, and is something we should do. It's easier for "non-dataset services" than "dataset services" due to the existing shape of the internal API, but I think both can be vectorized + generation'd.
>
> I've created #2977 and #2978 -- sub-issues of #732 -- for us to track.

Awesome! Thank you!

andrewjstone

Based on the understanding that Nexus will issue updates for services when online and that the Ledger acts as a cache, I think we should go ahead and merge this in once the test bug is fixed.

smklein added 15 commits April 28, 2023 09:54

[sled-agent] Make service_manager responsible for storage services too

7b54128

Merge branch 'main' into storage-manager-cleanup

f410325

CRDB auto-format on boot

0b4b040

better use of 'unique_name' (for storage zones), auto-launch storage zones, zone_name -> zone_type, config -> ledger

ef9517c

Merge branch 'main' into storage-manager-cleanup

f1fd1f5

Merge branch 'main' into storage-manager-cleanup

3a9ad87

[RSS] Explicit set of Bootstrap Agents

ec3b1e4

Merge branch 'storage-manager-cleanup' into rss-explicit

c812609

Fix tests

9d00c93

Merge branch 'storage-manager-cleanup' into rss-explicit

8a08090

make serialization happier

cfb7cbc

Store service ledgers in duplicate in M.2s

2ab628b

Improve parsing for toml, openapi

c37e57e

Merge branch 'rss-explicit' into service-ledger

4cbde16

smklein mentioned this pull request May 1, 2023

[sled-agent] Refactor service management out of StorageManager #2946

Merged

smklein added 3 commits May 1, 2023 12:29

Remove the comments about the ledger, we do that in #2972

541f68d

configs -> ledgers

5d59951

review feedback

ed20fff

smklein requested a review from andrewjstone May 1, 2023 17:00

smklein added 3 commits May 1, 2023 14:22

Merge branch 'main' into storage-manager-cleanup

ba2ba2c

Merge branch 'storage-manager-cleanup' into rss-explicit

afc03bd

Merge branch 'rss-explicit' into service-ledger

fbae358

andrewjstone reviewed May 1, 2023

We should allow synthetic disks to be used as M2s

0ae5531

smklein mentioned this pull request May 1, 2023

[sled-agent] Allow synthetic disks to be used as M.2s and U.2s. #2976

Merged

This was referenced May 1, 2023

Nexus should pass request for "all services on a sled" as a single endpoint with a generation number #2977

Closed

Nexus should pass request for "all datasets on a sled" as a single endpoint with a generation number #2978

Closed

andrewjstone approved these changes May 1, 2023

smklein added 9 commits May 2, 2023 11:04

Merge branch 'main' into storage-manager-cleanup

c970c4f

Merge branch 'storage-manager-cleanup' into rss-explicit

c00e2db

Merge branch 'main' into storage-manager-cleanup

599f669

Merge branch 'storage-manager-cleanup' into rss-explicit

e2b1a5b

Merge branch 'rss-explicit' into service-ledger

96c8fb4

Merge branch 'main' into storage-manager-cleanup

2487972

Merge branch 'storage-manager-cleanup' into rss-explicit

56a73e3

Merge branch 'rss-explicit' into service-ledger

488759e

Fix indexing

ef6756d

Base automatically changed from rss-explicit to main May 2, 2023 21:58

smklein merged commit a477ac2 into main May 2, 2023

smklein deleted the service-ledger branch May 2, 2023 22:52
