Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit tests for DumpSetup #3788

Merged
merged 6 commits into from
Feb 1, 2024
Merged

Conversation

lifning
Copy link
Contributor

@lifning lifning commented Jul 28, 2023

verifies decision-making in different combinations of M.2/U.2 dataset and dump slice availability and occupancy & tests log/core-archiving. (functionality that had been implemented for #2478)

@lifning lifning force-pushed the debugdir-tests branch 7 times, most recently from bc0c1b6 to af38052 Compare July 29, 2023 20:01
@lifning lifning requested a review from smklein July 29, 2023 21:48
Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments in-line -- I think it's great to have tests, and I'm mostly on-board with this direction, but I'm wondering if we could use fakes instead of mocks to perform these tests.

sled-agent/src/storage/dump_setup.rs Outdated Show resolved Hide resolved
sled-agent/src/storage/dump_setup.rs Outdated Show resolved Hide resolved
sled-agent/src/storage/dump_setup.rs Outdated Show resolved Hide resolved
sled-agent/src/storage/dump_setup.rs Outdated Show resolved Hide resolved
sled-agent/src/storage/dump_setup.rs Outdated Show resolved Hide resolved
sled-agent/src/storage/dump_setup.rs Outdated Show resolved Hide resolved
@lifning lifning force-pushed the debugdir-tests branch 2 times, most recently from acfa80c to 97f4c85 Compare August 11, 2023 19:31
@lifning lifning changed the title mockall-backed unit tests for DumpSetup Unit tests for DumpSetup Aug 11, 2023
@lifning lifning force-pushed the debugdir-tests branch 3 times, most recently from fa62391 to 6910e78 Compare December 5, 2023 11:10
@morlandi7 morlandi7 linked an issue Jan 26, 2024 that may be closed by this pull request
7 tasks
Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look great, thanks for adding these!

@@ -398,7 +398,7 @@ wicket-common = { path = "wicket-common" }
wicketd-client = { path = "clients/wicketd-client" }
zeroize = { version = "1.7.0", features = ["zeroize_derive", "std"] }
zip = { version = "0.6.6", default-features = false, features = ["deflate","bzip2"] }
zone = { version = "0.3", default-features = false, features = ["async"] }
zone = { version = "0.3", default-features = false, features = ["async", "sync"] }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do really need the sync feature here? If it's possible, it would be cool to keep the codebase async-only.

AFAICT, this is only used to call Zones::list_blocking, which seems like it could easily be replaced with:

https://github.com/oxidecomputer/zone/blob/65647e678fec739d4e9a6897bf2ee48e1fb051a5/zone/src/lib.rs#L1020-L1027

If we use async_trait?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zones::list_blocking is called a couple layers down the stack from DumpSetupWorker::archive_files, which is called within a MutexGuard - I was hoping to avoid questions of cancellation-safety issues this way.

sled-agent/src/dump_setup.rs Outdated Show resolved Hide resolved
sled-agent/src/dump_setup.rs Outdated Show resolved Hide resolved
)));
let worker_weak = Arc::downgrade(&worker);
let log_poll = log.new(o!("component" => "DumpSetup-archival"));
let _poller = std::thread::spawn(move || {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I know this has been moved, so, no need to block this PR, but might be worth considering "why a std::thread" vs "how about a tokio::task?"

We use a lot of tokio tasks for async work here, if poll_file_archival could make sense as an async operation, this might make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely open to reworking it to a tokio::task in a follow-up -- now that there are tests to make such a change less risky, and the conditions that led to using a std::thread here ~7 months ago are no longer a concern.

// function also invokes `savecore(8)` to save it into that directory.
// On success, returns Ok(Some(stdout)) if `savecore(8)` was invoked, or
// Ok(None) if it wasn't.
fn dumpadm(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda struggling to see what's different from before, but this looks pretty different from just "unit tests" + "refactoring for unit tests", right?

I don't recall seeing this "check for savecore directory, use /tmp/crash if it doesn't exist" behavior before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thank you!

Comment on lines +401 to +404
// if we don't have a savecore destination yet, still create and use
// a tmpfs path (rather than the default location under /var/crash,
// which is in the ramdisk pool), because dumpadm refuses to do what
// we ask otherwise.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/tmp appears to be backed by swap -- I think that using /var/crash also has problems (writing a crash dump of the kernel to a ramdisk is not useful) but writing a kernel crash dump to swap also seems like it's basically throwing it away.

Is using this /tmp path better than not calling savecore at all?

Copy link
Contributor Author

@lifning lifning Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the case where the /tmp path is used as a placeholder for the invocation to dumpadm, we don't actually invoke savecore. this is just there to give dumpadm a 'safer'1 placeholder -- we need it to tell the kernel to direct its core to a specific raw partition should it crash, but it also writes the config file that savecore reads. we only invoke savecore later in this if savecore_dir.is_some() -- that is, if we have a non-placeholder destination for it.

Footnotes

  1. for some value of safer. in theory, sled-agent's the only thing ever invoking savecore on the host, but if we're in a situation where kernel core dumps are part of the conversation, we may want an amount of defense-in-depth against entropy -- supposing for example we're in some situation where we may not be able to enumerate the external disks, but where an operator can manually invoke savecore and retrieve the compressed dump from /tmp through some other means.

sled-agent/src/dump_setup.rs Outdated Show resolved Hide resolved
@lifning lifning enabled auto-merge (squash) February 1, 2024 10:15
@lifning lifning merged commit 24b1500 into oxidecomputer:main Feb 1, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

automatic debug data collection without running the system out of space
2 participants