
image: include dumpadm.conf that disables running savecore on boot #133

Merged: 1 commit into master, Dec 8, 2023

Conversation

@citrus-it (Contributor)

There are a couple of pieces here. We generally don't want savecore to run automatically on a gimlet; the control plane immediately configures things so that it doesn't, before selecting a dump device and a place to extract any saved dump, and then running savecore itself.

However, once https://github.com/oxidecomputer/stlouis/issues/378 is integrated, the system can boot up with a dump device already configured. In this case, the dumpadm service attempts to save any panic dump on that device to the ramdisk, with predictable results:

savecore: System dump time: Sun Dec 28 00:11:06 1986
savecore: not enough space in /var/crash/gimlet-sn06 (3659 MB avail, 11665 MB needed)

Shipping this partial dumpadm.conf prevents that; the rest of the content is filled in at boot by the dumpadm service to reflect the system state:

savecore: System dump time: Sun Dec 28 00:11:06 1986
savecore: Panic crashdump pending on dump device but dumpadm -n in effect; run savecore(8) manually to extract.
gimlet-sn06 # cat /etc/dumpadm.conf
#
# dumpadm.conf
#
# Configuration parameters for system crash dump.
# Do NOT edit this file by hand -- use dumpadm(8) instead.
#
DUMPADM_DEVICE=/devices/pci@38,0/pci1de,fff9@3,3/pci1344,3100@0/blkdev@w00A0750130083A19,0:e
DUMPADM_SAVDIR=/var/crash/gimlet-sn06
DUMPADM_CONTENT=curproc
DUMPADM_ENABLE=no
DUMPADM_CSAVE=on
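
For illustration, once a destination for the dump has been chosen, extracting the pending dump and completing the configuration might look something like this (a sketch only: it reuses the device path and directory from the config above and assumes there is now enough space at the destination):

# Extract the pending dump by hand, as the message above suggests; with
# DUMPADM_CSAVE=on this writes a compressed vmdump.N into the directory.
savecore /var/crash/gimlet-sn06

# Then point dumpadm at the device and directory explicitly and re-enable
# running savecore on boot.
dumpadm -y -s /var/crash/gimlet-sn06 \
    -d /devices/pci@38,0/pci1de,fff9@3,3/pci1344,3100@0/blkdev@w00A0750130083A19,0:e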

@citrus-it citrus-it requested review from jclulow and wesolows December 6, 2023 12:39
@wesolows (Contributor) left a comment

I think this is all right, but I haven't been able to find all the answers I was hoping for in the control plane. If sled-agent starts up, its understanding of the dump device it wants to use doesn't get changed from what's hardcoded today, and sled-agent actually works, then both the resulting dump configuration and savecore look like they should work.

There are a lot of problems with this code, though: the stuff at https://github.com/oxidecomputer/omicron/blob/001143cf3b686b8cb4a72916e9f775a0a186df9b/illumos-utils/src/dumpadm.rs#L43-L48 is certainly Private, and even if it is not, it certainly should never be replicated out of sys/dumphdr.h. Then there is https://github.com/oxidecomputer/omicron/blob/001143cf3b686b8cb4a72916e9f775a0a186df9b/illumos-utils/src/dumpadm.rs#L142-L150, which suggests both that a dumpadm RFE might be in order and that there is a path in which there is no savecore location; if we end up in that situation, the dump is going to be lost. But in general, it seems like this should work in the normal case.

What I have not been able to figure out is how any of this works in the recovery image. Your addition here would (it appears to me) apply to that image as well, which is good. But I couldn't find the code that's responsible for doing dump device setup or running savecore (if it even makes sense to run savecore in that case?). So I haven't been able to completely convince myself that this works in every path, which I guess might also include the case in which we panic before sled-agent runs (or during initial setup, perhaps).

The good news is that with this change, we should be able to write dumps into the dump device that would previously have been lost, thanks to the change for stlouis#378. The bad news is that I don't understand how a dump is ever supposed to get out afterward, especially because the only supported action for a sled in a pre-sled-agent panic loop is to boot the recovery image and reset to the original state.

This may not be the place to mention any of this, but I don't know what is.

@jclulow (Collaborator) commented Dec 6, 2023

Did you consider just disabling the dumpadm service in the SMF profile we use to construct the image?

@citrus-it (Contributor, Author) commented Dec 7, 2023

> Did you consider just disabling the dumpadm service in the SMF profile we use to construct the image?

Considered it, but fmd depends on it, and I prefer letting it do the right thing in this case. That includes, as it turns out, setting the system dump/instance UUID.

> I think this is all right, but I haven't been able to find all the answers I was hoping for in the control plane.

It's worth noting that crash dump handling in omicron is currently being reworked, and I'm talking to @lifning in case this and https://github.com/oxidecomputer/stlouis/issues/378 have any impact there.

There are a couple of surprising things in the sled agent code. I was expecting it to just set up a dump device on the M.2 that was booted from, and then run savecore once it had determined a good location to put the dump. Instead, it does a dance: it tries to find an empty dump device on either M.2 and runs savecore before configuring the dump device. I don't think most of this is necessary given the guarantees that dumpadm and savecore provide, and we could certainly extend those utilities if more is required.
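
As a sketch of that simpler flow (the device and directory paths here are hypothetical):

# Configure a dump device on the boot M.2 up front, leaving automatic
# savecore disabled (-n):
dumpadm -n -d /dev/dsk/c1t0d0s4   # hypothetical: slice 4 on the boot M.2

# Later, once a good location has been determined, extract any dump:
mkdir -p /var/crash/sled          # hypothetical destination
savecore /var/crash/sled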

> What I have not been able to figure out is how any of this works in the recovery image. Your addition here would (it appears to me) apply to that image as well, which is good. But I couldn't find the code that's responsible for doing dump device setup or running savecore (if it even makes sense to run savecore in that case?).

stlouis#378 currently applies only to booting from disk, not the recovery/mupdate path (IPCC) nor network boot (lab/manufacturing). It would be possible to search for any M.2 that has a slice 4 in the recovery case and do the same setup there, but that isn't in the current patch set.

@wesolows (Contributor) commented Dec 7, 2023

> stlouis#378 currently applies only to booting from disk, not the recovery/mupdate path (IPCC) nor network boot (lab/manufacturing). It would be possible to search for any M.2 that has a slice 4 in the recovery case and do the same setup there, but that isn't in the current patch set.

Right, the case I was thinking about might look like this:

10 Select normal phase 2
20 Configure dump device
30 Start booting
40 Panic before arriving at userland
50 goto 10

After several iterations of that, the operator triggers recovery. The only other supported operation is to lower the sled's power state to A2 or A3, but let's assume that the same loop will resume the next time it's raised to A0 and that eventually we're going to do recovery. When we get to the recovery image, we do ???. My thought is not about configuring early dump in that case (while we do in fact know exactly where to put it, we're still maintaining the fiction that sled-agent controls all the storage); it's about saving and recovering the dump we already have. That can happen "now", where the recovery process runs savecore (putting it where, though?) and then PUTting the result to some HTTP server, or it can happen "later", but only if we don't trash the contents in some way -- like by changing the SSD's format or the definition of slice 4's starting and ending LBAs.
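
As a sketch of the "now" variant (the staging directory and upload endpoint are entirely invented for illustration):

# In the recovery image: extract the pending dump to the ramdisk,
# space permitting -- the same constraint that motivated this change.
mkdir -p /tmp/crash
savecore /tmp/crash

# Then PUT the compressed dump to some HTTP server (URL hypothetical).
curl -T /tmp/crash/vmdump.0 http://dump-collector.example/gimlet-sn06/vmdump.0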

There is a bigger architectural principle we might want to consider here. sled-agent is basically orphaned, it is going to be a huge maintenance and improvement burden forever because of the hundreds of bindings it needs to write for all the system libraries it uses, and it runs very late. Eventually sled-agent is either going to be rewritten in C or broken into pieces, some of which will be rewritten in C. There's no practical alternative, and we've now bound the kernel to the layout of the non-serviceable storage devices; if upgrade across such a change is even possible, it's going to be nasty.

For all these reasons, we might want to decide that the format of the non-serviceable storage devices is owned by the OS. That would also mean moving creation, update, and removal of any disk labels on those devices out of sled-agent and into an OS-delivered service that would be shipped with, and coordinated with, the kernel. This doesn't have to be done now, but it's something this work has definitely brought into focus for me, and worth thinking about. In the meantime, I'm still vaguely obsessed with not losing dumps, even though there's still no supported way for us to get them and I have no idea when there might be.

@wesolows (Contributor) left a comment

There are a lot of things upstack here that I dislike greatly and/or don't understand, but from what I can tell this change along with the companion that has already landed on stlouis will give us strictly more opportunity to recover dumps. Thanks for taking care of this; it's a step forward!

@citrus-it citrus-it merged commit 511c10d into master Dec 8, 2023