Sled agent hung on disabling fmd service. #4252
Comments
Prior to this, one of my
I "fixed" this with
and then reran the destroy virtual hardware script which ran to completion without error. |
cc @citrus-it, who introduced this code in #4201.
Another thing to note is that @rcgoodfellow had seen this issue several times today, except that each time it resolved itself after about 15 minutes, which wasn't the case here. Ry pointed out that the log messages being spewed from sled-agent were actually 20 minutes in the future based on running
One crazy hypothesis I had was that the call to |
Noting that the
from the post above seems to be a recurring issue now when destroying virtual hardware. Running |
I think this delay might actually be zpool import. I can see these as sled agent is getting going
and it goes through them one at a time, each taking a few minutes it seems |
FWIW, I don't think we should be dorking with fmd on a workstation. It makes sense on a Gimlet where the system boots from a ramdisk, but on a workstation we're almost certainly going to ruin things (like you see here) and also we're going to discard some of your actual FMA data each time you destroy and recreate the (virtual) pools with those files in them. |
I'm now also running into this, but @rcgoodfellow's workaround isn't doing the trick. FMD is not in maintenance mode and I'm getting:
|
I ended up resolving this by deleting the swap
|
This may be related to #3651 |
Spinning up an omicron development environment on my workstation hung at the following place. Sled agent was running, but not making any progress. It had brought up the switch zone, but had not brought up any other zones.
omicron/sled-agent/src/backing_fs.rs
Lines 157 to 161 in 194889b
We were able to see this on the host OS as follows.
Looking at the `fmd` service, we found it in maintenance mode. Clearing the service with `svcadm clear fmd` resulted in sled agent springing to life. At this point, it had been stuck for over an hour as I went to work on other things. The control plane came up completely and appears to be functioning normally after getting unstuck.
I looked at the logs for `fmd` with `cat $(svcs -L fmd)`; they contained this
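For anyone hitting the same hang, the recovery described in this report (check whether `fmd` is in maintenance, look at its log, then clear it) can be sketched as a small shell script. This is only a sketch under assumptions: the `needs_clear` helper is a hypothetical name introduced here, the script assumes an illumos host with the standard SMF tools, and `svcadm clear` needs sufficient privileges. It does not address why `fmd` ended up in maintenance in the first place.

```shell
#!/bin/sh
# Sketch: clear the fmd SMF service only when it is in maintenance.
# needs_clear is a hypothetical helper name, not part of omicron or SMF.

# Succeeds (returns 0) when the given SMF state string is "maintenance".
needs_clear() {
    [ "$1" = "maintenance" ]
}

# Only touch SMF when the illumos service tools are actually present.
if command -v svcs >/dev/null 2>&1; then
    # -H drops the header line, -o state prints only the state column.
    state=$(svcs -H -o state fmd)
    echo "fmd state: $state"
    if needs_clear "$state"; then
        # Show the tail of the service log before clearing it.
        tail -n 20 "$(svcs -L fmd)"
        svcadm clear fmd
    fi
else
    echo "svcs not found; this does not look like an illumos host"
fi
```

If the service is not in maintenance, as in one of the comments above, the script does nothing beyond printing the state, so a different cause should be investigated.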