This is another instance of oxidecomputer/steno#26, similar to a few other issues linked to that ticket. The scenario is creating a snapshot of an unattached disk that was made from the propolis alpine image:
Apr 13 00:29:34.003 INFO request completed, error_message_external: Internal Server Error, error_message_internal: Error: e No such file or directory (os error 2) No extent file found for "/opt/oxide/propolis-server/blob/alpine.iso", response_code: 500, uri: /crucible/pantry/0/volume/72427479-2b8f-4d7b-a4a7-2ddaa69c4ee1, method: POST, req_id: 187b8073-1eab-4c48-9334-63c341e78a25, remote_addr: [fd00:1122:3344:101::3]:33049, local_addr: [fd00:1122:3344:101::a]:17000, component: dropshot
Real images won't hit this error, but snapshotting could conceivably fail for other reasons. When it does, Nexus panics and can't be recovered: it keeps retrying to unwind the saga and panicking again, even after I manually `svcadm clear` it:
{"msg":"authorize result","v":0,"name":"nexus","level":20,"time":"2023-04-12T17:34:19.524915113-07:00","hostname":"oxz_nexus","pid":14511,"component":"DataLoader","component":"nexus","component":"ServerContext","name":"91ac0ab6-194e-4fc6-aafe-e46546eeffd6","result":"Ok(())","resource":"Database","action":"Query","actor":"Some(Actor::UserBuiltin { user_builtin_id: 001de000-05e4-4000-8000-000000000001, .. })"}
{"msg":"saga resume","v":0,"name":"nexus","level":30,"time":"2023-04-12T17:34:19.537782834-07:00","hostname":"oxz_nexus","pid":14511,"saga_name":"snapshot-create","saga_id":"a336a6a4-d1e1-4dfe-9b7b-69287b3325fd","sec_id":"91ac0ab6-194e-4fc6-aafe-e46546eeffd6","component":"SEC","component":"nexus","component":"ServerContext","name":"91ac0ab6-194e-4fc6-aafe-e46546eeffd6","dag":"{\"end_node\":18,\"graph\":{\"edge_property\":\"directed\",\"edges\":[[0,1,null],[1,2,null],[2,3,null],[3,4,null],[4,5,null],[5,6,null],[6,7,null],[7,8,null],[8,9,null],[9,10,null],[10,11,null],[11,12,null],[12,13,null],[13,14,null],[14,15,null],[15,16,null],[17,0,null],[16,18,null]],\"node_holes\":[],\"nodes\":[{\"Action\":{\"action_name\":\"common.uuid_generate\",\"label\":\"GenerateSnapshotId\",\"name\":\"snapshot_id\"}},{\"Action\":{\"action_name\":\"common.uuid_generate\",\"label\":\"GenerateVolumeId\",\"name\":\"volume_id\"}},{\"Action\":{\"action_name\":\"common.uuid_generate\",\"label\":\"GenerateDestinationVolumeId\",\"name\":\"destination_volume_id\"}},{\"Action\":{\"action_name\":\"snapshot_create.regions_alloc\",\"label\":\"RegionsAlloc\",\"name\":\"datasets_and_regions\"}},{\"Action\":{\"action_name\":\"snapshot_create.regions_ensure\",\"label\":\"RegionsEnsure\",\"name\":\"regions_ensure\"}},{\"Action\":{\"action_name\":\"snapshot_create.create_destination_volume_record\",\"label\":\"CreateDestinationVolumeRecord\",\"name\":\"created_destination_volume\"}},{\"Action\":{\"action_name\":\"snapshot_create.create_snapshot_record\",\"label\":\"CreateSnapshotRecord\",\"name\":\"created_snapshot\"}},{\"Action\":{\"action_name\":\"snapshot_create.space_account\",\"label\":\"SpaceAccount\",\"name\":\"no_result\"}},{\"Action\":{\"action_name\":\"snapshot_create.get_pantry_address\",\"label\":\"GetPantryAddress\",\"name\":\"pantry_address\"}},{\"Action\":{\"action_name\":\"snapshot_create.attach_disk_to_pantry\",\"label\":\"AttachDiskToPantry\",\"name\":\"disk_generation_number\"}},{\"Action
\":{\"action_name\":\"snapshot_create.call_pantry_attach_for_disk\",\"label\":\"CallPantryAttachForDisk\",\"name\":\"call_pantry_attach_for_disk\"}},{\"Action\":{\"action_name\":\"snapshot_create.call_pantry_snapshot_for_disk\",\"label\":\"CallPantrySnapshotForDisk\",\"name\":\"call_pantry_snapshot_for_disk\"}},{\"Action\":{\"action_name\":\"snapshot_create.call_pantry_detach_for_disk\",\"label\":\"CallPantryDetachForDisk\",\"name\":\"call_pantry_detach_for_disk\"}},{\"Action\":{\"action_name\":\"snapshot_create.start_running_snapshot\",\"label\":\"StartRunningSnapshot\",\"name\":\"replace_sockets_map\"}},{\"Action\":{\"action_name\":\"snapshot_create.create_volume_record\",\"label\":\"CreateVolumeRecord\",\"name\":\"created_volume\"}},{\"Action\":{\"action_name\":\"snapshot_create.finalize_snapshot_record\",\"label\":\"FinalizeSnapshotRecord\",\"name\":\"finalized_snapshot\"}},{\"Action\":{\"action_name\":\"snapshot_create.detach_disk_from_pantry\",\"label\":\"DetachDiskFromPantry\",\"name\":\"detach_disk_from_pantry\"}},{\"Start\":{\"params\":{\"create_params\":{\"description\":\"snap\",\"disk\":\"server-image-disk\",\"name\":\"server-image-snap\"},\"disk_id\":\"72427479-2b8f-4d7b-a4a7-2ddaa69c4ee1\",\"project_id\":\"43acf783-a348-4d4c-ac3c-04ded1bcbd7a\",\"serialized_authn\":{\"kind\":{\"Authenticated\":{\"actor\":{\"SiloUser\":{\"silo_id\":\"001de000-5110-4000-8000-000000000000\",\"silo_user_id\":\"001de000-05e4-4000-8000-000000004007\"}}}}},\"silo_id\":\"001de000-5110-4000-8000-000000000000\",\"use_the_pantry\":true}}},\"End\"]},\"saga_name\":\"snapshot-create\",\"start_node\":17}"}
{"msg":"ssc_regions_ensure_undo: Deleting crucible regions","v":0,"name":"nexus","level":40,"time":"2023-04-12T17:34:19.538472666-07:00","hostname":"oxz_nexus","pid":14511,"saga_type":"recovery","component":"nexus","component":"ServerContext","name":"91ac0ab6-194e-4fc6-aafe-e46546eeffd6"}
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Internal Error: Internal Server Error', /home/angela/.cargo/registry/src/github.com-1ecc6299db9ec823/steno-0.3.1/src/saga_exec.rs:1187:65
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[ Apr 12 17:34:20 Stopping because all processes in service exited. ]
[ Apr 12 17:34:20 Executing stop method (:kill). ]
[ Apr 12 17:34:20 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/nexus/bin/nexus /var/svc/manifest/site/nexus/config.toml &"). ]
Note that the answer for oxidecomputer/steno#26 is probably going to be something like: undo actions cannot fail -- they need to keep trying until they succeed, or maybe put the saga into a "needs support" state. That is, steno can stop panicking, but I don't think we should be blocked on that, because we need to do something else in the saga when we're otherwise failing.
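To make the "keep trying, or park the saga" idea concrete, here is a minimal sketch of what that could look like. It is not steno's API and not how Nexus implements undo actions: `UndoOutcome`, `run_undo_with_retry`, the attempt limit, and the backoff policy are all hypothetical names and choices made up for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical result of driving one undo action; not a steno type.
#[derive(Debug, PartialEq)]
enum UndoOutcome {
    /// The undo action eventually succeeded.
    Undone { attempts: u32 },
    /// The undo action kept failing; the saga should be parked for an operator
    /// instead of panicking the whole process.
    NeedsSupport { attempts: u32, last_error: String },
}

/// Retry `undo` until it succeeds or `max_attempts` is exhausted.
/// Uses a capped exponential backoff between attempts (an assumed policy).
fn run_undo_with_retry<F>(mut undo: F, max_attempts: u32, base_delay: Duration) -> UndoOutcome
where
    F: FnMut() -> Result<(), String>,
{
    let mut last_error = String::new();
    for attempt in 1..=max_attempts {
        match undo() {
            Ok(()) => return UndoOutcome::Undone { attempts: attempt },
            // Remember the error rather than calling `unwrap()` on it.
            Err(e) => last_error = e,
        }
        if attempt < max_attempts {
            // Double the delay each time, capped at 64x the base delay.
            let factor = 1u32 << (attempt - 1).min(6);
            sleep(base_delay * factor);
        }
    }
    UndoOutcome::NeedsSupport { attempts: max_attempts, last_error }
}
```

With something along these lines, a persistent failure such as the 500 from the pantry above would end in `NeedsSupport` rather than a panic loop on every Nexus restart. Whether the limit should be finite at all is an open design question; "retry forever" is the same loop without the cap.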