Processes should be able to self-report as "degraded" #357

mhasself · 2023-10-05T16:52:05Z

Currently we monitor agent instance health through each process's "OpCode", which mostly boils down to whether the process is running or not. (See enum.) This value is included in the agent heartbeat, and is monitored and streamed to a feed by the registry.

In the model where processes are long-lived and deal with hardware problems by simply attempting reconnection, running/not-running is not sufficiently informative. Such a process could record its status in session.data somehow, but for generic propagation of that information we should standardize how it is recorded and how it is read back out.

Continuing from the discussion on dev call yesterday, a proposal is:

1. Add "DEGRADED" to the OpCode enum.
2. Standardize on session.data["degraded_at"] = unix_timestamp to mark running sessions as degraded.
3. In OpSession, add a function "set_degraded(degraded [bool])" to mark / clear the degraded state (which just updates session.data).
4. In the OpSession.op_code property, if status == "running" but data["degraded_at"] > 0, return the value "DEGRADED".
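To make the proposal concrete, here is a minimal sketch of how the four points could fit together. It is not the actual OpSession code: `SketchSession`, the trimmed-down `OpCode` enum and its member values, the `now` argument, and the choice to clear the state by removing the key are all assumptions made for illustration.

```python
import time
from enum import Enum


class OpCode(Enum):
    # Illustrative subset only; member values are placeholders.
    STARTING = "starting"
    RUNNING = "running"
    DEGRADED = "degraded"  # the proposed addition
    FAILED = "failed"


class SketchSession:
    """Hypothetical stand-in for OpSession, reduced to status and data."""

    def __init__(self):
        self.status = "starting"
        self.data = {}

    def set_degraded(self, degraded, now=None):
        """Mark (True) or clear (False) the degraded state via session.data."""
        if degraded:
            # Assumption: keep the first timestamp if already degraded.
            if "degraded_at" not in self.data:
                self.data["degraded_at"] = time.time() if now is None else now
        else:
            # Assumption: clearing removes the key rather than zeroing it.
            self.data.pop("degraded_at", None)

    @property
    def op_code(self):
        # Only a running session can report DEGRADED.
        if self.status == "running" and self.data.get("degraded_at", 0) > 0:
            return OpCode.DEGRADED
        # Works only for the statuses present in this reduced enum.
        return OpCode(self.status)
```

A process that loses its hardware connection would call `session.set_degraded(True)` while it retries, then `session.set_degraded(False)` once the connection is back.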

Individual agents and processes will need to manually implement the use of degraded, if it is applicable to how they deal with errors. Alarms configured for such agents will need to be updated to map the "degraded" state to be as bad as "not running".
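On the alarm side, the change could be as small as adding the new state to whatever set of op-codes already triggers an alert. The sketch below assumes alarms are keyed on op-code names; `ALARM_OP_CODES` and `needs_alarm` are hypothetical names, and the set of "not running" states shown is illustrative rather than taken from any existing alarm configuration.

```python
# Assumed op-code names that should raise an alarm. DEGRADED is listed next to
# the "not running" states so that it is treated as equally bad.
ALARM_OP_CODES = frozenset({"FAILED", "EXPIRED", "DEGRADED"})


def needs_alarm(op_code_name):
    """Return True if the reported op-code name should trigger an alarm."""
    return op_code_name.upper() in ALARM_OP_CODES
```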


mhasself · 2023-10-05T17:01:15Z

While we're in there, might be nice to have processes automagically transition out of "starting" state when the process code is run, rather than relying on agent code to set_status('running') manually.
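One way to picture that automatic transition is a small wrapper that flips the status just before the process function is entered. This is only a sketch under assumptions: `MiniSession` and `run_process` are hypothetical, and the `(session, params)` call signature is assumed for the example.

```python
class MiniSession:
    """Hypothetical session holding only a status string."""

    def __init__(self):
        self.status = "starting"


def run_process(func, session, params=None):
    """Call func(session, params), moving the session out of 'starting' first."""
    if session.status == "starting":
        session.status = "running"
    return func(session, params)


def acquire(session, params):
    # Example process body; it no longer needs to set the status itself.
    return session.status
```

With this in place, the agent code only needs to touch the status when something other than the default transition is wanted.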

mhasself · 2024-04-18T17:41:23Z

Addressed in #371.

mhasself mentioned this issue Jan 8, 2024

Improvements to session.status #371

Merged


mhasself closed this as completed Apr 18, 2024
