Clarify agent health reporting #136

tigrannajaryan · 2022-10-27T19:25:26Z

AgentHealth currently has an up field and a last_error field.

It is not clear how to set these fields if the agent process is started and running but is unhealthy (e.g. we can verify its health by polling a health check endpoint). Should we set up to true or false in this case?

The up field definition is:

> Set to true if the Agent is up and running.

So, it seems like we should set it to true. However, there is no other explicitly defined way to indicate unhealthiness, unless we assume the presence of last_error is that indicator.

We need to either clarify the spec to say that last_error is the indicator, add another field to indicate unhealthiness (e.g. bool healthy), or maybe rename up to healthy.
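To make the ambiguity concrete, here is a minimal sketch of the first option, where up only means "the process is running" and a non-empty last_error is the sole sign of unhealthiness. The dataclass is an illustrative stand-in for the AgentHealth message rather than generated protobuf code, the string type of last_error is an assumption, and `report_health` and `is_healthy` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class AgentHealth:
    """Illustrative stand-in for the AgentHealth message (not generated code)."""
    up: bool = False
    last_error: str = ""  # assumed to be a string; empty means "no error"


def report_health(process_running: bool, health_check_error: str) -> AgentHealth:
    """Hypothetical helper: `up` mirrors the process state only, and any
    health check failure is carried in `last_error`."""
    return AgentHealth(up=process_running, last_error=health_check_error)


def is_healthy(health: AgentHealth) -> bool:
    # Under this interpretation the receiver has to combine two fields
    # to decide whether the agent is actually healthy.
    return health.up and health.last_error == ""


# The process is running, but polling its health check endpoint failed.
running_but_failing = report_health(True, "health check endpoint returned 503")
```

With this reading, `running_but_failing.up` is true even though `is_healthy(running_but_failing)` is false, which is exactly the implicit rule the spec does not currently state.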

tigrannajaryan · 2022-10-27T19:25:50Z

@andykellr @PeterF778 what do you think?

andykellr · 2022-10-27T20:28:01Z

I agree that this is unclear in the spec. I think healthy is a better name. I think an agent that is down is also unhealthy so I do not think we currently need another field to represent running/not-running.

tigrannajaryan · 2022-10-27T23:41:48Z

> I agree that this is unclear in the spec. I think healthy is a better name. I think an agent that is down is also unhealthy so I do not think we currently need another field to represent running/not-running.

What do we do with start_time_unix_nano in that case? It is said to be set when up is true. Should we untie these 2 fields?

tigrannajaryan added the required-for-stable label (Required to be resolved before 1.0) Oct 27, 2022

tigrannajaryan added a commit to tigrannajaryan/opamp-spec that referenced this issue Oct 28, 2022

Clarify agent health reporting

68f0ab2

Resolves open-telemetry#136
- Renamed `up` to `healthy`.
- `start_time_unix_nano` is no longer tied to `up` and is set independently.

tigrannajaryan mentioned this issue Oct 28, 2022

Clarify agent health reporting #137

Merged

tigrannajaryan closed this as completed in #137 Nov 9, 2022

tigrannajaryan added a commit that referenced this issue Nov 9, 2022

Clarify agent health reporting (#137)

ac8269a

Resolves #136
- Renamed `up` to `healthy`.
- `start_time_unix_nano` is no longer tied to `up` and is set independently.
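A minimal sketch of what the change in this commit message implies for a reporter: a single `healthy` flag, with `start_time_unix_nano` set whether or not the agent is healthy. The dataclass is an illustrative stand-in and not the generated protobuf type, the field types are assumptions, and `report_health` is a hypothetical helper.

```python
import time
from dataclasses import dataclass


@dataclass
class AgentHealth:
    """Illustrative stand-in for the message after the rename (not generated code)."""
    healthy: bool = False
    start_time_unix_nano: int = 0  # assumed integer nanoseconds since the Unix epoch
    last_error: str = ""           # assumed to be a string; empty means "no error"


def report_health(start_time_unix_nano: int, health_check_error: str) -> AgentHealth:
    """Hypothetical helper: `healthy` reflects the health check result, while
    the start time is reported independently of it."""
    return AgentHealth(
        healthy=(health_check_error == ""),
        start_time_unix_nano=start_time_unix_nano,
        last_error=health_check_error,
    )


started_at = time.time_ns()  # recorded once, when the agent process starts
failing = report_health(started_at, "health check endpoint returned 503")
passing = report_health(started_at, "")
```

In this sketch an unhealthy but running agent still reports its start time, so a receiver no longer needs to infer health from a combination of fields.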
