Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify agent health reporting #136

Closed
tigrannajaryan opened this issue Oct 27, 2022 · 3 comments · Fixed by #137
Closed

Clarify agent health reporting #136

tigrannajaryan opened this issue Oct 27, 2022 · 3 comments · Fixed by #137
Labels
required-for-stable Required to be resolved before 1.0

Comments

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Oct 27, 2022

The AgentHealth currently has an up field and a last_error fields.

It is not clear how to set fields if the agent process is started and running but it is unhealthy (e.g. we have a way to verify its health by polling a health check endpoint). Should we set up to true or false in this case?

The up field definition is

Set to true if the Agent is up and running.

So, it seems like we should set it to true. However, there is no other explicitly defined way to indicate unhealthiness, unless we assume the presence of last_error is that indicator.

We need to either clarify the spec to say last_error is the indicator or add another field to indicate the unhealthiness (e.g. bool healthy), or maybe rename up to healthy?

@tigrannajaryan
Copy link
Member Author

@andykellr @PeterF778 what do you think?

@andykellr
Copy link
Contributor

andykellr commented Oct 27, 2022

I agree that this is unclear in the spec. I think healthy is a better name. I think an agent that is down is also unhealthy so I do not think we currently need another field to represent running/not-running.

@tigrannajaryan
Copy link
Member Author

I agree that this is unclear in the spec. I think healthy is a better name. I think an agent that is down is also unhealthy so I do not think we currently need another field to represent running/not-running.

What do we do with start_time_unix_nano in that case? It is said to be set when up is true. Should we untie these 2 fields?

@tigrannajaryan tigrannajaryan added the required-for-stable Required to be resolved before 1.0 label Oct 27, 2022
tigrannajaryan added a commit to tigrannajaryan/opamp-spec that referenced this issue Oct 28, 2022
Resolves open-telemetry#136

- Renamed `up` to `healthy`.
- `start_time_unix_nano` is no longer tied to `up` and is set independently.
tigrannajaryan added a commit that referenced this issue Nov 9, 2022
Resolves #136

- Renamed `up` to `healthy`.
- `start_time_unix_nano` is no longer tied to `up` and is set independently.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
required-for-stable Required to be resolved before 1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants