[8.16](backport #5999) Add failureThreshold to elastic-agent self-monitoring config #6105

Merged
pchila merged 1 commit into 8.16 from mergify/bp/8.16/pr-5999
Nov 26, 2024

mergify[bot]
Contributor

@mergify mergify bot commented Nov 20, 2024

What does this PR do?

Use the failure_threshold setting introduced in elastic/beats#41570 in the self-monitoring configuration, so that elastic-agent does not report DEGRADED when a metrics fetch fails because a component is starting or stopping.
The failure threshold defaults to 2, but it can be configured via the config file or the Fleet policy.

Why is it important?

It is important to avoid misrepresenting the agent status because a single metrics fetch failed.
See #5332

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

This is an automatic backport of pull request #5999 done by [Mergify](https://mergify.com).

@mergify mergify bot requested a review from a team as a code owner November 20, 2024 21:37
@mergify mergify bot added the backport label Nov 20, 2024
@mergify mergify bot requested review from michel-laterman and swiatekm and removed request for a team November 20, 2024 21:37
@mergify mergify bot assigned pchila Nov 20, 2024
@pchila pchila requested review from pchila and removed request for michel-laterman and swiatekm November 22, 2024 09:56
@pchila
Member

pchila commented Nov 22, 2024

buildkite test this

Contributor Author

mergify bot commented Nov 25, 2024

This pull request has not been merged yet. Could you please review and merge it @pchila? 🙏

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Control-Plane label Nov 25, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@pchila
Member

pchila commented Nov 26, 2024

@Mergifyio rebase

* Add failureThreshold to elastic-agent self-monitoring config

(cherry picked from commit 2a46509)
@pchila pchila force-pushed the mergify/bp/8.16/pr-5999 branch from 6f2fa1d to a567a90 Compare November 26, 2024 09:56
Contributor Author

mergify bot commented Nov 26, 2024

rebase

✅ Branch has been successfully rebased


Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

@pchila pchila merged commit 64e1a43 into 8.16 Nov 26, 2024
13 of 14 checks passed
@pchila pchila deleted the mergify/bp/8.16/pr-5999 branch November 26, 2024 18:55
Labels: backport, Team:Elastic-Agent-Control-Plane
3 participants