Add alerting on Core APM metrics #109055

Open
mshustov opened this issue Aug 18, 2021 · 4 comments
Labels
enhancement New value added to drive a business result impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:small Small Level of Effort performance Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@mshustov
Contributor

mshustov commented Aug 18, 2021

Core tracks a few basic APM metrics: the duration of the preboot, setup, and start lifecycle stages.
We are going to add a few more in #78869.
Even though we collect these metrics on CI only, they could already be used to notify the Core team whenever a metric goes outside reasonable limits. Yesterday, we found out that the setup lifecycle duration had doubled in 30 days and caused timeout failures on Cloud. @afharo put up a hotfix #108952 for #108950.
We should catch such cases early. I'd suggest setting up alerting for cases where a lifecycle duration exceeds a reasonable limit (say, 2 sec for setup).
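
A rough sketch of what such a fixed-threshold check could look like. Everything here is an assumption for illustration: `LifecycleSample`, `LIMITS_MS`, and `findViolations` are hypothetical names, not Kibana or APM APIs, and the sample durations are made up. Only the 2 sec limit for setup comes from the suggestion above; preboot and start are left without a limit because none has been proposed.

```typescript
// Hypothetical sketch: these types and names are illustrative, not Kibana or APM APIs.
type LifecycleStage = 'preboot' | 'setup' | 'start';

interface LifecycleSample {
  stage: LifecycleStage;
  durationMs: number;
}

// Only the setup limit (2 sec) comes from the proposal; other stages have no limit yet.
const LIMITS_MS: Partial<Record<LifecycleStage, number>> = {
  setup: 2000,
};

// Returns one message per sample whose duration exceeds the limit for its stage.
// Stages without a configured limit are never reported.
function findViolations(
  samples: LifecycleSample[],
  limits: Partial<Record<LifecycleStage, number>> = LIMITS_MS
): string[] {
  const violations: string[] = [];
  for (const { stage, durationMs } of samples) {
    const limit = limits[stage];
    if (limit !== undefined && durationMs > limit) {
      violations.push(`${stage} took ${durationMs}ms, limit is ${limit}ms`);
    }
  }
  return violations;
}

// Made-up sample values, for illustration only.
const violations = findViolations([
  { stage: 'preboot', durationMs: 300 },
  { stage: 'setup', durationMs: 2600 },
  { stage: 'start', durationMs: 900 },
]);
console.log(violations); // [ 'setup took 2600ms, limit is 2000ms' ]
```

A check like this only fires once the limit is crossed, so the limit would need to sit close enough to the current baseline to flag a regression before it causes timeouts.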

@mshustov mshustov added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc performance enhancement New value added to drive a business result labels Aug 18, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@afharo
Member

afharo commented Aug 18, 2021

Should we leverage ML Anomaly alerts for this instead of fixed thresholds?

@mshustov
Contributor Author

> Should we leverage ML Anomaly alerts for this instead of fixed thresholds?

Why? What benefit would it add? Can it detect performance metrics that degrade slowly over time? We can consider it as a future enhancement.

@mshustov
Contributor Author

@spalger, can we include these Core metrics in the performance testing scenarios you are working on in #98337?

@mshustov mshustov added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:medium Medium Level of Effort loe:needs-research This issue requires some research before it can be worked on or estimated and removed loe:medium Medium Level of Effort labels Nov 2, 2021
@exalate-issue-sync exalate-issue-sync bot added loe:small Small Level of Effort and removed loe:needs-research This issue requires some research before it can be worked on or estimated labels Nov 4, 2021
@lizozom lizozom changed the title Add alerting for Core APM metrics Add alerting on Core APM metrics Nov 11, 2021