Add alerting on Core APM metrics #109055
Labels: enhancement, impact:low, loe:small, performance, Team:Core
Core tracks a few basic APM metrics: preboot, setup, and start lifecycle duration. We are going to add a few more in #78869.
Even though we collect these metrics on CI only, they could already be used to notify the Core team whenever a metric goes outside reasonable limits. Yesterday, we found out that the setup lifecycle duration has doubled in 30 days and caused timeout failures on Cloud. @afharo put up a hotfix #108952 for #108950.
We should catch such cases at an early stage. I'd suggest setting up alerting that fires when a lifecycle duration exceeds a reasonable limit (say, 2 sec for setup).
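To make the proposal concrete, here is a minimal sketch of the threshold check such an alert could run against the lifecycle durations collected on CI. All names here (Lifecycle, Violation, limitsMs, findViolations) are hypothetical and not part of Core or the APM agent. Only the 2 sec limit for setup comes from this issue; limits for preboot and start are left unset as an assumption.

```typescript
// Hypothetical sketch: none of these names exist in Kibana Core.
type Lifecycle = 'preboot' | 'setup' | 'start';

interface Violation {
  lifecycle: Lifecycle;
  durationMs: number;
  limitMs: number;
}

// Only the 2 sec `setup` limit is taken from the issue text.
// Limits for `preboot` and `start` are intentionally left undefined.
const limitsMs: Partial<Record<Lifecycle, number>> = { setup: 2000 };

// Returns one entry per lifecycle whose measured duration exceeds its limit.
// Lifecycles with no configured limit or no measurement are skipped.
function findViolations(
  durationsMs: Partial<Record<Lifecycle, number>>,
  limits: Partial<Record<Lifecycle, number>> = limitsMs
): Violation[] {
  const violations: Violation[] = [];
  for (const lifecycle of Object.keys(limits) as Lifecycle[]) {
    const limitMs = limits[lifecycle];
    const durationMs = durationsMs[lifecycle];
    if (limitMs !== undefined && durationMs !== undefined && durationMs > limitMs) {
      violations.push({ lifecycle, durationMs, limitMs });
    }
  }
  return violations;
}
```

With these assumed limits, calling findViolations({ preboot: 150, setup: 2400, start: 800 }) would flag only setup. A non-empty result is what would trigger a notification to the Core team; how the notification is delivered is out of scope for this sketch.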