Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Actionable Observability] - Add latency alert history chart on the Alert details page for APM #148011

Merged
merged 51 commits into from
Jan 19, 2023

Conversation

fkanout
Copy link
Contributor

@fkanout fkanout commented Dec 22, 2022

Summary

Closes #147932 by adding a new latency chart that covers the last 30 days of alerts for a given rule.
And it adds annotations with the number of alerts for a given day besides the time to recover.

Screenshot 2023-01-18 at 16 22 08

@fkanout fkanout added release_note:feature Makes this part of the condensed release notes Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge" v8.7.0 labels Dec 22, 2022
@fkanout fkanout self-assigned this Dec 22, 2022
@kdelemme
Copy link
Contributor

kdelemme commented Jan 17, 2023

@fkanout
With @XavierM we found a way to use the nested "avg" aggregations suggested by @simianhacker

{
  "index": ".alerts-observability.apm.alerts-*",
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "kibana.alert.rule.uuid": "aa081910-969e-11ed-adcb-b1469b179bd9"
          }
        },
        {
          "range": {
            "kibana.alert.time_range": { "gte": "now-30d", "lt": "now" }
          }
        }
      ]
    }
  },
  "aggs": {
    "histogramTriggeredAlerts": {
      "date_histogram": {
        "field": "kibana.alert.start",
        "fixed_interval": "1d",
        "extended_bounds": { "min": "now-30d", "max": "now" }
      }
    },
    "avgTimeToRecoverMSE": {
      "filter": {
        "term": {
          "kibana.alert.status": "recovered"
        }
      },
      "aggs": {
        "recoveryTime": {
          "avg": {
            "field": "kibana.alert.duration.us"
          }
        }
      }
    }
  }
}

And for this, you need to change the x-pack/plugins/rule_registry/common/types.ts using this:

export const bucketAggsSchemas = t.intersection([
  bucketAggsTempsSchemas,
  t.exact(
    t.partial({
      aggs: t.record(t.string, t.intersection([metricsAggsSchemas, bucketAggsTempsSchemas])),
      aggregations: t.record(
        t.string,
        t.intersection([metricsAggsSchemas, bucketAggsTempsSchemas])
      ),
    })
  ),
]);

Basically, we cannot use and allow script from the API because it bypass any security measure in place. And would allow, technically, anyone to access any alerts stored (from @XavierM). Also, the signature comments for MetricAggs in the types.ts file have been misplaced.

@fkanout

This comment was marked as resolved.

Copy link
Contributor

@XavierM XavierM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -> Type and query looks good!!!
@kdelemme thank you for pairing with me yesterday to fix the type

Copy link
Member

@simianhacker simianhacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me... tested in browser.

@fkanout fkanout enabled auto-merge (squash) January 19, 2023 13:08
errorTriggeredAlertHistory?: string;
triggeredAlertsData?: FetchTriggeredAlertsHistory;
}
export function useFetchTriggeredAlertsHistory({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like most of this belongs as a client in the alerts plugin. What's specific to APM here?

Copy link
Member

@sorenlouv sorenlouv Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a client, we can import it in apm and use it like:

const alertsHistory = await alertsClient.getTriggerdAlertsHistory({ features: 'apm', ruleId })

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, can we create an issue and have follow up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -287,11 +279,22 @@ export const metricsAggsSchemas = t.exact(
}),
})
),
aggs: t.undefined,
Copy link
Contributor

@XavierM XavierM Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what Xavier did here? We added these two attributes to not allow user to create nested aggs on metrics aggregations knowing that ES won't like it. It was to make our API nicer/smarter. However, this created a nightmare to combine metrics and bucket aggs together. So We decide to remove these two attributes and let ES tell you when the aggs is not formatted correctly. So now our validation of type is simpler and cleaner to use.

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
apm 1356 1369 +13

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
observability 593 598 +5

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
apm 3.2MB 3.3MB +89.2KB
observability 567.2KB 567.3KB +27.0B
total +89.3KB

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
observability 31 32 +1

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
apm 29.1KB 32.1KB +3.1KB
observability 82.9KB 83.6KB +730.0B
total +3.8KB
Unknown metric groups

API count

id before after diff
observability 598 604 +6

ESLint disabled line counts

id before after diff
apm 70 71 +1

Total ESLint disabled count

id before after diff
apm 83 84 +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @fkanout

@fkanout fkanout merged commit ff3d413 into elastic:main Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:feature Makes this part of the condensed release notes Team: Actionable Observability - DEPRECATED For Observability Alerting and SLOs use "Team:obs-ux-management", for AIops "Team:obs-knowledge" Team:APM All issues that need APM UI Team support v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Actionable Observability] - Add the latency alerts history to the APM Alert details page
10 participants