[Alerting] More telemetry for 8.0 based on Event Log data #115318

Merged

YulNaumenko (Contributor) commented Oct 18, 2021

Summary

Resolves #60315

Added more telemetry based on the Event Log data. This answers the following questions from the issue above:

  1. What is the total count of rule executions? How many rule executions failed for each failure reason (read, decrypt, unknown, license)? What is the count, by rule type, of execution failures whose error reason is execute?
  2. What is the total count of action executions? What is the count of action execution failures by connector type?
  3. What is the average rule execution time by rule type?
  4. What is the average connector execution time by connector type?

Actions telemetry:

"actions": {
  "count_total": 3,
  "count_by_type": {
    "server_log": 1,
    "email": 2
  },
  "count_active_total": 3,
  "count_active_by_type": {
    "server_log": 1,
    "email": 2
  },
  "count_active_alert_history_connectors": 0,
  "count_actions_executions": 324,
  "count_actions_executions_by_type": {
    "server_log": 160,
    "email": 164
  },
  "count_actions_executions_failured": 164,
  "count_actions_executions_failured_by_type": {
    "email": 164
  },
  "avg_execution_time": 191,
  "avg_execution_time_by_type": {
    "server_log": 1,
    "email": 376
  },
  "alert_history_connector_enabled": false
},
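The actions sample is internally consistent: `count_actions_executions` (324) is the sum of the per-type counts (160 + 164), and `avg_execution_time` (191) matches an execution-weighted mean of the per-type averages, (160 × 1 + 164 × 376) / 324 ≈ 190.81. The per-type averages shown are presumably already rounded, so this is a consistency check, not a reconstruction of how the PR computes the value. A short sketch of the check (the function name is hypothetical):

```typescript
// Overall average as the execution-count-weighted mean of per-type averages.
function weightedAverage(
  countByType: Record<string, number>,
  avgByType: Record<string, number>,
): number {
  let totalCount = 0;
  let weightedSum = 0;
  for (const type of Object.keys(countByType)) {
    const count = countByType[type];
    totalCount += count;
    weightedSum += count * (avgByType[type] || 0);
  }
  return totalCount === 0 ? 0 : weightedSum / totalCount;
}

// Values copied from the actions telemetry sample above.
const countActionsExecutionsByType = { server_log: 160, email: 164 };
const avgExecutionTimeByType = { server_log: 1, email: 376 };

const overallAvg = weightedAverage(countActionsExecutionsByType, avgExecutionTimeByType);
console.log(overallAvg.toFixed(2)); // 190.81
console.log(Math.round(overallAvg)); // 191
```

A plain mean of the two per-type averages, (1 + 376) / 2 = 188.5, would not give 191, which suggests the overall figure is averaged over executions rather than over connector types.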

Rules telemetry:

"alerts": {
  "count_total": 2,
  "count_by_type": {
    "index_threshold": 2
  },
  "throttle_time": {
    "min": "0s",
    "avg": "0s",
    "max": "0s"
  },
  "schedule_time": {
    "min": "10s",
    "avg": "35s",
    "max": "60s"
  },
  "connectors_per_alert": {
    "min": 0,
    "avg": 1.5,
    "max": 0
  },
  "count_active_by_type": {
    "index_threshold": 2
  },
  "count_active_total": 2,
  "count_disabled_total": 0,
  "count_rules_executions": 774,
  "count_rules_executions_by_type": {
    "index_threshold": 774
  },
  "count_rules_executions_failured": 0,
  "count_rules_executions_failured_by_reason": {},
  "count_rules_executions_failured_by_reason_by_type": {},
  "avg_execution_time": 1409,
  "avg_execution_time_by_type": {
    "index_threshold": 1409
  }
}
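In the rules sample, `schedule_time` reports min 10s, avg 35s and max 60s for two rules, which is consistent with one rule scheduled every 10s and one every 60s: (10 + 60) / 2 = 35. A minimal sketch of that summary, assuming each rule's schedule is an interval string such as "10s" or "1m", that only the units s/m/h/d occur, and that results are rounded to whole seconds; the function names and the "0s" result for an empty list are illustrative assumptions:

```typescript
// Seconds per unit for interval strings such as "10s", "5m", "1h", "1d"
// (assumed set of units).
const SECONDS_PER_UNIT: Record<string, number> = { s: 1, m: 60, h: 3600, d: 86400 };

function intervalToSeconds(interval: string): number {
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (!match) {
    throw new Error(`Unsupported interval: ${interval}`);
  }
  return Number(match[1]) * SECONDS_PER_UNIT[match[2]];
}

// min/avg/max over rule schedule intervals, reported as whole seconds.
function scheduleTimeStats(intervals: string[]): { min: string; avg: string; max: string } {
  if (intervals.length === 0) {
    return { min: "0s", avg: "0s", max: "0s" };
  }
  const seconds = intervals.map(intervalToSeconds);
  const total = seconds.reduce((sum, s) => sum + s, 0);
  return {
    min: `${Math.min(...seconds)}s`,
    avg: `${Math.round(total / seconds.length)}s`,
    max: `${Math.max(...seconds)}s`,
  };
}

console.log(JSON.stringify(scheduleTimeStats(["10s", "60s"])));
// {"min":"10s","avg":"35s","max":"60s"}
```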

YulNaumenko added the v8.0.0, release_note:skip, Team:ResponseOps, v7.16.0, Feature:Alerting/RulesFramework and Feature:Alerting/RuleActions labels Oct 18, 2021
YulNaumenko self-assigned this Oct 18, 2021
YulNaumenko requested review from a team as code owners October 18, 2021 03:31
@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

Comment on lines 29 to 33
"count_actions_executions_failured_by_type": {
"properties": {
"DYNAMIC_KEY": {
"type": "long"
},
Contributor

cc @Bamieh @TinaHeiligers: that's a lot of additional fields. Could you check that this seems fine to you?

Member

Sadly, we don't have a better alternative for this. We can handle mapping them via schema patterns on the telemetry cluster.
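The `DYNAMIC_KEY` entries in the schema snippet mean that each connector or rule type id becomes its own key, so the number of leaf fields grows with the number of types in use, which is the concern raised in this thread. The sketch below only illustrates that growth by listing the dotted path of every numeric leaf; it is not how the telemetry cluster actually maps these documents, and `TelemetryNode` and `leafPaths` are hypothetical names.

```typescript
// A telemetry object whose values are either counters or nested maps.
interface TelemetryNode {
  [key: string]: number | TelemetryNode;
}

// Return the dotted path of every numeric leaf, e.g. "avg_execution_time_by_type.email".
function leafPaths(node: TelemetryNode, prefix = ""): string[] {
  const paths: string[] = [];
  for (const key of Object.keys(node)) {
    const value = node[key];
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === "number") {
      paths.push(path);
    } else {
      paths.push(...leafPaths(value, path));
    }
  }
  return paths;
}

// Two of the DYNAMIC_KEY maps from the actions telemetry sample.
const sample: TelemetryNode = {
  count_actions_executions_failured_by_type: { email: 164 },
  avg_execution_time_by_type: { server_log: 1, email: 376 },
};
console.log(leafPaths(sample).length); // 3
```

Every additional connector type in use would add one more leaf under each `*_by_type` map.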

@YulNaumenko
Contributor Author

@elasticmachine merge upstream

@YulNaumenko
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id       | before | after | diff
eventLog | 80     | 81    | +1
Unknown metric groups

API count

id       | before | after | diff
eventLog | 80     | 81    | +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @YulNaumenko

Contributor

chrisronline left a comment

This is looking great so far! I did an initial pass and provided some feedback/questions

@YulNaumenko
Contributor Author

@elasticmachine merge upstream

@YulNaumenko
Contributor Author

@elasticmachine merge upstream

@YulNaumenko
Contributor Author

@elasticmachine merge upstream

@YulNaumenko
Contributor Author

@elasticmachine merge upstream

Contributor

chrisronline left a comment

LGTM! Great work!

Member

pmuellr left a comment

LGTM with the addition of the date filtering on the event log queries
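The review asks for date filtering on the event log queries, and the commit list mentions "set size to 0" and a per-day breakdown. A sketch of what such a query body could look like, assuming `event.provider` values such as "alerting" or "actions", an `event.action` of "execute", and an Elasticsearch date-math lower bound such as "now-1d"; the options interface, the function name and the `rule_type_id` field are hypothetical placeholders, and this is not the query from the PR. If the type id lives in a nested field, a nested aggregation would be needed around the terms aggregation.

```typescript
// Hypothetical options; the field holding the rule or connector type id is a placeholder.
interface ExecutionStatsQueryOptions {
  provider: string; // event.provider value, e.g. "alerting" or "actions" (assumed)
  typeField: string; // placeholder field holding the rule type / connector type id
  lookback: string; // Elasticsearch date math for the lower bound, e.g. "now-1d"
}

function buildExecutionStatsQuery(opts: ExecutionStatsQueryOptions) {
  return {
    size: 0, // aggregations only; no hits are needed
    query: {
      bool: {
        filter: [
          { term: { "event.provider": opts.provider } },
          { term: { "event.action": "execute" } },
          // Date filter: only count executions inside the lookback window.
          { range: { "@timestamp": { gte: opts.lookback } } },
        ],
      },
    },
    aggs: {
      // ECS defines event.duration in nanoseconds, so these averages would
      // need converting before being reported in milliseconds.
      avg_duration: { avg: { field: "event.duration" } },
      by_type: {
        terms: { field: opts.typeField, size: 100 },
        aggs: { avg_duration: { avg: { field: "event.duration" } } },
      },
    },
  };
}

const query = buildExecutionStatsQuery({
  provider: "alerting",
  typeField: "rule_type_id", // placeholder field name
  lookback: "now-1d",
});
console.log(JSON.stringify(query.query.bool.filter[2]));
// {"range":{"@timestamp":{"gte":"now-1d"}}}
```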

@kibanamachine
Contributor

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id       | before | after | diff
eventLog | 80     | 81    | +1
Unknown metric groups

API count

id       | before | after | diff
eventLog | 80     | 81    | +1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @YulNaumenko

YulNaumenko merged commit 6e14338 into elastic:main Nov 2, 2021
YulNaumenko added a commit to YulNaumenko/kibana that referenced this pull request Nov 2, 2021
…5318)

* [Alerting] More telemetry for 8.0 based on Event Log data

* fixed event log index mapping

* fixed typecheck

* fixed tests

* added avg aggs

* set size to 0

* fixed due to comments

* fixed telemetry schema

* fixed query

* removed test data

* added tests

* fixed test

* fixed query

* added execution breakdown by day

* fixed test

* fixed for rules

* fixed schema

* fixed schema

Co-authored-by: Kibana Machine <[email protected]>
YulNaumenko added a commit that referenced this pull request Nov 2, 2021
…117191)

Labels
Feature:Alerting/RuleActions, Feature:Alerting/RulesFramework, release_note:skip, Team:ResponseOps, v8.0.0
Development

Successfully merging this pull request may close these issues.

More alerting services telemetry
7 participants