[Response Ops][Alerting] Update FAAD AlertsClient to support AAD payload #158404

Merged
ymao1 merged 12 commits into elastic:main from alerting/faad-api-rule-type-payload on Jun 7, 2023

@ymao1 ymao1 commented May 24, 2023

Resolves #156443, #156445

Summary

  • Updates AlertsClient with a create API that lets rule executors report alerts with an AAD payload, along with the values for actionGroup, context, and state. The API proxies the LegacyAlertsClient to create alerts via the alert factory and also saves the reported payload.
  • When the alert documents are bulk written at the end of rule execution, the AAD payload (if specified) is included in each alert document.
  • Deprecates the alert factory that is passed into the rule executors. This PR does not remove or replace usages of the alert factory.
  • Exposes the AlertsClient service to the rule executors. Note that this PR does not migrate any rule type to use this service.
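
As a rough illustration of the flow in the summary, the sketch below shows a client that creates alerts through a legacy factory while remembering each reported payload, then merges that payload into the document built at the end of execution. All names here (`AlertsClientSketch`, `LegacyFactorySketch`, `buildDocs`, the `ReportedAlert` shape) are hypothetical stand-ins and not the actual Kibana types or method signatures; the two `kibana.alert.*` field names are used for illustration only.

```typescript
// Hypothetical shape of what a rule executor reports; not the real Kibana type.
interface ReportedAlert {
  id: string;
  actionGroup: string;
  state?: Record<string, unknown>;
  context?: Record<string, unknown>;
  payload?: Record<string, unknown>;
}

// Stand-in for the legacy alert factory that the new client proxies.
class LegacyFactorySketch {
  readonly created = new Map<string, { actionGroup: string; state: Record<string, unknown> }>();

  create(id: string, actionGroup: string, state: Record<string, unknown>): void {
    this.created.set(id, { actionGroup, state });
  }
}

class AlertsClientSketch {
  private readonly legacy: LegacyFactorySketch;
  private readonly reportedPayloads = new Map<string, Record<string, unknown>>();

  constructor(legacy: LegacyFactorySketch) {
    this.legacy = legacy;
  }

  // Create the alert via the legacy path and save the payload for later.
  create(alert: ReportedAlert): void {
    this.legacy.create(alert.id, alert.actionGroup, alert.state ?? {});
    if (alert.payload) {
      this.reportedPayloads.set(alert.id, alert.payload);
    }
  }

  // Build the documents that would be bulk written at the end of execution.
  // Framework-owned fields are written last so the payload cannot replace them.
  buildDocs(): Array<Record<string, unknown>> {
    return Array.from(this.legacy.created.entries()).map(([id, { actionGroup }]) => ({
      ...(this.reportedPayloads.get(id) ?? {}),
      'kibana.alert.instance.id': id,
      'kibana.alert.action_group': actionGroup,
    }));
  }
}
```

An alert reported without a payload still produces a document in this sketch; it simply carries only the framework fields.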

This PR does not opt any rule types into writing the AAD payload or using the AlertsClient API but updates the AAD functional test to do so. To test it out with the ES query rule type, use the following commit: 1b1e139

Followup issues

  • This PR does not add a recovery API to the FAAD AlertsClient, so alerts reported via the new alerts client currently have no way of specifying a recovered payload.

To Verify

  • Verify that rule registry rule types still work as expected
  • Verify that non rule-registry rule types still work as expected
  • Check out this commit which onboards the ES query rule type onto FAAD. Create an ES query rule that alerts and then recovers and verify that the alert documents look as expected. Alternatively, you can modify your own rule type to register with FAAD and write alerts and verify that the alert documents look as expected.

Checklist

@ymao1 ymao1 force-pushed the alerting/faad-api-rule-type-payload branch 2 times, most recently from b133daf to bcf57ab Compare May 24, 2023 19:53
@ymao1 ymao1 changed the title Adding ability to create alert in new alerts client [Response Ops][Alerting] Update FAAD AlertsClient to support AAD payload Jun 1, 2023
@ymao1 ymao1 force-pushed the alerting/faad-api-rule-type-payload branch 3 times, most recently from e31e3b7 to 7045165 Compare June 2, 2023 15:45
import { alertFieldMap } from '@kbn/alerts-as-data-utils';
import { RuleAlertData } from '../../types';

const allowedFrameworkFields: string[] = [ALERT_REASON];
Contributor Author

kibana.alert.reason is defined in the alertFieldMap, but the framework does not set it; it is returned by rule types. We may want to set it at the framework level in the future, but for now we add it to this list to allow rule types to override this value.

Member

Is this just for fields defined by the framework but expected to be set by rule - and we don't have any other fields like that yet?

It should probably be a Set instead of an array, to optimize the filter.

Contributor Author

@pmuellr Yes, that's correct. There are a few more fields that I'm unsure about: kibana.alert.last_detected and kibana.alert.url come to mind, but I think we can add those as needed.

Updated to a Set in this commit: ce316fd
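
For readers following the thread, here is a minimal sketch of the kind of filter being discussed: framework-owned fields are stripped from a rule type's payload unless they appear in the allow list, and both lookups use a Set. The `frameworkFields` contents and the `stripFrameworkFields` helper are hypothetical; the real list is derived from alertFieldMap, and this sketch assumes a flattened payload with dotted keys for simplicity.

```typescript
// Hypothetical stand-in for the keys of alertFieldMap (illustrative subset only).
const frameworkFields = new Set<string>([
  'kibana.alert.reason',
  'kibana.alert.status',
  'kibana.alert.start',
]);

// Framework-defined fields that rule types are still allowed to set.
const allowedFrameworkFields = new Set<string>(['kibana.alert.reason']);

// Drop framework-owned keys from a flattened payload, keeping allowed ones.
function stripFrameworkFields(payload: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    const isFrameworkField = frameworkFields.has(key);
    if (!isFrameworkField || allowedFrameworkFields.has(key)) {
      result[key] = payload[key];
    }
  }
  return result;
}
```

Set membership checks are constant time on average, so the filter cost no longer depends on how many allowed fields are added later.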

@@ -124,11 +124,12 @@ function processAlertsHelper<
// this alert did exist in previous run
// calculate duration to date for active alerts
const state = existingAlerts[id].getState();
const currentState = activeAlerts[id].getState();
Contributor Author

This fixes a bug where an ongoing alert that uses replaceState would have its updated state value overridden.
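
One way to picture the fix, as a sketch only: when an alert is ongoing, start from the state the executor set in the current run and carry over just the start time from the previous run, rather than starting from the previous state. The `mergeOngoingState` helper and the `durationMs` field are hypothetical and are not taken from the PR's code.

```typescript
type AlertState = Record<string, unknown>;

// Merge state for an ongoing alert without clobbering values the executor
// set in the current run (for example via replaceState).
function mergeOngoingState(previous: AlertState, current: AlertState, nowMs: number): AlertState {
  const merged: AlertState = { ...current };
  const start = previous.start;
  if (typeof start === 'string') {
    // Keep the original start time and compute the duration to date.
    merged.start = start;
    merged.durationMs = nowMs - new Date(start).getTime();
  }
  return merged;
}
```

Building the result from the previous state instead would silently discard whatever the executor wrote during this run, which is the symptom described above.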

@ymao1 ymao1 force-pushed the alerting/faad-api-rule-type-payload branch 4 times, most recently from 0ef3711 to d11c667 Compare June 2, 2023 19:27
@ymao1 ymao1 self-assigned this Jun 2, 2023
@ymao1 ymao1 added Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) Feature:Alerting/Alerts-as-Data Issues related to Alerts-as-data and RuleRegistry v8.9.0 labels Jun 2, 2023
@ymao1 ymao1 force-pushed the alerting/faad-api-rule-type-payload branch from d11c667 to 3b6d2e6 Compare June 5, 2023 12:05
@ymao1 ymao1 marked this pull request as ready for review June 5, 2023 13:08
@ymao1 ymao1 requested review from a team as code owners June 5, 2023 13:08
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@mikecote mikecote self-requested a review June 5, 2023 19:03
Contributor

@mikecote mikecote left a comment

Changes LGTM! Reviewed the code and tested locally with the sample ES Query snippet you shared. I left a bunch of questions but nothing blocking from my perspective 👍

Comment on lines +332 to +336
} catch (err) {
  this.options.logger.error(
    `Error writing ${alertsToIndex.length} alerts to ${this.indexTemplateAndPattern.alias} - ${err.message}`
  );
}
Contributor

question: Do we need to process the bulk response on success as well (when no errors are thrown)? If an error happens for a specific document, it would return as a success and have the error information in the payload (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-api-response-body).

Contributor Author

Do you think we should retry the individual documents that fail or just log?

Contributor

I think logging for now is ok and if we think it's worth retrying or partially failing the rule execution, we can create a follow up issue to discuss / implement.

Contributor Author

Updated in 633dbb1
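
To make the point about per-document failures concrete, the sketch below walks a bulk response and collects a log message for each item that carries an error. The response types are a minimal hand-written subset of the Elasticsearch bulk response body (`errors`, `items`, per-item `status` and `error`), and `collectBulkErrors` is a hypothetical helper, not the code added in the commit above.

```typescript
// Minimal subset of the Elasticsearch bulk response body used by this sketch.
interface BulkItemResult {
  status: number;
  error?: { type: string; reason?: string };
}

interface BulkResponseSketch {
  errors: boolean;
  // Each item is keyed by its operation name, e.g. "index" or "create".
  items: Array<Record<string, BulkItemResult>>;
}

// Return one message per failed item; an empty array means every item succeeded.
function collectBulkErrors(response: BulkResponseSketch): string[] {
  if (!response.errors) {
    return [];
  }
  const messages: string[] = [];
  for (const item of response.items) {
    for (const operation of Object.keys(item)) {
      const result = item[operation];
      if (result.error) {
        const reason = result.error.reason ?? 'unknown reason';
        messages.push(
          `${operation} failed with status ${result.status}: ${result.error.type} - ${reason}`
        );
      }
    }
  }
  return messages;
}
```

A caller could log each returned message and continue, which matches the "log for now" decision in this thread; retrying failed documents would be a separate follow-up.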

type AlertRecord = Record<string, AlertTypes>;
type AlertFields = AlertTypes | AlertRecord;

const removeEmptyObjects = (data: AlertFields): AlertFields => {
Contributor Author

@mikecote Do you see any gotchas in this function? Using lodash omit leaves empty objects so we'd be left with fields like

kibana: {
  alert: {
    duration: {}
  }
}

if we had to strip kibana.alert.duration.us from the payload (for example), so I added this function but maybe it is overkill?

Contributor

Function looks good to me. To see if the function is necessary, I just did a check on the behaviours when calling the Elasticsearch index and update APIs to update an existing alert document. In both scenarios with empty objects, the call mutated the alert in the exact way as if it didn't have an empty object, so it seems not doing removeEmptyObjects will work the same.

When creating alerts, the created alert would just contain an empty object, but I don't see any issues with that either. So it seems safe to remove the removeEmptyObjects function and save a processing step.

Contributor Author

Removed in b385e5f
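
For context on what was removed, a function of this kind could look roughly like the sketch below: it recursively drops object-valued keys that end up empty, so stripping kibana.alert.duration.us would not leave `duration: {}` behind. This is a hypothetical reconstruction for illustration (`pruneEmptyObjects` and the `Json` type are assumptions) and not the implementation that was in the PR.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function isPlainObject(value: Json): value is { [key: string]: Json } {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recursively remove keys whose values are empty objects after pruning.
function pruneEmptyObjects(data: Json): Json {
  if (!isPlainObject(data)) {
    return data;
  }
  const result: { [key: string]: Json } = {};
  for (const key of Object.keys(data)) {
    const pruned = pruneEmptyObjects(data[key]);
    const isEmpty = isPlainObject(pruned) && Object.keys(pruned).length === 0;
    if (!isEmpty) {
      result[key] = pruned;
    }
  }
  return result;
}
```

Because pruning happens bottom-up, a parent that contained only empty objects is itself removed, which is the extra processing step the reviewers decided was unnecessary.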

Member

@pmuellr pmuellr left a comment

LGTM, noted one optimization

Contributor

@e40pud e40pud left a comment

Detection engine changes LGTM

Member

@maryam-saeidi maryam-saeidi left a comment

AO changes LGTM!

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
alerting 595 594 -1

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
alerting 44 45 +1
Unknown metric groups

API count

id before after diff
alerting 617 618 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
securitySolution 413 417 +4
total +6

References to deprecated APIs

id before after diff
infra 59 65 +6
ml 153 157 +4
monitoring 4 5 +1
observability 3 5 +2
ruleRegistry 0 4 +4
securitySolution 603 605 +2
stackAlerts 58 64 +6
synthetics 42 49 +7
transform 29 30 +1
total +33

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
securitySolution 497 501 +4
total +6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ymao1

@ymao1 ymao1 merged commit 3aa3f04 into elastic:main Jun 7, 2023
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Jun 7, 2023
@ymao1 ymao1 deleted the alerting/faad-api-rule-type-payload branch June 7, 2023 16:08
Labels
backport:skip This commit does not require backporting Feature:Alerting/Alerts-as-Data Issues related to Alerts-as-data and RuleRegistry Feature:Alerting release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.9.0
Development

Successfully merging this pull request may close these issues.

[Response Ops][Alerting] Enhance AlertsClient to support AAD payload
8 participants