[Cases] Case action: Handle closed cases #172709

Merged
cnasikas merged 70 commits into elastic:case_action from ca_reopen_cases on Dec 20, 2023

Conversation

cnasikas
Member

@cnasikas cnasikas commented Dec 6, 2023

Summary

Depends on: #171754

Checklist

For maintainers

@cnasikas cnasikas mentioned this pull request Dec 8, 2023
3 tasks
@@ -309,7 +309,7 @@ export interface UpdateRequestWithOriginalCase {
*
* @ignore
*/
-export const update = async (
+export const bulkUpdate = async (
Member Author

I renamed the method of the case client from update to bulkUpdate to make it clearer what the function does.

@@ -43,7 +43,8 @@ interface GroupedAlerts {
}

type GroupedAlertsWithOracleKey = GroupedAlerts & { oracleKey: string };
type GroupedAlertsWithCaseId = GroupedAlertsWithOracleKey & { caseId: string };
+type GroupedAlertsWithOracleRecords = GroupedAlertsWithOracleKey & { oracleRecord: OracleRecord };
Member Author

To open new cases in place of the closed ones, we need the counter that is stored in the oracle record.

@@ -108,15 +109,15 @@ export class CasesConnector extends SubActionConnector<
/**
* Add circuit breakers to the number of oracles they can be created or retrieved
*/
-const oracleRecords = await this.upsertOracleRecords(
+const oracleRecordsMap = await this.upsertOracleRecords(groupedAlertsWithOracleKey);
+const oracleRecordMapWithTimeWindowHandled = await this.handleTimeWindow(
Member Author

@adcoelho As we discussed, I moved the time window logic out of the upsert method.

@@ -125,7 +126,13 @@ export class CasesConnector extends SubActionConnector<
groupedAlertsWithCaseId
);

-await this.attachAlertsToCases(casesClient, groupedAlertsWithCases, params);
+const groupedAlertsWithClosedCasesHandled = await this.handleClosedCases(
Member Author

@cnasikas cnasikas Dec 8, 2023

We either reopen closed cases or create new ones.
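
A minimal sketch of that decision, assuming a boolean setting that mirrors the "Reopen closed cases" configuration. The type and function names here are hypothetical and are not taken from the connector:

```typescript
// Hypothetical shapes for illustration only; the connector's real types differ.
type CaseStatus = 'open' | 'in-progress' | 'closed';

interface CaseRef {
  id: string;
  status: CaseStatus;
}

interface ClosedCasePlan {
  untouched: CaseRef[]; // cases that are not closed, used as they are
  toReopen: CaseRef[]; // closed cases that will be reopened
  toReplace: CaseRef[]; // closed cases for which a new case will be created
}

// Splits the cases by what has to happen before alerts can be attached.
function planClosedCases(cases: CaseRef[], reopenClosedCases: boolean): ClosedCasePlan {
  const plan: ClosedCasePlan = { untouched: [], toReopen: [], toReplace: [] };

  for (const theCase of cases) {
    if (theCase.status !== 'closed') {
      plan.untouched.push(theCase);
    } else if (reopenClosedCases) {
      plan.toReopen.push(theCase);
    } else {
      plan.toReplace.push(theCase);
    }
  }

  return plan;
}
```

Keeping the split separate from the updates makes it easy to test which path each case takes without touching the cases client.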

return oracleRecordMap;
}

private async handleTimeWindow(
Member Author

Same code as before.

@@ -389,6 +428,7 @@ export class CasesConnector extends SubActionConnector<
* We should find a way to fill the custom fields with default values.
*/
return {
id: caseId,
Member Author

This was a bug.

return casesMapAsArray.find((record) => record.oracleRecord.id === oracleId);
};

const bulkUpdateOracleValidRecords = await this.increaseOracleRecordCounter(
Member Author

To create new cases, we first need to increase the counter so that we can get the new case ID.
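
To illustrate why the counter has to move first, here is a rough sketch. It assumes the case ID is derived deterministically from the oracle record and its counter. The record shape and the ID format are made up for the example; the connector may derive IDs differently (for instance by hashing):

```typescript
// Hypothetical record shape; only the counter matters for this sketch.
interface OracleRecordSketch {
  id: string; // stable key for the rule and grouping
  counter: number; // increased each time a fresh case is needed
}

// Returns a copy of the record with the counter increased by one.
function increaseCounter(record: OracleRecordSketch): OracleRecordSketch {
  return { ...record, counter: record.counter + 1 };
}

// Derives a deterministic case ID from the record (made-up format).
function caseIdFor(record: OracleRecordSketch): string {
  return `${record.id}:${record.counter}`;
}
```

With the same counter, `caseIdFor` keeps returning the ID of the closed case, so a new case can only get a distinct ID after `increaseCounter` has run.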

@cnasikas cnasikas marked this pull request as ready for review December 8, 2023 11:26
@cnasikas cnasikas requested a review from a team as a code owner December 8, 2023 11:26
@elasticmachine
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@elasticmachine
Contributor

Pinging @elastic/response-ops-cases (Feature:Cases)

@cnasikas cnasikas changed the title [Cases] Case action: Handle close cases [Cases] Case action: Handle closed cases Dec 12, 2023
);

/**
* TODO: bulkCreate throws an error. Retry on errors.
Contributor

Maybe this could already be included in this PR?

The logic is more than a regular retry: for some errors (e.g. the case already exists) we will want to fetch the cases and call result.set with case.id, much like the logic in upsert.

Member Author

@cnasikas cnasikas Dec 19, 2023

I will handle all errors and retries in this PR #173012. I think it is better to do it on another PR as the code can grow a lot and be difficult to follow. I believe it is better to retry than to fetch the cases. In general, we want each action to be idempotent. This means that if we retry it will not affect the correctness of the case action. In your example, if the case exists this means that another Kibana node (that runs the case action) created the case before this Kibana node managed to do it. If we retry, on the next round the node will find that the case exists and will attach the alerts to that case without trying to create a new one. Retries are useful to break race conditions or transient errors.
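
A small sketch of the idea, using an in-memory object in place of the cases store. Everything here is hypothetical and synchronous for brevity; it is not the connector's code:

```typescript
// Hypothetical in-memory stand-in for the cases store: case ID -> attached alert IDs.
type CaseStore = Record<string, string[]>;

// One round of the action. It is safe to repeat: an existing case is reused
// and an alert that is already attached is not attached a second time.
function runCaseAction(store: CaseStore, caseId: string, alertIds: string[]): void {
  let attached = store[caseId];
  if (attached === undefined) {
    // Create the case only if it does not exist yet.
    attached = [];
    store[caseId] = attached;
  }

  for (const alertId of alertIds) {
    if (attached.indexOf(alertId) === -1) {
      attached.push(alertId);
    }
  }
}

// Re-runs the action until it succeeds or the attempts run out.
function withRetries<T>(action: () => T, maxAttempts: number): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return action();
    } catch (error) {
      // For example, a conflict because another node created the case first.
      lastError = error;
    }
  }
  throw lastError;
}
```

Because `runCaseAction` gives the same end state no matter how many times it runs, `withRetries` can simply run the whole round again after a conflict instead of handling the conflict specially.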

Contributor

I will handle all errors and retries in this PR #173012. I think it is better to do it on another PR as the code can grow a lot and be difficult to follow.

I'm ok with this.

In general, we want each action to be idempotent. This means that if we retry it will not affect the correctness of the case action. In your example, if the case exists this means that another Kibana node (that runs the case action) created the case before this Kibana node managed to do it. If we retry, on the next round the node will find that the case exists and will attach the alerts to that case without trying to create a new one. Retries are useful to break race conditions or transient errors.

Won't it be idempotent anyway? No matter which path is followed, we always end up with these alerts attached to the case with that ID, whether we created the case or another node did.

Member Author

@cnasikas cnasikas Dec 20, 2023

You are right on this. It may save us a retry round but I think it will make the code more difficult to follow and the scenario of a conflict should be rare. What do you think of leaving it as it is and when we have the whole picture (retry logic, error handling, etc.) and test a lot we can see if it is worth the optimization?

Contributor

I am ok with this, my initial suggestion was more about the scope rather than the optimization.

I just elaborated further to make sure I didn't miss some logic in your response 👍

Contributor

@js-jankisalvi js-jankisalvi left a comment

Code changes look good 👍

@cnasikas cnasikas requested a review from a team as a code owner December 20, 2023 10:25
@kibana-ci
Collaborator

kibana-ci commented Dec 20, 2023

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #43 / cases security and spaces enabled: basic Common update_alert_status should update the status of multiple alerts attached to multiple cases using the cases client
  • [job] [logs] FTR Configs #88 / cases security and spaces enabled: trial Common update_alert_status should update the status of multiple alerts attached to multiple cases using the cases client
  • [job] [logs] Jest Tests #21 / update Total comments and alerts calls the attachment service with the right params and returns the expected comments and alerts

Metrics [docs]

‼️ ERROR: no builds found for mergeBase sha [c095b48]

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @cnasikas

Member Author

Prettier messed up the file. This PR will be merged into a feature branch.

@cnasikas cnasikas merged commit f6491cb into elastic:case_action Dec 20, 2023
32 of 39 checks passed
@cnasikas cnasikas deleted the ca_reopen_cases branch December 20, 2023 12:37
cnasikas added a commit that referenced this pull request Jan 12, 2024
## Summary

This PR:

1. Creates the `CasesConnectorError` error
2. Separates the execution logic by moving the current logic to a new
class called `CasesConnectorExecutor`
3. Lets the `CasesConnector` class handle only the retry logic of the
connector
4. Implements the [Full jitter backoff
algorithm](https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/)
which is used as the retry strategy of the connector
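
For reference, the full jitter strategy from the linked post picks a random delay between zero and an exponentially growing, capped ceiling. A minimal sketch follows; the function name, parameters, and values are hypothetical and this is not the connector's implementation:

```typescript
// Full jitter: the delay is a random value in [0, min(cap, base * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function fullJitterDelay(
  attempt: number, // zero-based retry attempt
  baseMs: number, // initial backoff in milliseconds
  capMs: number, // upper bound for the backoff ceiling
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

Spreading the delay over the whole range, rather than adding a small jitter on top of a fixed backoff, keeps nodes that failed at the same moment from retrying at the same moment.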

Depends on: #172709

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine
cnasikas added a commit that referenced this pull request Apr 12, 2024
## Summary

Depends on: #166267,
#170326,
#169484,
#173740,
#173763,
#178068,
#178307,
#178600,
#180437

PRs:
- #168370
- #169229
- #171754
- #172709
- #173012
- #175107
- #175452
- #175505
- #177033
- #178277
- #177139
- #179796

Fixes: #153837

## Testing

Run Kibana with `--run-examples` if you want to use the "Always firing"
rule.

Create a rule with a case action in observability and the stack. The
security solution is not supported. You should not be able to assign a
case action in a security solution rule.

1. Test the "Reopen closed cases" configuration.
2. Test the "Grouping by" configuration. Only one field is allowed. Not
all fields are persisted in alerts. If you select a field that is not part
of the alert, the case action will create a case where the grouping value
is set to `unknow`.
3. Test the "Time window" feature. You can comment out the validation to
test for shorter times.
4. Verify that the case action is experimental.
5. Verify that based on the rule type the case is created in the correct
solution.
6. Verify that you cannot create a rule with the case action on the
basic license.
7. Verify that the execution of the case action fails if you do not have
permission for cases. Pending work on the system actions framework level
to not allow users to create rules with system actions where they do not
have permission.
8. Stress test the case action by creating multiple rules.

### Checklist

Delete any items that are not applicable to this PR.

- [x]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

## Release notes

Automatically create cases when an alert is triggered.

---------

Co-authored-by: kibanamachine
Co-authored-by: adcoelho
Co-authored-by: Janki Salvi
Labels
Feature:Cases Cases feature release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.13.0
6 participants