[Cases] Case action: Handle closed cases #172709
…o register_case_action
@@ -309,7 +309,7 @@ export interface UpdateRequestWithOriginalCase {
*
* @ignore
*/
export const update = async (
export const bulkUpdate = async (
I renamed the cases client method from `update` to `bulkUpdate` to make it clearer what the function does.
@@ -43,7 +43,8 @@ interface GroupedAlerts {
}

type GroupedAlertsWithOracleKey = GroupedAlerts & { oracleKey: string };
type GroupedAlertsWithCaseId = GroupedAlertsWithOracleKey & { caseId: string };
type GroupedAlertsWithOracleRecords = GroupedAlertsWithOracleKey & { oracleRecord: OracleRecord };
To open new cases in place of closed ones, we need the counter stored in the oracle record.
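A minimal sketch of why the new `GroupedAlertsWithOracleRecords` type carries the whole oracle record: each alert group is joined with the record stored under its oracle key, so later steps can read the counter. The field lists of `OracleRecord` and `GroupedAlerts` and the helper `attachOracleRecords` are hypothetical simplifications, not the Kibana definitions.

```typescript
// Hypothetical, simplified shapes: the real Kibana types carry more fields.
interface OracleRecord {
  id: string;
  counter: number;
  createdAt: string;
  updatedAt: string | null;
}

interface GroupedAlerts {
  alerts: Array<Record<string, unknown>>;
  grouping: Record<string, unknown>;
}

// These two aliases mirror the ones in the diff.
type GroupedAlertsWithOracleKey = GroupedAlerts & { oracleKey: string };
type GroupedAlertsWithOracleRecords = GroupedAlertsWithOracleKey & { oracleRecord: OracleRecord };

// Hypothetical helper: pair each group with the oracle record stored under its key.
function attachOracleRecords(
  groups: GroupedAlertsWithOracleKey[],
  oracleRecords: Map<string, OracleRecord>
): GroupedAlertsWithOracleRecords[] {
  const result: GroupedAlertsWithOracleRecords[] = [];
  for (const group of groups) {
    const oracleRecord = oracleRecords.get(group.oracleKey);
    // Groups whose oracle record could not be created or retrieved are skipped.
    if (oracleRecord !== undefined) {
      result.push({ ...group, oracleRecord });
    }
  }
  return result;
}
```

Keeping the full record (rather than only the case ID) means the closed-case logic can bump `counter` without another lookup.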
@@ -108,15 +109,15 @@ export class CasesConnector extends SubActionConnector<
/**
* Add circuit breakers to the number of oracles they can be created or retrieved
*/
const oracleRecords = await this.upsertOracleRecords(
const oracleRecordsMap = await this.upsertOracleRecords(groupedAlertsWithOracleKey);
const oracleRecordMapWithTimeWindowHandled = await this.handleTimeWindow(
@adcoelho As we discussed, I moved the time window logic out of the upsert method.
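With the time window handled as a separate step after the upsert, it can be expressed as a pure transformation of an oracle record. The sketch below assumes that the window is measured from the record's last update (or creation) and that an elapsed window is handled by increasing the counter so the alerts go to a fresh case. `applyTimeWindow` and the record shape are hypothetical, not the connector's `handleTimeWindow`.

```typescript
// Hypothetical, simplified oracle record.
interface OracleRecord {
  id: string;
  counter: number;
  createdAt: string;
  updatedAt: string | null;
}

// Assumption: once the time window has elapsed since the last activity,
// the counter is bumped so that subsequent alerts are attached to a new case.
function applyTimeWindow(record: OracleRecord, timeWindowMs: number, now: Date): OracleRecord {
  const lastActivity = new Date(record.updatedAt ?? record.createdAt).getTime();
  if (now.getTime() - lastActivity < timeWindowMs) {
    // Still inside the window: keep using the current case.
    return record;
  }
  return { ...record, counter: record.counter + 1, updatedAt: now.toISOString() };
}
```

Keeping this out of the upsert means the upsert only creates or fetches records, and the time window rule can be tested on its own.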
@@ -125,7 +126,13 @@ export class CasesConnector extends SubActionConnector<
groupedAlertsWithCaseId
);

await this.attachAlertsToCases(casesClient, groupedAlertsWithCases, params);
const groupedAlertsWithClosedCasesHandled = await this.handleClosedCases(
Depending on the "Reopen closed cases" configuration, we either reopen closed cases or create new ones.
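The closed-case branch can be sketched as a small decision function. This is a hypothetical simplification: `decideForCase` and the `ClosedCaseDecision` union are illustrative names, and only the case statuses (`open`, `in-progress`, `closed`) are taken from Kibana cases.

```typescript
type CaseStatus = "open" | "in-progress" | "closed";

// Hypothetical, simplified view of a case.
interface CaseSummary {
  id: string;
  status: CaseStatus;
}

type ClosedCaseDecision =
  | { kind: "keep"; caseId: string }
  | { kind: "reopen"; caseId: string }
  | { kind: "createNew"; nextCounter: number };

// Decide what happens to the case that a group of alerts maps to.
function decideForCase(
  theCase: CaseSummary,
  reopenClosedCases: boolean,
  currentCounter: number
): ClosedCaseDecision {
  if (theCase.status !== "closed") {
    return { kind: "keep", caseId: theCase.id };
  }
  if (reopenClosedCases) {
    return { kind: "reopen", caseId: theCase.id };
  }
  // A closed case that must stay closed is replaced by a new case whose ID
  // is derived from the increased oracle counter.
  return { kind: "createNew", nextCounter: currentCounter + 1 };
}
```

A discriminated union keeps the three outcomes explicit, so the caller can batch reopen updates and new-case creations separately.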
return oracleRecordMap; | ||
} | ||
|
||
private async handleTimeWindow( |
Same code as before.
@@ -389,6 +428,7 @@ export class CasesConnector extends SubActionConnector<
* We should find a way to fill the custom fields with default values.
*/
return {
id: caseId,
Omitting the `id` here was a bug.
return casesMapAsArray.find((record) => record.oracleRecord.id === oracleId); | ||
}; | ||
|
||
const bulkUpdateOracleValidRecords = await this.increaseOracleRecordCounter( |
To create new cases, we first need to increase the counter so that we can get the new case ID.
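The ordering matters because the new case ID is a function of the counter. The sketch assumes the ID is a SHA-256 hash of the oracle key and the counter, and that the oracle record's `id` is the oracle key; the hashing scheme and the helpers `getCaseId` and `nextCaseId` are hypothetical, not the connector's implementation.

```typescript
import { createHash } from "crypto";

// Hypothetical: a deterministic case ID derived from the oracle key and counter.
// Any node computing it for the same inputs gets the same ID.
function getCaseId(oracleKey: string, counter: number): string {
  return createHash("sha256").update(`${oracleKey}:${counter}`).digest("hex");
}

// Hypothetical, simplified oracle record; assumes `id` is the oracle key.
interface OracleRecord {
  id: string;
  counter: number;
}

// Increase the counter first, then derive the new case ID from it.
function nextCaseId(record: OracleRecord): { record: OracleRecord; caseId: string } {
  const updated = { ...record, counter: record.counter + 1 };
  return { record: updated, caseId: getCaseId(updated.id, updated.counter) };
}
```

Because the ID is deterministic, two nodes that increase the same counter compute the same new case ID, which is what makes a later retry safe.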
Pinging @elastic/response-ops (Team:ResponseOps)
Pinging @elastic/response-ops-cases (Feature:Cases)
x-pack/plugins/cases/server/connectors/cases/cases_connector.ts
); | ||
|
||
/** | ||
* TODO: bulkCreate throws an error. Retry on errors. |
Maybe this could already be included in this PR?
The logic is more than a regular retry: for some errors (e.g. the case already exists) we will want to fetch the cases and do `result.set(case.id`, kinda like the logic in `upsert`.
I will handle all errors and retries in PR #173012. I think it is better to do it in another PR, as the code can grow a lot and become difficult to follow. I also believe it is better to retry than to fetch the cases. In general, we want each action to be idempotent: retrying must not affect the correctness of the case action. In your example, if the case exists, another Kibana node (running the case action) created it before this node managed to. If we retry, on the next round the node will find that the case exists and will attach the alerts to it without trying to create a new one. Retries help resolve race conditions and transient errors.
> I will handle all errors and retries in this PR #173012. I think it is better to do it on another PR as the code can grow a lot and be difficult to follow.

I'm OK with this.

> In general, we want each action to be idempotent. This means that if we retry it will not affect the correctness of the case action. In your example, if the case exists this means that another Kibana node (that runs the case action) created the case before this Kibana node managed to do it. If we retry, on the next round the node will find that the case exists and will attach the alerts to that case without trying to create a new one. Retries are useful to break race conditions or transient errors.

Won't it be idempotent anyway? No matter which path is followed, we always end up with these alerts attached to the case with that ID, whether we created the case or another node did.
You are right. It may save us a retry round, but I think it will make the code more difficult to follow, and conflicts should be rare. What do you think of leaving it as it is for now? Once we have the whole picture (retry logic, error handling, etc.) and have tested it extensively, we can decide whether the optimization is worth it.
I am OK with this; my initial suggestion was more about scope than optimization.
I just elaborated further to make sure I didn't miss any logic in your response 👍
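The idempotency argument in this thread can be illustrated with an in-memory model: the case is identified by a deterministic ID, a create conflict is answered with a retry rather than a fetch, and attaching alerts is a set union, so repeated runs converge on the same state. `InMemoryCases`, `ConflictError`, and `runCaseAction` are hypothetical stand-ins, not the Kibana cases client.

```typescript
// Hypothetical conflict error, standing in for "the case already exists".
class ConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConflictError";
    // Keep `instanceof` working when compiling to ES5.
    Object.setPrototypeOf(this, ConflictError.prototype);
  }
}

// Hypothetical in-memory store: case ID -> IDs of attached alerts.
class InMemoryCases {
  private readonly cases = new Map<string, Set<string>>();

  get(id: string): Set<string> | undefined {
    return this.cases.get(id);
  }

  create(id: string): void {
    if (this.cases.has(id)) {
      throw new ConflictError(`case ${id} already exists`);
    }
    this.cases.set(id, new Set());
  }
}

// One simplified execution of the case action.
function runCaseAction(store: InMemoryCases, caseId: string, alertIds: string[], maxAttempts = 3): void {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if (store.get(caseId) === undefined) {
        store.create(caseId);
      }
      const attached = store.get(caseId)!;
      // Set union: attaching the same alert twice has no extra effect.
      alertIds.forEach((id) => attached.add(id));
      return;
    } catch (error) {
      // On a conflict (another node created the case first), retry; the next
      // round finds the case and only attaches the alerts.
      if (!(error instanceof ConflictError) || attempt === maxAttempts) {
        throw error;
      }
    }
  }
}
```

Fetching the existing case on conflict would save one round, but with a deterministic ID and a set-like attach step the end state is the same either way, which matches the conclusion reached above.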
x-pack/plugins/cases/server/connectors/cases/cases_connector.test.ts
Code changes look good 👍
90a3f90 to db6ede3
💔 Build Failed
Failed CI Steps
Test Failures
Metrics [docs]
History
To update your PR or re-run it, just comment with:
cc @cnasikas
Prettier messed up the file. This PR will be merged into a feature branch.
## Summary

This PR:

1. Creates the `CasesConnectorError` error.
2. Separates the execution logic by moving the current logic to a new class called `CasesConnectorExecutor`.
3. Lets the `CasesConnector` class handle only the retry logic of the connector.
4. Implements the [Full jitter backoff algorithm](https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/), which is used as the retry strategy of the connector.

Depends on: #172709

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

---------

Co-authored-by: kibanamachine <[email protected]>
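The full jitter strategy referenced in the summary picks each delay uniformly between zero and the capped exponential backoff, `random(0, min(cap, base * 2^attempt))`. The sketch below follows that formula; the function names, the 0-based attempt numbering, and the default values are assumptions for illustration, not the settings of `CasesConnector`.

```typescript
// Full jitter: delay = random(0, min(cap, base * 2^attempt)), attempt starting at 0.
function fullJitterDelay(
  attempt: number,
  baseMs: number,
  capMs: number,
  random: () => number = Math.random
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * exponential);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical retry wrapper for an idempotent operation; defaults are illustrative.
async function retryWithFullJitter<T>(
  operation: () => Promise<T>,
  { maxAttempts = 5, baseMs = 100, capMs = 10_000 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait a randomized, exponentially growing time before the next attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(fullJitterDelay(attempt, baseMs, capMs));
      }
    }
  }
  throw lastError;
}
```

Randomizing over the whole interval spreads out retries from Kibana nodes that failed at the same moment, which reduces repeated collisions on the same case.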
## Summary

Depends on: #166267, #170326, #169484, #173740, #173763, #178068, #178307, #178600, #180437

PRs:

- #168370
- #169229
- #171754
- #172709
- #173012
- #175107
- #175452
- #175505
- #177033
- #178277
- #177139
- #179796

Fixes: #153837

## Testing

Run Kibana with `--run-examples` if you want to use the "Always firing" rule. Create a rule with a case action in Observability and in the Stack. The Security Solution is not supported: you should not be able to assign a case action to a Security Solution rule.

1. Test the "Reopen closed cases" configuration.
2. Test the "Grouping by" configuration. Only one field is allowed. Not all fields are persisted in alerts: if you select a field that is not part of the alert, the case action will create a case with the grouping value set to `unknow`.
3. Test the "Time window" feature. You can comment out the validation to test shorter time windows.
4. Verify that the case action is experimental.
5. Verify that the case is created in the correct solution based on the rule type.
6. Verify that you cannot create a rule with the case action on the basic license.
7. Verify that the execution of the case action fails if you do not have permission for cases. Work is still pending at the system actions framework level to prevent users from creating rules with system actions they do not have permission for.
8. Stress test the case action by creating multiple rules.

### Checklist

Delete any items that are not applicable to this PR.

- [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

### For maintainers

- [x] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)

## Release notes

Automatically create cases when an alert is triggered.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: adcoelho <[email protected]>
Co-authored-by: Janki Salvi <[email protected]>
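The "Grouping by" behavior in the testing steps above (one field, with a placeholder when the field is missing from an alert) can be sketched as a single-field group-by. This assumes alerts are flat key/value records and takes the placeholder as a parameter instead of hard-coding it; `groupAlertsByField` is a hypothetical helper, not the connector's grouping code.

```typescript
// Hypothetical: alerts modelled as flat records keyed by field name.
type Alert = Record<string, unknown>;

// Group alerts by the value of one field; missing values use the fallback key.
function groupAlertsByField(alerts: Alert[], field: string, fallback: string): Map<string, Alert[]> {
  const groups = new Map<string, Alert[]>();
  for (const alert of alerts) {
    const raw = alert[field];
    // Fields that are not persisted in the alert fall back to a placeholder value.
    const key = raw === undefined || raw === null ? fallback : String(raw);
    const bucket = groups.get(key) ?? [];
    bucket.push(alert);
    groups.set(key, bucket);
  }
  return groups;
}
```

Each resulting group would map to one oracle record and therefore to one case, so alerts lacking the field all land in the same placeholder case.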
## Summary

Depends on: #171754

### Checklist

### For maintainers