[Alerting] Explanation and approach to migrating and removing "siem-detection-engine-rule-actions" saved object #112327

FrankHassanabad · 2021-09-15T17:51:49Z

We removed the legacy notification system during part 1 here:
#109722

Which included the code for the alerting side car saved object:

siem-detection-engine-rule-actions

You can see these saved object type through the query:

GET .kibana/_search
{
  "query": {
    "term": {
      "type": {
        "value": "siem-detection-engine-rule-actions"
      }
    }
  }
}

The contents of this saved object "side car" is near duplication of actions that should now be directly on the rules/alerts themselves and should no longer exist as a separate saved object:

{
  "siem-detection-engine-rule-actions" : {
     "ruleAlertId" : "fb1046a0-0452-11ec-9b15-d13d79d162f3", <--- Reference to SO id of rule "fb1046a0-0452-11ec-9b15-d13d79d162f3"
     "actions" : [
       {
         "action_type_id" : ".slack",
         "id" : "f6e64c00-0452-11ec-9b15-d13d79d162f3", <--- Reference to SO id of action "f6e64c00-0452-11ec-9b15-d13d79d162f3"
         "params" : {
           "message" : "Hourly\nRule {{context.rule.name}} generated {{state.signals_count}} alerts"
         },
         "group" : "default"
       }
     ],
     "ruleThrottle" : "1h",
     "alertThrottle" : "1h"
   },
   "type" : "siem-detection-engine-rule-actions",
   "references" : [ ],
   "migrationVersion" : {
     "siem-detection-engine-rule-actions" : "7.11.2"
   },
   "coreMigrationVersion" : "7.14.0",
   "updated_at" : "2021-08-26T21:56:30.233Z"
  }
}

As you can see we have two other saved object id's within our object which are:

"siem-detection-engine-rule-actions.ruleAlertId" : "fb1046a0-0452-11ec-9b15-d13d79d162f3"
"siem-detection-engine-rule-actions.actions.id" : "f6e64c00-0452-11ec-9b15-d13d79d162f3"

These are both saved object id's that are references to two different SO types of action and alert you can query against like so:

GET .kibana/_doc/action:21f1c6e0-09d5-11ec-908c-57a10fa3ee91

And then:

GET .kibana/_doc/alert:fb1046a0-0452-11ec-9b15-d13d79d162f3

Note: These do not have proper reference id's within them. This doesn't mean we can't add a saved object migration to move them before 7.16.0 or in conjunction with our plans which we might do depending on how we decide to migrate these. However, we have decided to do this migration before 7.16.0 in a small PR to eliminate this question or any issues around the SO id's being re-generated.

Second note is that we have this duplicated field:

"ruleThrottle" : "1h",
"alertThrottle" : "1h"

Which looking through the history seems to have been there for over a year and both are filled out.

What our goals are is to migrate all of the siem-detection-engine-rule-actions by:

Iterating through each each siem-detection-engine-rule-actions and moving their "actions" into the "alert" SO. Note we have to transform some data since the side car of "siem-detection-engine-rule-actions" has mismatched labels such as it using snake_case "action_type_id" vs. the alert using camel case.
Change the notifyWhen to be onThrottleInterval on the alert during migration
Change the throttle to be the value of ruleThrottle if available, otherwise alertThrottle, otherwise we can default to 1h to be safe.
Change muteAll to false
Delete the siem-detection-engine-rule-actions

This will then be a migration to where existing alerts/rules which had this "side car model" will now operate with the newer mechanisms and all the additional SO will be removed.

Approaches:
👎 1. If we do nothing then we have left over saved objects and the user will have to manually check all their notifications and re-adjust them. Users will have N number of left over saved objects and we run the risk of later migration issues down the road. We could optionally just delete these/remove them and do no migrations. Other risks is the user never notices that they suddenly do not have notifications if we just put in release notes they have to check for notifications that are unset during the upgrade.
🤷 2. We could write a REST interface which uses the alerting client API and put something such as a banner up and require users to click on the banner to migrate these by clicking on the button. Or we try to auto-migrate these. Risks are that the API keys will be changed out for the new user, or if they never visit the page then the risk is that they suddenly do not have notifications.
😁 3. We could at startup run a migration process either shortly after regular Kibana migration to do all the heavy lifting of manually migrating these with the same API key. Risks are that the code has to be maintained for a while and the alerting team could change their format of their API keys, etc... If we could have an API on startup to use with this approach that wouldn't change the API key that would be great and would mitigate this risk more. Draft PR for this is up as a proof of concept so far. Since this requires a "join" we cannot use the normal Kibana startup migration so far. Discussions are here, here, and here.

Conclusions and decisions:

So far for approach 1, no one has expressed they would like to do that approach. We do not have explicit stake holder/business/product telling us "no" on approach 1 but several people have voiced concerns over a do-nothing approach.

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-09-15T17:52:13Z

Pinging @elastic/security-detections-response (Team:Detections and Resp)

stacey-gammon · 2021-09-16T14:31:04Z

If you go with 3, wouldn't you still have the possibility of losing notifications because you can't guarantee the migration is complete by the time the Kibana server is up and running? I think that option sounds risky because you'd be running migrations while users are allowed to interact with their alerts (if I am understanding correctly). Would there be a possibility of a race condition? For example, a user adds an action which gets put onto rule.actions but then the migration runs and overwrites it with siem-detection-engine-rule-actions.actions? Disclaimer: I'm still trying to understand how it all works so apologies if I am off base!

The Kibana App team ran into the same requirement and managed to work around it by leveraging ad-hoc migrations when loading and saving the saved objects. I wonder if something similar could be achieved? (cc @flash1293 if you have any words of advice for how that went!).

Update siem-detection-engine-rule-actions to correctly use the references object so the ids are correct after the saved object id migration.
Migrate/delete when loading and saving the rule. Use an additional parameter so you only need run the extra query and migration if needed.
Leverage telemetry to track how many siem-detection-engine-rule-actions objects are left. Once it gets below a certain threshold, go with 1.?

flash1293 · 2021-09-16T14:52:01Z

The way we solved this for Graph is doing the reference lookup on read in the UI - this works for workspaces because they are only ever consumed this way, but it's not nice for a bunch of reasons:

It's possible the ad-hoc migration doesn't ever run because nobody is opening a workspace in the UI (e.g. exporting would not be able to export related saved objects)
We can never remove the code from deep within the app (because it's always possible there's an unmigrated workspace SO lingering)

I just skimmed the problem statement above but as far as I understood the reference ids are in the saved object, they are just not correctly placed in the references array, right? If I understood https://www.elastic.co/guide/en/kibana/master/sharing-saved-objects.html correctly, 7.16 is the last change to correctly place the references, so a simple migration moving these reference ids from ruleAlertId to the references array should suffice.

mikecote · 2021-09-16T15:04:05Z

Update siem-detection-engine-rule-actions to correctly use the references object so the ids are correct after the saved object id migration.

++ without this, I don't think there's a way to guarantee siem-detection-engine-rule-actions.ruleAlertId and
siem-detection-engine-rule-actions.actions.id will point to the newly generated ID. An ad-hoc migration running after Kibana start could be migrating SOs that are already migrated to, say, 8.7 (long after the 8.0 migration generated new IDs).

stacey-gammon · 2021-09-16T15:20:58Z

Update siem-detection-engine-rule-actions to correctly use the references object so the ids are correct after the saved object id migration.

Yea, I think that only solves half the problem though. The ids will be correct, but they still won't be able to migrate the content of siem-detection-engine-rule-actions into the alerting rule Saved Object because it would require a look-up during Saved Object Migrations. My understanding is that they want something like this:

migrateRule(rule) {
  const sideCar = findSavedObjectsWithReference('siem-detection-engine-rule-actions', rule.id);
  rule.actions += sideCar.actions;
  deleteSavedObject(sideCar);
  return rule;
}

Or the other way:

migrateSideCar(sideCar) {
  const rule = loadSavedObject('rule', rule.id);
  rule.actions += sideCar.actions;
  deleteSavedObject(sideCar);
  return undefined; // ??
}

As far as I understood from Core, the SO migration system doesn't support this kind of lookup during migrations.

FrankHassanabad · 2021-09-16T15:22:42Z

I just skimmed the problem statement above but as far as I understood the reference ids are in the saved object, they are just not correctly placed in the references array, right?

That is being changed in a small upcoming PR to eliminate any questions or issues with any approach we choose and to make it so we don't have to carry baggage of doing saved object resolves at all.

Edited the ticket with these words to reflect this now:

Note: These do not have proper reference id's within them. This doesn't mean we can't add a saved object migration to move them before 7.16.0 or in conjunction with our plans which we might do depending on how we decide to migrate these. However, we have decided to do this migration before 7.16.0 in a small PR to eliminate this question or any issues around the SO id's being re-generated.

flash1293 · 2021-09-16T15:32:31Z

Ah, got it, thanks for the clarification. We discussed lookups during migration with the core team and decided against it for performance reasons.

Not super happy with the ad-hoc migration in there, but the plan with leveraging telemetry sounds good for phasing it out.

lukeelmers · 2021-09-16T16:27:22Z

We discussed lookups during migration with the core team and decided against it for performance reasons.

More context on that here: #34996 (comment)

Not super happy with the ad-hoc migration in there, but the plan with leveraging telemetry sounds good for phasing it out.

I agree that, assuming I understand everything correctly, this currently sounds like the most viable (and least risky) path forward.

rylnd · 2021-09-21T22:29:04Z

Following some offline meetings/discussion, we've come up with an option that should satisfy all parties for 7.16. Please review and let me know if you have questions/concerns:

The core ideas are to support both versions of actions, to funnel users through our own custom import/export APIs, and to check/migrate rules' actions when the rule is touched (create/import/save). There are a few pieces of work involved with this:

Reverting the removal of sidecar (read: old) actions and supporting both types
Adding code to perform the migration from sidecar actions to new actions on rule save
Enabling the export of all associated rule SOs in our export API:
- actions, connectors
- exception lists, exceptions
- notifications
Telemetry to report how many users still have sidecar actions

Users can then update their rules ad hoc by touching them, or they can update all rules at once by exporting and reimporting (via our custom import/export APIs). In 8.x, once we have confidence (via telemetry), we can remove support for sidecar actions and remove our own import/export in favor of the SOM. Until then, we will discourage/prevent Detections users from managing rules there.

FrankHassanabad added the discuss label Sep 15, 2021

FrankHassanabad self-assigned this Sep 15, 2021

botelastic bot added the needs-team Issues missing a team label label Sep 15, 2021

FrankHassanabad added the Team:Detections and Resp Security Detection Response Team label Sep 15, 2021

botelastic bot removed the needs-team Issues missing a team label label Sep 15, 2021

FrankHassanabad mentioned this issue Sep 15, 2021

[Security Solution][Detections] Migrate from shadow saved object for managing actions configured to fire at specific intervals #100958

Closed

oatkiller mentioned this issue Sep 16, 2021

[Draft for socialization at this point] Migration side car actions 2 #112043

Closed

9 tasks

spong mentioned this issue Sep 17, 2021

[Security Solution][Detections] Add ability to import/export Rule Actions #100956

Closed

flash1293 mentioned this issue Sep 20, 2021

Saved object migrations to allow pre-migration state collection #34996

Open

mikecote mentioned this issue Sep 22, 2021

[Alerting] Explanation and approach to removing the "siem.notification" saved object and rule type #112209

Closed

FrankHassanabad closed this as completed Sep 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Alerting] Explanation and approach to migrating and removing "siem-detection-engine-rule-actions" saved object #112327

[Alerting] Explanation and approach to migrating and removing "siem-detection-engine-rule-actions" saved object #112327

FrankHassanabad commented Sep 15, 2021 •

edited

Loading

elasticmachine commented Sep 15, 2021

stacey-gammon commented Sep 16, 2021

flash1293 commented Sep 16, 2021

mikecote commented Sep 16, 2021

stacey-gammon commented Sep 16, 2021

FrankHassanabad commented Sep 16, 2021 •

edited

Loading

flash1293 commented Sep 16, 2021

lukeelmers commented Sep 16, 2021

rylnd commented Sep 21, 2021

[Alerting] Explanation and approach to migrating and removing "siem-detection-engine-rule-actions" saved object #112327

[Alerting] Explanation and approach to migrating and removing "siem-detection-engine-rule-actions" saved object #112327

Comments

FrankHassanabad commented Sep 15, 2021 • edited Loading

elasticmachine commented Sep 15, 2021

stacey-gammon commented Sep 16, 2021

flash1293 commented Sep 16, 2021

mikecote commented Sep 16, 2021

stacey-gammon commented Sep 16, 2021

FrankHassanabad commented Sep 16, 2021 • edited Loading

flash1293 commented Sep 16, 2021

lukeelmers commented Sep 16, 2021

rylnd commented Sep 21, 2021

FrankHassanabad commented Sep 15, 2021 •

edited

Loading

FrankHassanabad commented Sep 16, 2021 •

edited

Loading