[SECURITY SOLUTIONS][Alerts Actions] Fix migration from 7.11.0/7.11.1 to 7.12 #94722
Conversation
Pinging @elastic/security-threat-hunting (Team:Threat Hunting)
The approach seems right to me - I was thinking of the same kind of solution.
I noted that we should add a test for alerts migrated in 7.11.2 with the now-empty `incident` object, since I think we will now have these appearing in the field.
Also, assuming we do have `incident: {}` actions in the field, these seem like they will be busted - they won't validate, so they can't be executed. Should we do some kind of 7.12.0 migration to make them executable again? E.g., add the required `summary` property (for Jira, anyway) with a message explaining that this action had previously been broken, has now been recovered, but will need to be edited again. Maybe we don't need that? How obvious is it that the action is busted? I don't believe we provide any kind of feedback on whether an existing alert validates, and the only logging you would see would be an error message in the Kibana log about it failing validation.
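A repair along those lines might look roughly like this - a sketch only, assuming a simplified `RawAction` shape; the action type ids, field names, and placeholder text are illustrative, not Kibana's actual migration code:

```typescript
// Simplified stand-in for the alert action saved-object shape (assumption).
interface RawAction {
  actionTypeId: string;
  params: {
    subAction?: string;
    subActionParams?: { incident?: Record<string, unknown> };
  };
}

// Connector types whose params carry an `incident` (illustrative list).
const INCIDENT_ACTION_TYPES = ['.jira', '.servicenow', '.resilient'];

function repairEmptyIncident(action: RawAction): RawAction {
  const incident = action.params.subActionParams?.incident;
  const isEmptyIncident = incident != null && Object.keys(incident).length === 0;
  if (!INCIDENT_ACTION_TYPES.includes(action.actionTypeId) || !isEmptyIncident) {
    return action; // nothing to repair
  }
  // Add the minimum required field so the action validates again, with a
  // message telling the user the action was broken and must be re-edited.
  return {
    ...action,
    params: {
      ...action.params,
      subActionParams: {
        ...action.params.subActionParams,
        incident: {
          summary: '[recovered by migration - this action was broken and must be edited]',
        },
      },
    },
  };
}
```

Whether the placeholder text is worth it depends on the UI question raised below: if editing the alert already recovers it cleanly, the migration may not need to do anything beyond making the action validate.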
@@ -571,6 +571,43 @@ describe('7.11.2', () => {
      } as SavedObjectUnsanitizedDoc<RawAlert>;
      expect(isAnyActionSupportIncidents(doc)).toBe(false);
    });

    test('it does not transform alerts when the right connector structure is already applied', () => {
If a customer upgrades to 7.11.2 with the current migration, they'll end up with some actions in alerts where `incident` is `{}`, right? Given the current schema, these won't validate when they're used, so they should not be executed, but they would presumably be candidates for being fixed somehow - if possible - in another migration (to 7.12.0?).
In any case, whether we try to fix these or not, it seems like we should add a test with an alert with a Jira/ServiceNow/IBM Resilient action where the `incident` is `{}`, since it seems like we'll have these in the field in some cases.
Oh, I see what you are saying. Yes, I will do that.
I think there are two branches of logic here:
- If a user went from 7.10.x to 7.11.0, they lost all their data in the connector. That was the first bug we tried to fix: we forgot to write a migration, and the user has to recreate their connectors from scratch. I think we should be able to write them a script, very similar to what we did before, to bring everything back to normal in this situation too.
- If a user went from 7.11.0/1 to 7.11.2, they will again lose their connector. At that point they have two solutions: use the script from the SDH, or re-create them from scratch.
I guess I'm a little curious how well these `incident: {}` actions will survive the Kibana UI experience. I'm pretty sure on the back end we'll fail fast, because we always validate these before we use them productively. But I wonder what happens if you edit an alert using one of these actions - scared of seeing some kind of null dereference somewhere, I guess...
I think that would be the decision point as to whether we need to do something for these in a 7.12.0 migration, like adding a placeholder `summary` or other required fields in the `incident`, to keep other things from breaking.
Oh, I tried editing an alert that has this `incident: {}`, and it comes back to normal. We also know this works because the user who created the SDH was able to get back to normal after editing all their alerts manually.
And yes, when you have `incident: {}` like in the SDH, we know that the alert gets triggered but we never create an action in the associated connector.
I don't really understand this bug, but I just want to chime in and agree with @pmuellr. It doesn't usually make sense to fix a bug in an already-released migration function (unless it's causing data loss), because that buggy code has already run on users' data. A 7.11.2 -> 7.12 upgrade won't get the bug fix applied to the data, so there needs to be a 7.12 migration to guarantee that all the upgrade paths get this bug fix applied.
I think it makes sense to keep the buggy 7.11.2 migration function just as it was (maybe with a comment stating this code is buggy); this way the code makes it obvious that there are Kibana 7.11.2 instances with buggy data. The 7.12 migration could then fix the bug, with a comment like "fixing our 7.11.2 migration bug".
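For what it's worth, that layering could be sketched like this - an illustrative shape only, not Kibana's actual migration registry; the function bodies, document shape, and version comparison are all simplified assumptions:

```typescript
// Simplified saved-object document shape (assumption).
type Doc = { attributes: { incident?: Record<string, unknown> } };
type MigrationFn = (doc: Doc) => Doc;

// Hypothetical stand-in for the buggy 7.11.2 migration: it can empty
// `incident`. Intentionally left as-is, since it already ran on 7.11.2 data.
const buggyIncidentMigration: MigrationFn = (doc) => ({
  attributes: { ...doc.attributes, incident: {} },
});

// Corrective 7.12.0 migration: "fixing our 7.11.2 migration bug".
const repairIncidentMigration: MigrationFn = (doc) => {
  const incident = doc.attributes.incident;
  if (incident && Object.keys(incident).length === 0) {
    return {
      attributes: { ...doc.attributes, incident: { summary: '[recovered - please edit]' } },
    };
  }
  return doc;
};

const migrations: Record<string, MigrationFn> = {
  '7.11.2': buggyIncidentMigration,
  '7.12.0': repairIncidentMigration,
};

// Apply every migration newer than the doc's current version, in order.
// (Plain string comparison stands in for real semver handling here.)
function migrate(doc: Doc, fromVersion: string): Doc {
  return Object.entries(migrations)
    .filter(([version]) => version > fromVersion)
    .reduce((d, [, fn]) => fn(d), doc);
}
```

The point of the shape: whether a user comes from 7.11.0 (and hits the buggy step) or from 7.11.2 (already carrying buggy data), the 7.12.0 step runs on every path and leaves the data fixed.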
@rudolf We ask users not to migrate from 7.11.0/1 to 7.11.2, because if they do, they will lose the action data for good. We will have no way to bring it back through a migration, and they will have to run this script to get it back. So we want to stop the bleeding by changing the 7.11.2 migration so users can move forward from 7.11.0/1 to 7.12.x, and of course let them migrate from earlier versions.
Ah ok, if this was a data-loss bug then yes, all we can do is fix the data-loss bug in the 7.11.2 migration; it's too late for users who already upgraded...
Hmm, this PR isn't adding a new migration. So say a user upgraded from 7.11.0 to 7.11.2: the alerts will have been migrated in 7.11.2 and now have an empty `incident`. We can add the test, but I don't think this particular migration would get run against those alerts. Maybe it makes sense to add a test to ensure that the migration isn't run when the alert is already tagged with the 7.11.2 migration version.
LGTM!
LGTM
@elasticmachine merge upstream
💚 Build Succeeded
… to 7.12 (elastic#94722)
* do not modify connector with the right structure
* review: try to bring back incident to life when we can
* manage custom action
* fix cypress test
Co-authored-by: Kibana Machine <[email protected]>
… to 7.12 (#94722) (#94886)
* do not modify connector with the right structure
* review: try to bring back incident to life when we can
* manage custom action
* fix cypress test
Co-authored-by: Kibana Machine <[email protected]>
Co-authored-by: Xavier Mouligneau <[email protected]>
Summary
We got this SDH https://github.com/elastic/sdh-kibana/issues/1320, where the Jira connector failed the migration from 7.11.0 to 7.11.2. We know why this happened: when we fixed our last blocker for 7.11.2, we did not tag the migration with the right version. We should have used 7.11.0 and not 7.11.2, or we should have added more logic to avoid this problem. In any case, we figured out a workaround for this user: https://github.com/elastic/sdh-kibana/issues/1320#issuecomment-799771073.
We are fixing the migration script for 7.12.
Checklist