WIP POC Event categorization on Linux auth logs #9905
Pinging @elastic/secops
Based on the work in this proof of concept, we'll need a much more terse way to define these rules:
Some things that could help:
Thanks for putting this together @webmat.
Looking at the final pipeline, it looks reasonably short and actually quite readable. There are certainly ways to improve the developer experience, and we should brainstorm about them, but based on the above I wouldn't say this is unfeasible in its current form. Could you perhaps explain a bit more about what you found to be terrible? Regarding the
@tsg The reason I say it's not reasonable is that this PR so far only does categorization on 1 of the 7 types of messages that I see in this module (there may be more than 7). Moreover, it's not doing so properly, as it incorrectly assumes that anything SSH is about authentication attempts. But SSH messages can also contain admin stuff (restarts) and sessions (connects / disconnects). Given all this, I think the amount of code necessary is prohibitive *.

Another part that bugs me and makes things brittle is how each processor has to replicate a full

I'll work on the PR some more this week, to continue fleshing out the POC. I think it will better showcase the problems I see.

Note that we could take a completely different approach as well, and go with one single bigger Painless script. Multiline, indentation, etc. But we don't have good support for this in Beats yet.

* I did start this PR by doing a pretty good refactoring that extracts out the parsing of the syslog header. It used to be done by each grok pattern. Even if we end up dropping this POC, I'll extract the refactoring and submit it on its own. This refactoring is working against me making the case that this generates too much code, since I started with a code reduction ;-)
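To illustrate the point above that not every sshd message is an authentication attempt, here is a minimal Python sketch (not the Ingest Node pipeline from this PR). The patterns and the category labels `authentication`, `session` and `admin` are assumptions made for illustration; real sshd output varies by version and configuration.

```python
import re

# Hypothetical patterns, for illustration only. Each maps the start of an
# sshd message to a coarse category.
SSH_PATTERNS = [
    (re.compile(r"^(Accepted|Failed) \S+ for "), "authentication"),
    (re.compile(r"^(Received disconnect from|Disconnected from) "), "session"),
    (re.compile(r"^Server listening on "), "admin"),
]


def classify_sshd_message(message):
    """Return a coarse category for an sshd message, or None if unknown."""
    for pattern, category in SSH_PATTERNS:
        if pattern.search(message):
            return category
    return None
```

Messages that match none of the patterns return `None`, so they are left uncategorized rather than being lumped in with authentication.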
When evaluating this code, make sure you scroll right. Check out line 71 ;-)
Another note, unrelated to Ingest Node verbosity, is that we'll need to do de-duplication as well. In many cases, a single real-world event (e.g. a successful login) can translate to 3-4 log entries or more. See this successful login. We'll need to de-duplicate this to reduce noise. This won't be done with Ingest Node, but in the current plan, I'm not sure what would do this. This comment is more of a note for later than a problem to be addressed in this POC, though.
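As a rough idea of what such de-duplication could look like, here is a Python sketch. It is purely an assumption: the thread does not say which component would do this, and the field names (`timestamp`, `host`, `user`, `action`) and the 5 second window are hypothetical.

```python
from datetime import datetime, timedelta


def deduplicate(entries, window=timedelta(seconds=5)):
    """Drop entries that repeat a recently seen (host, user, action) key.

    `entries` must be sorted by "timestamp". An entry is dropped when an
    entry with the same key was seen no more than `window` before it.
    """
    last_seen = {}
    kept = []
    for entry in entries:
        key = (entry["host"], entry["user"], entry["action"])
        previous = last_seen.get(key)
        if previous is None or entry["timestamp"] - previous > window:
            kept.append(entry)
        # Remember the latest sighting, so a burst of entries collapses to one.
        last_seen[key] = entry["timestamp"]
    return kept
```

This only works if the log entries have already been normalized to a common action, which is what the categorization in this PR is working towards.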
I think the real issue is elastic/elasticsearch#37120. I doubt (but haven't looked) that you actually have 75 scripts/templates that need to compile. I will look into getting that issue resolved very soon. I would prefer not to raise the limit unless we know the limit will still be hit after elastic/elasticsearch#37120 is resolved.
@jakelandis Ok, thanks for bringing this one up. We've bumped the Beats CI pipeline to 1000 compilations / minute for now 😆 I'll make sure nobody tries to "fix" any of the scripts and let them know about this issue.
A side effect is that all of these patterns now correctly populate `process.name`
- set event.kind (trivial)
- set event.category to authentication for all sshd activity
- move event.action to system.auth.ssh.action, to make room for normalized event.action
… ssh activity

Note: when I say event.action is too broad, this is because it lumps in session disconnects as 'ssh_login', when it should likely be ssh_session.
9339a29 to 5677481
This PR adds the following fields for the SSH login events:

* `event.category: authentication`
* `event.action: ssh_login`
* `event.type` either `authentication_success` or `authentication_failure`

The `event.outcome` is currently not quite ECS compliant, but I didn't touch it to avoid a breaking change.

The PR doesn't attempt to categorize other logs besides the SSH login attempts, so it's a subset of elastic#9905, but it's what we need for the UI.
* Adding categorization fields for the system/auth module

This PR adds the following fields for the SSH login events:

* `event.category: authentication`
* `event.action: ssh_login`
* `event.type` either `authentication_success` or `authentication_failure`

The `event.outcome` is currently not quite ECS compliant, but I didn't touch it to avoid a breaking change.

The PR doesn't attempt to categorize other logs besides the SSH login attempts, so it's a subset of #9905, but it's what we need for the UI.

* Normalized event.outcome and brought back `system.auth.ssh.event`.
* changelog
* Adding categorization fields for the system/auth module

This PR adds the following fields for the SSH login events:

* `event.category: authentication`
* `event.action: ssh_login`
* `event.type` either `authentication_success` or `authentication_failure`

The `event.outcome` is currently not quite ECS compliant, but I didn't touch it to avoid a breaking change.

The PR doesn't attempt to categorize other logs besides the SSH login attempts, so it's a subset of elastic#9905, but it's what we need for the UI.

* Normalized event.outcome and brought back `system.auth.ssh.event`.
* changelog

(cherry picked from commit a9f567b)
…m/auth module (#11363)

* Adding categorization fields for the system/auth module (#11334)

* Adding categorization fields for the system/auth module

This PR adds the following fields for the SSH login events:

* `event.category: authentication`
* `event.action: ssh_login`
* `event.type` either `authentication_success` or `authentication_failure`

The `event.outcome` is currently not quite ECS compliant, but I didn't touch it to avoid a breaking change.

The PR doesn't attempt to categorize other logs besides the SSH login attempts, so it's a subset of #9905, but it's what we need for the UI.

* Normalized event.outcome and brought back `system.auth.ssh.event`.
* changelog

(cherry picked from commit a9f567b)

* cleanup changelog
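The SSH login fields described in the commits above can be summarized with a small Python sketch. This is an illustration, not the pipeline code from those commits; it assumes the parsed login result arrives as the string `Accepted` or `Failed`, and the function name `login_fields` is hypothetical.

```python
# Assumed values of the parsed SSH login result, mapped to `event.type`.
LOGIN_EVENT_TYPES = {
    "Accepted": "authentication_success",
    "Failed": "authentication_failure",
}


def login_fields(ssh_event):
    """Return the categorization fields for an SSH login attempt.

    Returns an empty dict for anything that is not a login attempt, since
    the commits above only categorize SSH logins.
    """
    event_type = LOGIN_EVENT_TYPES.get(ssh_event)
    if event_type is None:
        return {}
    return {
        "event.category": "authentication",
        "event.action": "ssh_login",
        "event.type": event_type,
    }
```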
To me, it's looking like the current Ingest Node processors make this task much too verbose.
General TODO
Categorization TODO
- event.kind:event
- event.category:authentication, event.action:ssh_login, event.outcome:(success|failure)
- event.category:session, event.action:ssh_session, event.outcome:(connect|disconnect)
- event.category:?, event.action:?
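One way to read the categorization list above is as a rule table. The Python sketch below is an assumption about how such rules could be expressed tersely; it is not code from this PR, and the names `CATEGORIZATION` and `categorize` are hypothetical. The entries still marked `?` are deliberately left out.

```python
# Rule table mirroring the categorization list: action -> category and the
# outcomes allowed for that action.
CATEGORIZATION = {
    "ssh_login": {
        "event.category": "authentication",
        "outcomes": ("success", "failure"),
    },
    "ssh_session": {
        "event.category": "session",
        "outcomes": ("connect", "disconnect"),
    },
}


def categorize(action, outcome):
    """Return the categorization fields for an action, or None if no rule applies."""
    rule = CATEGORIZATION.get(action)
    if rule is None or outcome not in rule["outcomes"]:
        return None
    return {
        "event.kind": "event",
        "event.category": rule["event.category"],
        "event.action": action,
        "event.outcome": outcome,
    }
```

Adding a new message type then means adding one table entry rather than another set of processors.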