-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add fileset for parsing linux auditd logs #3750
Conversation
3e91ac1
to
8794516
Compare
8794516
to
a5ef7ae
Compare
fcfg, err = applyOverrides(fcfg, mcfg.Module, filesetName, overrides) | ||
if err != nil { | ||
return nil, fmt.Errorf("Error applying overrides on fileset %s/%s: %v", mcfg.Module, filesetName, err) | ||
} | ||
|
||
if fcfg.Enabled != nil && (*fcfg.Enabled) == false { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason this was moved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This applies the overrides first, which allows an override to disable a fileset. It allows a CLI config like:
-M system.audit.enabled=false
{ | ||
"script": { | ||
"lang": "painless", | ||
"inline": " String trimQuotes(def v) {\n if (v.startsWith(\"'\") || v.startsWith('\"')) {\n v = v.substring(1, v.length());\n }\n if (v.endsWith(\"'\") || v.endsWith('\"')) {\n v = v.substring(0, v.length()-1);\n } \n return v;\n }\n \n boolean isHexAscii(String v) {\n def len = v.length();\n if (len == 0 || len % 2 != 0) {\n return false; \n }\n \n for (int i = 0 ; i < len ; i++) {\n if (Character.digit(v.charAt(i), 16) == -1) {\n return false;\n }\n }\n\n return true;\n }\n \n String convertHexToString(String hex) {\n\t StringBuilder sb = new StringBuilder();\n\n for (int i=0; i < hex.length() - 1; i+=2) {\n String output = hex.substring(i, (i + 2));\n int decimal = Integer.parseInt(output, 16);\n sb.append((char)decimal);\n }\n\n return sb.toString();\n }\n \n def possibleHexKeys = ['exe', 'cmd'];\n \n def audit = ctx.system.get(\"audit\");\n Iterator entries = audit.entrySet().iterator();\n while (entries.hasNext()) {\n def e = entries.next();\n def k = e.getKey();\n def v = e.getValue(); \n\n // Remove entries whose value is ?\n if (v == \"?\") {\n entries.remove();\n continue;\n }\n \n // Convert hex values to ASCII.\n if (possibleHexKeys.contains(k) && isHexAscii(v)) {\n v = convertHexToString(v);\n audit.put(k, v);\n }\n \n // Trim quotes.\n if (v instanceof String) {\n v = trimQuotes(v);\n audit.put(k, v);\n }\n \n // Convert arch.\n if (k == \"arch\" && v == \"c000003e\") {\n audit.put(k, \"x86_64\");\n }\n }\n \n // Fix inner msg parsing.\n def msg = audit.get(\"msg\");\n if (msg != null) {\n def i = msg.indexOf(\"=\");\n if (i >= 0) {\n def newKey = trimQuotes(msg.substring(0, i));\n def newValue = trimQuotes(msg.substring(i+1, msg.length()));\n audit.remove(\"msg\");\n audit.put(newKey, newValue);\n }\n }" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is quite a script. I'm curious if that has a major impact on ingestion speed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't measured in any way the impact on ingestion rates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to remove a small part of the script by changing the grok pattern (just pushed it), but I think it is as simple as I can make it.
The PR adds and audit fileset to the system module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data. Features - Parses audit event type, unix epoch time, audit event counter, and the arbitrary key/value pairs that follow. - Applies geoip lookup to the `system.audit.addr` field which is present for some remote login events. - Decodes hex encoded ascii values that are sometimes used for the `system.audit.exe` and `system.audit.cmd` fields. - Remove key/value pairs where the value is `?`. Missing Features - Decoder for `system.audit.saddr` field present in SOCKADDR audit events. A recipe is described [here](https://unix.stackexchange.com/questions/102926/how-to-interpret-the-saddr-field-of-an-audit-log) and could probably be ported to painless. Sample value: `type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F`
Disable other filesets in a module while testing.
a5ef7ae
to
c945fb7
Compare
LGTM. Waiting for @tsg to have a look. |
assert "error" not in obj | ||
|
||
if module != "system" and fileset != "audit": | ||
# There are dynamic fields in audit logs that are not documented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the dynamic fields grouped under a sub-directory? Then we could document it as a dict
type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the fields are written under system.audit.*
with no inner dictionaries. Is there anything I should do in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible / make sense to group them under a sub-dict? I think we don't have a precedent in any Beat so far that dynamic and static fields are mixed together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible, but I don't think using a sub-dict in this case provides any additional context to the user about the fields. We would be using sub-dict here due to the implementation details (like the fact that it uses a kv processor instead of a specific grok pattern for each audit record type). From a user perspective I don't think a sub-dict would add value (at least not with what I'm thinking -- system.audit.kv.*
), but I don't like to be the one breaking precedent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing I worry is that a dynamic field could overwrite a static field and conflict on the type. Could we perhaps just have one entry with all the fields as a string? Or use the raw
namespace? Also not sure yet how to deal with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there was an exclude_keys
option in the KV processor then it would easy to prevent the dynamic fields from overwriting the well-known fields from the header. The Logstash KV filter has this option so I'll ask about getting this in ES. I could enhance this module to use that once it's available.
There are only a few keys in the audit header that could be overwritten (record_type
and sequence
) by KV data and I haven't observed these being used in any audit events. I think the risk is minimal.
module, obj["fileset"]["module"]) | ||
|
||
if not (module == "mysql" and fileset == "slowlog") and not (module == "system" and fileset == "auth"): | ||
# TODO: There are errors parsing the test logs from these modules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is not ideal, but I don't have an easy solution for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, definitely not idea, but I really wanted to enable error
checking. Maybe as part of the manifest or some file in the test directory the fileset could self-describe its testing parameters (e.g. allow_errors: true
, has_dynamic_fields: true
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we probably need something like that. I'm good with the current solution for the time being.
=== Compatibility | ||
|
||
This module was tested with logs from OSes like Ubuntu 12.04, Centos 7, macOS Sierra, and others. | ||
This module was tested with logs from OSes like Ubuntu 12.04, Centos 7, macOS | ||
Sierra, and others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "and others" might come back to us as an issue in support, maybe it's better to leave it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, probably a good idea to drop that. Will do.
"field": "system.audit.sub_kv", | ||
"ignore_failure": true | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the other filesets so far we remove the original message key, to avoid duplication. For consistency, we might want to do it here as well, unless there's a lot of value in keeping it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason I like having the message field is because it gets analyzed and make searching simpler when I don't know exactly what I'm looking for (helps with _all
being deprecated). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was always on the fence (for each new module) about keeping the message
in, but thought I'd try it without it first, since it's easier to add than remove. My understanding is that by default Kibana will still search in all fields, even without having an explicit _all
.
What you think if we leave it out here again for consistency, but we reconsider before 6.0 if we want to add it everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to not have the raw message in by default. But it can become useful in some case, for example for debugging. We could introduce a config option which can be set per module to turn it on. Problem is that this would mean a change in the processor which should not change based on config options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the message
field in ececd94.
}, | ||
{ | ||
"convert": { | ||
"field" : "system.audit.item", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, are these conversions needed? In the other places I have assumed that if we set it as a number in the mapping template, that's enough for ES to do the conversion itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I don't think these are strictly necessary. ES should do the conversion.
The only difference that I can possibly think of here is how errors would be handled. If the conversion fails in the ingest processor the event is still written to ES, but has an error
field and is not completely enriched. If a failure occurs during automatic conversion in ES then you get an mapping exception and the whole document gets dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I agree it's worth keeping them in this case.
"AUDIT_KEY_VALUES": "%{WORD}=%{GREEDYDATA}" | ||
}, | ||
"patterns": [ | ||
"%{AUDIT_PREFIX} %{AUDIT_KEY_VALUES:system.audit.kv} old auid=%{NUMBER:system.audit.old_auid} new auid=%{NUMBER:system.audit.new_auid} old ses=%{NUMBER:system.audit.old_ses} new ses=%{NUMBER:system.audit.new_ses}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth documenting old_auid
, old_ses
, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 7cd7d52
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, sorry for the late review. I left a few comments. This would also be worth a quick blog post like the one I did on the auth logs.
- Renamed system.audit.type to system.audit.record_type - Renamed system.audit.counter to system.audit.sequence - Dropped system.audit.epoch after parsing it into @timestamp - Dropped message field. - Removed “and others” from the supported OSes listed in docs. - Updated dashboards after renaming and deleting fields. - Updated script to drop fields where value was `(null)`.
… (elastic#3941) (elastic#3962) The PR adds an auditd module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data. Features - Parses audit event type, unix epoch time, audit event counter, and the arbitrary key/value pairs that follow. - Applies geoip lookup to the `auditd.log.addr` field which is present for some remote login events. - Decodes hex encoded ascii values that are sometimes used for the `auditd.log.exe` and `auditd.log.cmd` fields. - Remove key/value pairs where the value is `?`. Missing Features - Decoder for `auditd.log.saddr` field present in SOCKADDR audit events. A recipe is described [here](https://unix.stackexchange.com/questions/102926/how-to-interpret-the-saddr-field-of-an-audit-log) and could probably be ported to painless. Sample value: `type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F`
) (#3975) The PR adds an auditd module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data. Features - Parses audit event type, unix epoch time, audit event counter, and the arbitrary key/value pairs that follow. - Applies geoip lookup to the `auditd.log.addr` field which is present for some remote login events. - Decodes hex encoded ascii values that are sometimes used for the `auditd.log.exe` and `auditd.log.cmd` fields. - Remove key/value pairs where the value is `?`. Missing Features - Decoder for `auditd.log.saddr` field present in SOCKADDR audit events. A recipe is described [here](https://unix.stackexchange.com/questions/102926/how-to-interpret-the-saddr-field-of-an-audit-log) and could probably be ported to painless. Sample value: `type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F`
Update: Module renamed in #3941.
The PR adds and audit fileset to the system module for parsing logs from the auditd daemon. There is a sample dashboard for viewing this data.
Features
system.audit.addr
field which is present for some remote login events.system.audit.exe
andsystem.audit.cmd
fields.?
.Missing Features
system.audit.saddr
field present in SOCKADDR audit events. A recipe is described here and could probably be ported to painless. Sample value:type=SOCKADDR msg=audit(1481078693.491:873): saddr=01002F7661722F72756E2F6E7363642F736F636B657400008983B330BE7F0000000000000000000070830130BE7F000004000000000000001B00000000000000D128B7ED000000008B8BB330BE7F00003B00000000000000403E4BF7FD7F0000447F0130BE7F000080150230BE7F