[Auditbeat Host] Add host, packages, and processes metricsets #8436
Conversation
Implements simple, initial versions of what a host, packages, and processes metricsets could look like.
This is going into the feature branch. The code is not terribly well tested (esp. on systems other than Mac), but there are system tests for each metricset. I'm a bit unhappy about having copied the whole
For any other reviewers, here are the steps I used to make this code work. Not sure if this is obvious to everyone:

- Run `make` in `x-pack/auditbeat` as well. You'll be running the binary created in this dir, not the one at `auditbeat/auditbeat`.
- Config file:

```yaml
# Optional. To use the same data & log dir when running auditbeat from the x-pack directory
path.home: (your home path)/go/src/github.com/elastic/beats/auditbeat

auditbeat.modules:
- module: system
  metricsets:
    - host
    - packages
    - processes
```

- Load the index template:

```
./auditbeat setup --template -c ../../auditbeat/auditbeat.dev.yml
```

- The binary must be run as root, so you'll need to set your dev config file owner to root:

```
sudo ./auditbeat -c ../../auditbeat/auditbeat.dev.yml -e
```
My review
A few points that are not about the data:

- Perhaps we should remove the local addresses like 127.0.0.1 and ::1? I don't think they'll be useful.
- Your fields for the index template are not nested under `system.`, so they don't apply to the data you're sending (e.g. data sent: `system.host.ip`, template defines `host.ip`).
  - When they are nested under `system.`, I suspect that the IP addresses will need to be truncated not to include the subnet size. So you'd want to send `127.0.0.1` without the /8.
- I would break up the initial listings of packages and processes to be one event per package/process. We can do very little with objects nested in arrays, with Elasticsearch or Kibana.
  - So perhaps single package events for these full package inventories could have `status:present` instead of `status:installed`. Same for processes: `system.processes.status:already running`?
I've also commented the code here and there. But of course not being a proficient Go & Python dev, I can comment on very little at this time. Others will have to chime in for that :-)
One final detail: if building under x-pack remains a thing, we should add a gitignore there for "auditbeat", "logs" and "data" as well :-)
Good stuff overall!
```python
if not hasattr(self, 'beat_path'):
    self.beat_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "../../"))
```
Curious: why is this metricbeat test being modified? `libbeat/tests/system/beat/beat.py` doesn't appear to have been modified in this PR...
Since Auditbeat with X-Pack is in its own `x-pack/auditbeat` directory, `beat_path` needs to point to that. It was being overwritten since `AuditbeatXPackTest` in `auditbeat_xpack.py` ultimately extends `BaseTest` from `metricbeat.py`.
```go
		missing = append(missing, cacheValue)
		delete(cache.hashMap, cacheKey)
	}
}
```
You appear to be doing an O(n^2) operation here (n before and after is roughly equal).

Since the new state (`current`) is all you need to keep in memory for the next tick, perhaps you can skip the updating of the existing hash map and save `current` as is for later. In other words, this isn't really a cache, but more of a "previous state".

Then you can find "new" and "missing" result sets by:

- looping over all keys from `current` and searching the `cache` for each item based on key. Each "not found" is added to your "new" result set.
- looping over all keys from `cache` and searching the `current` (which would have to be a map as well, not a slice). Each "not found" is added to your "missing" result set.

Each of these loops is O(n), and so is the work for building a map out of `current`.

Note that you can check for presence by assigning to a second, optional var when searching the map:

```go
_, found := cache[key]
```
`current` is not a map, so I would need to build a new map and add all `current` items and their hashes. Doing that every time we check the cache (e.g. every second) seems pretty expensive - most of the time nothing will have changed, or only one or two items. Am I missing something?
Just wanted to loop back here. I won't push too hard on the O(n^2), since I'm a total Go noob, so I may be missing something in your code or in the Map initialization.
But my computer currently runs 500-ish processes. This means each time it's comparing the new list of processes with the last state, it's doing on the order of 250,000 operations (500^2) instead of on the order of 500 operations... This may be irrelevant, since typically servers run way less stuff than workstations.
So I just wanted to put that out there, but we can keep things as they are for now. This may be premature optimization.
```go
if config.ReportChanges {
	// TODO: Implement reporting changes?
	ms.log.Warnw("Metricset %v/%v does not support report_changes", moduleName, metricsetName)
```
This doesn't seem to interpolate correctly for me. Here's what I get in the logs:

```
2018-09-27T14:15:10.957-0400	WARN	[system]	host/host.go:52	Metricset %v/%v does not support report_changes	{"system": "host"}
```
```go
		"package.summary": pkg.Summary,
		"package.url":     pkg.URL,
	}
}
```
I love how much more information we're getting here over osquery :-D Thinking of "InstallTime" and "Summary" particularly!
It's off to a good start. I only looked at the Auditbeat code in this pass. The main issues are:

- Align field names to ECS.
- Figure out a seccomp solution for running `rpm`.
```go
if config.ReportChanges {
	// TODO: Implement reporting changes?
	ms.log.Warnw("Metricset %v/%v does not support report_changes", moduleName, metricsetName)
```
I recommend using `<metricset>.report_changes` so this can be controlled independently for each metricset within the module config.
Makes sense, and I see the `cpu` and `core` metricsets of the `system` module in Metricbeat do that as well. I wonder if the configuration is counterintuitive - I myself tried at first to nest it under the metricset, i.e.:

```yaml
- module: system
  metricsets:
    - processes
  report_changes: true
```

Should we consider allowing something like this in the future, rather than have it apart?

For now, I'll change it to be `processes.report_changes` and `packages.report_changes`.
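For reference, the per-metricset form being adopted would look something like this - a sketch only; the exact key handling is up to the module config code:

```yaml
auditbeat.modules:
- module: system
  metricsets:
    - processes
    - packages
  # One flag per metricset, at the module level:
  processes.report_changes: true
  packages.report_changes: false
```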
```go
*/
func listRPMPackages() ([]cache.Cacheable, error) {
	format := "%{NAME}|%{VERSION}|%{RELEASE}|%{ARCH}|%{LICENSE}|%{INSTALLTIME}|%{SIZE}|%{URL}|%{SUMMARY}\\n"
	out, err := exec.Command("/usr/bin/rpm", "--qf", format, "-qa").Output()
```
This won't work on Linux unless something is changed w.r.t. the seccomp policy or config.
Right, true. Do you think it makes sense to allow Auditbeat to execute `/usr/bin/rpm`, or change to reading the RPM db files in `/var/lib/rpm/*` with the help of a library instead?
If we could directly read the database file, or use librpm's database API and statically link it with Auditbeat, that would be the ideal solution in my opinion, because we don't have to modify Auditbeat's security posture.
Makes sense. There are some Go libraries for RPM and BerkeleyDB out there that we could use as well. I'd like to do that in a separate PR though, so this one can get some closure. I'll remove the RPM code for now.
I agree, this can be handled at a later date. For now you could inject config to disable seccomp for the system test case on Linux that executes `rpm`.
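One minimal way to do that in a dev or test config, assuming the standard Beats `seccomp.enabled` switch (check the docs for your version):

```yaml
# Disable the seccomp filter for this test run so the packages
# metricset can exec /usr/bin/rpm (Linux only; dev/test use).
seccomp.enabled: false
```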
So... I think this is ready for another review. I hope I've addressed any concerns and not missed anything - apologies if I did (and please point it out).

**Follow-up 1: RPM support**

RPM will be implemented in a separate PR. We'll have to figure out how to read the rpmdb - there are some Go libraries for RPM and for BerkeleyDB (which rpmdb is using) and the official rpm C library we could use.

**Follow-up 2: Templates**

At the moment, we cannot generate Elasticsearch templates from two directories (

**Schema**

I hope the fields are mostly ECS-compliant, though many are not in ECS. The current schemas are:
**One vs. many documents**

When reporting a snapshot of the state (host info, list of currently running processes, list of installed packages) we could send one document containing all items (the code does this now) or one document for each process/package.

I like sending one document, because it makes it easy to know what the system's current status is. You just have to find the most recent snapshot and then replay any events from there. If the snapshot is in multiple documents, it's hard to get them all together - you have to know or guess how many there are and query until you have all of them, and you would need some kind of correlation field (e.g. timestamp, UUID) to know which belong to the same snapshot.

I also don't see what one would do with multiple documents - it's state information, so I don't think Kibana can visualize it in a meaningful time series way.
@cwurm I agree that if we have all processes / packages (even in full checkins) we then need a way to make correlation easier. In other words, figure out what are all the docs related to a given checkin.

I'd like to experiment further with the data set. You may be right that all the data in one array may be best at this time. I initially pushed for one per document in order to get the full power of aggregations & search. However, since there will be multiple indices at play and subtle links between them (e.g. one checkin reports many packages), the direct querying by users will not be as simple as it usually is for monitoring indices anyway. In other words, I'm now realizing that making the data model easy to query by end users should perhaps not be our goal at this time. So let's leave it as is and learn more about how we'll need to query the data before making this change. 👍🏼

Is there a way to force a full process & package checkin? Right now I'm only getting start/stops on the processes.

A suggestion, as well: on host checkins, please save all IPs as an array on the ECS field

I haven't looked at the code again yet. I'll try to do this a bit tomorrow.
Every time Auditbeat starts it should output a full list of currently running processes. Following that it will report only started/stopped ones if

In full debug mode (started with
What do you see?
👍 |
Gotcha. Yes, I do see the full package listing right when starting auditbeat. We'd need a deterministic way to differentiate a full listing vs an update notification. Perhaps use
```
@@ -0,0 +1,8 @@
The System `packages` metricset provides ... TODO.
```
Can you include the information of what packaging systems are currently supported? I think dpkg/deb and Homebrew, right? And probably the next line with the OS support is superfluous.
```go
}

func listDebPackages() ([]*Package, error) {
	const statusFile = "/var/lib/dpkg/status"
```
We should leave it for a follow up PR, but I think we should test this code by injecting the status file location and providing a sample.
```go
}

func listBrewPackages() ([]*Package, error) {
	const cellarPath = "/usr/local/Cellar"
```
Same comment here, we should make sure we follow up with tests.
```go
package cache

// Cache is just a map being used as a cache.
type Cache struct {
```
I wonder if the Cache doesn't need a mutex? Is it always called only from a single goroutine? If yes, then it's fine.
`Fetch` is always called from the same goroutine. But I don't see much of a downside to making it thread-safe now.
LGTM. Left some relatively minor comments, which can be handled in follow ups.
Merged into feature branch. Follow up actions (feel free to add if I forgot something)
@cwurm Looks good, should we make an Auditbeat system module meta ticket with that checklist?
Adds host, packages, and processes metricsets to Auditbeat. Host collects general host information, e.g. boot time, timezone, OS, network interfaces. Packages collects information about installed packages. For now, it supports Debian and Homebrew on Darwin. Processes collects information about currently running, started, and stopped processes.
This adds three new metricsets to the new `system` module:

- `host` collects general information about the system.
- `packages` collects information about the installed packages and can detect changes.
- `processes` collects all currently running processes and can detect changes.