-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Auditbeat] User dataset: Numerous fixes to error handling #10942
Conversation
Pinging @elastic/secops |
} | ||
for _, user := range users { | ||
event := ms.userEvent(user, eventTypeState, eventActionExistingUser) | ||
event.RootFields.Put("event.id", stateID.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stateID will be an all zero uuid if uuid.NewV4()
failed. What's the impact of this on the metricset and the consumers of this data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When sending a list of all users, the event.id
is what they have in common, to make it easy for consumers to correlate them. If generating it failed, it would no longer be possible to use it for that purpose - lists of users sent from the same system at different times would have the same (zero) event.id. Instead, they would have to fall back on the less user-friendly timestamp.
It's unlikely to happen, but if it did the dataset would still send all the users, and an error as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything that needs to be put into the changelog? Or is this unreleased.
Good point, I've added an entry. |
…0942) Fixes a number of issues in the `user` dataset discovered while testing on Fedora 29: * When calling C functions via cgo, don't rely on errno as an indicator of success - check the return value first. * Set errno to 0 before calling getpwent(3) or getspent(3). * Make C function calls thread-safe. * Address a systemd bug where it can return ENOENT even when there is no error. * Don't fail on every error. * Fail system test on WARN/ERROR messages in the log. (cherry picked from commit 8ce5440)
…0942) Fixes a number of issues in the `user` dataset discovered while testing on Fedora 29: * When calling C functions via cgo, don't rely on errno as an indicator of success - check the return value first. * Set errno to 0 before calling getpwent(3) or getspent(3). * Make C function calls thread-safe. * Address a systemd bug where it can return ENOENT even when there is no error. * Don't fail on every error. * Fail system test on WARN/ERROR messages in the log. (cherry picked from commit 8ce5440)
…0942) Fixes a number of issues in the `user` dataset discovered while testing on Fedora 29: * When calling C functions via cgo, don't rely on errno as an indicator of success - check the return value first. * Set errno to 0 before calling getpwent(3) or getspent(3). * Make C function calls thread-safe. * Address a systemd bug where it can return ENOENT even when there is no error. * Don't fail on every error. * Fail system test on WARN/ERROR messages in the log. (cherry picked from commit 8ce5440)
…11190) Fixes a number of issues in the `user` dataset discovered while testing on Fedora 29: * When calling C functions via cgo, don't rely on errno as an indicator of success - check the return value first. * Set errno to 0 before calling getpwent(3) or getspent(3). * Make C function calls thread-safe. * Address a systemd bug where it can return ENOENT even when there is no error. * Don't fail on every error. * Fail system test on WARN/ERROR messages in the log. (cherry picked from commit 8ce5440)
…11189) Fixes a number of issues in the `user` dataset discovered while testing on Fedora 29: * When calling C functions via cgo, don't rely on errno as an indicator of success - check the return value first. * Set errno to 0 before calling getpwent(3) or getspent(3). * Make C function calls thread-safe. * Address a systemd bug where it can return ENOENT even when there is no error. * Don't fail on every error. * Fail system test on WARN/ERROR messages in the log. (cherry picked from commit 8ce5440)
When testing the
user
dataset on Fedora 29, I noticed it's not working. Digging into it, there were a few issues, all of which this PR fixes:errno
is going to be zero on success (see here for some detail). Instead, we should check the return value first, and if that indicates that there might have been an error (usually when it'sNULL/nil
) we should check the error return.If one wants to check errno after the call, it should be set to zero before the call.
errno
cannot be accessed directly in Go code, so we introduce a helper functionsetErrno
for that.getspent(3)
does not have the same warning, but given that at least glibc uses the same code path for both, it's reasonable to assume the same applies. So we seterrno
to0
before calling both functions.errno
is thread-local, and the function familiessetpwent/endpwent/getpwent
andsetspent/endspent/getspent
are not thread-safe, so we introduceruntime.LockOSThread()/UnlockOSThread()
to make sure all C functions are run on the same OS thread.getpwent()
should returnNULL
anderrno
should be zero when all entries have been returned. However, there is a bug in systemd that causes it to seterrno
toENOENT
even when there is no error (nss-systemd's getpwent sets errno to ENOENT systemd/systemd#9585). This bug affects at least Fedora 29. Following this changeENOENT
is treated as if there is no error. It's not supposed to be a valid error value forgetpwent()
anyway, so should not happen anytime outside this bug.user
dataset to return no users, even though all users had been read successfully. This PR changes to gathering errors in multierrors and returning them alongside whatever data could be read. This is in line with wanting to make the System module more resilient to non-fatal failures during data collection (better to return some things alongside an error than no things at all).WARN/ERROR
messages in the log. This change can probably also be made for the other datasets.Writing up this PR description I'm realizing this combines quite a few issues. I could split it into multiple PRs if anybody wants, though most would have only a few lines of changed code. Because some indentation changed it's best to check Github's
No Whitespace
button.