-
Notifications
You must be signed in to change notification settings - Fork 343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High: cumulative patchset to fix CVE-2019-3885, CVE-2018-16877, CVE-2018-16878 + additional unmasked null pointer deref #1749
Conversation
This could possibly lead to unsolicited information disclosure by the means of standard output of the immediately preceding agent/resource execution leaking into the log stream under some circumstances. It was hence assigned CVE-2019-3885. The provoked pathological state of pacemaker-execd daemon progresses towards crashing it for hitting segmentation fault.
[0/4: make crm_pid_active more precise as to when detections fail] It would be bad if the function claimed the process is not active when the only obstacle in the detection process was that none of the detection methods worked for a plain lack of permissions to apply them. Also, do some other minor cleanup of the function and add its documentation. As an additional measure, log spamming is kept at minimum for repeated queries about the same PID.
[1/4: new helpers to allow IPC client side to authenticate the server] The title problem here could possibly lead to local privilege escalation up to the root's level (and implicitly unguarded by some additional protection layers like SELinux unless the defaults constrained further). Main problem is that the authenticity assumptions were built on two, seemingly mutually supporting legs leading to two CVEs assigned: * procfs (mere process existence and right path to binary check) used to verify (this part was assigned CVE-2018-16878), and * one-way only client-server authentication, putting the client here at the mercy of the server not necessarily cross-verified per the above point if at all (this part was assigned CVE-2018-16877) whereas these two were in fact orthogonal, tearing security assumptions about the "passive influencers" in the pacemaker's daemon resilience-friendly constellation (orchestrated by the main of them, pacemakerd) apart. Moreover, procfs-based approach is discouraged for other reasons. The layout of the basic fix is as follows: * 1/4: new helpers to allow IPC client side to authenticate the server (this commit, along with unifying subsequent solution for both CVEs) * 2/4: pacemakerd to trust pre-existing processes via new checks instead (along with unifying solution for both CVEs) * 3/4: other daemons to authenticate IPC servers of fellow processes (along with addressing CVE-2018-16877 alone, for parts of pacemaker not covered earlier) * 4/4: CPG users to be careful about now-more-probable rival processes (this is merely to mitigate corner case fallout from the new approaches taken to face CVE-2018-16878 in particular; courtesy of Yan Gao of SUSE for the reporting this) With "basic", it is meant that it constitutes a self-contained best effort solution with some compromises that can only be overcome with the assistance of IPC library, libqb, as is also elaborated in messages of remaining "fix" commits. Beside that, also conventional encapsulation of server-by-client authentication would be useful, but lack thereof is not an obstacle (more so should there by any security related neglectations on the IPC client side and its connection initiating arrangement within libqb that has a potential to strike as early as when the authenticity of the server side is yet to be examined). One extra kludge that's introduced for FreeBSD lacking Unix socket to remote peer PID mapping is masquerading such an unspecified PID with value of 1, since that shall always be around as "init" task and, deferring to proof by contradiction, cannot be pacemakerd-spawned child either even if PID 1 was pacemakerd (and running such a child alone is rather nonsensical). The code making decisions based on that value must acknowledge this craze and refrain from killing/signalling the underlying process on this platform (but shall in general follow the same elsewhere, keep in mind systemd socket-based activation for instance, which would end up in such a situation easily!).
[2/4: pacemakerd to trust pre-existing processes via new checks instead] In pacemakerd in the context of entrusting pre-existing processes, we now resort to procfs-based solution only in boundary, fouled cases, and primarily examine the credentials of the processes already occupying known IPC end-points before adopting them. The commit applies the new helpers from 1/1 so as to close the both related sensitive problems, CVE-2018-16877 and CVE-2018-16878, in a unified manner, this time limited to the main daemon of pacemaker (pacemakerd). To be noted that it is clearly not 100% for this purpose for still allowing for TOCTTOU, but that's what commit (3/3) is meant to solve for the most part, plus there may be optimizations solving this concern as a side effect, but that requires an active assistance on the libqb side (ClusterLabs/libqb#325) since any improvement on pacemaker side in isolation would be very cumbersome if generally possible at all, but either way means a new, soft compatibility encumberment. As a follow-up to what was put in preceding 1/3 commit, PID of 1 tracked as child's identification on FreeBSD (or when socket-based activation is used with systemd) is treated specially, incl. special precaution with child's PID discovered as 1 elsewhere. v2: courtesy of Yan Gao of SUSE for early discovery and report for what's primarily solved with 4/4 commit, in extension, child daemons in the initialization phase coinciding with IPC-feasibility based process scan in pacemakerd in a way that those are missed (although they are to come up fully just moments later only to interfere with naturally spawned ones) are now considered so that if any native children later fail for said clash, the pre-existing counterpart may get adopted instead of ending up with repeated spawn-bury loop ad nauseam without real progress (note that PCMK_fail_fast=true could possibly help, but that's rather a big hammer not suitable for all the use cases, not the ones we try to deal with gracefully here)
[3/4: other daemons to authenticate IPC servers of fellow processes] Now that CVE-2018-16877 issue alone is still only partially covered based on the preceding commits in the set, put the server-by-client authentication (enabled and 1/3 and partially sported in 2/3) into practice widely amongst the communicating pacemaker child daemons and towards CPG API provided by 3rd party but principally using the same underlying IPC mechanism facilitated by libqb, and consequently close the remaining "big gap". As a small justification to introducing yet another "return value" int variable, type-correctness is restored for those that shall be cs_error_t to begin with.
[4/4: CPG users to be careful about now-more-probable rival processes] In essence, this comes down to pacemaker confusing at-node CPG members with effectively the only plausible to co-exist at particular node, which doesn't hold and asks for a wider reconciliation of this reality-check. However, in practical terms, since there are two factors lowering the priority of doing so: 1/ possibly the only non-self-inflicted scenario is either that some of the cluster stack processes fail -- this the problem that shall rather be deferred to arranged node disarming/fencing to stay on the safe side with 100% certainty, at the cost of possibly long-lasting failover process at other nodes (for other possibility, someone running some of these by accident so they effectively become rival processes, it's like getting hands cut when playing with a lawnmower in an unintended way) 2/ for state tracking of the peer nodes, it may possibly cause troubles in case the process observed as left wasn't the last for the particular node, even if presumably just temporary, since the situation may eventually resolve with imposed serialization of the rival processes via API end-point singleton restriction (this is also the most likely cause of why such non-final leave gets observed in the first place), except in one case -- the legitimate API end-point carrier won't possibly acknowledged as returned by its peers, at least not immediately, unless it tries to join anew, which verges on undefined behaviour (at least per corosync documentation) we make do just with a light code change so as to * limit 1/ some more with in-daemon self-check for pre-existing end-point existence (this is to complement the checks already made in the parent daemon prior to spawning new instances, only some moments later; note that we don't have any lock file etc. mechanisms to prevent parallel runs of the same daemons, and people could run these on their own deliberation), and to * guard against the interferences from the rivals at the same node per 2/ with ignoring their non-final leave messages altogether. Note that CPG at this point is already expected to be authenticity-safe. Regarding now-more-probable part, we actually traded the inherently racy procfs scanning for something (exactly that singleton mentioned above) rather firm (and unfakeable), but we admittedly got lost track of processes that are after CPG membership (that is, another form of a shared state) prior to (or in non-deterministic order allowing for the same) carring about publishing the end-point. Big thanks is owed to Yan Gao of SUSE, for early discovery and reporting this discrepancy arising from the earlier commits in the set.
This is now more likely triggerable once the problems related to CVE-2018-16878 are avoided.
Oops, Travis is failing in part because now we require resolution of For illustration:
|
Last patch is a tentative attempt to calm Travis down. |
b3edc9c
to
4a904d4
Compare
No longer tentative, it apparently helped. |
Nice work! Thanks. |
Just to rectify
and (very contrieved scenario:
More realistic scenario is that the admin decided for some reason to All in all, that commit is also a very desirable, even if it rather |
Does these CVE issues CVE-2019-3885, CVE-2018-16877, CVE-2018-16878 exist in pacemaker 1.1.19? Thanks, |
@code-share1 (You would then know the answer to your question is: yes) [*] #1750 (comment) |
For a complete list of commits related to the security issues, see: |
See also:
https://www.openwall.com/lists/oss-security/2019/04/17/1
People are strongly suggested to get these patches in.