Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix journald logging via "log stdout" #17775

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

gromit1811
Copy link
Contributor

I tried to configure logging to systemd's journald using the method described in the documentation: Simply log stdout in frr.conf. But besides log messages from frrinit.sh and watchfrr, I saw no log messages from the actual routing daemons.

Debugging this revealed that FRR correctly notices that stdout is connected to journald (sd_stdout_is_journal==true), causing log_vty_init() to call zlog_5424_init() and zlog_5424_apply_dst(). But by the time zlog_5424_apply_dst() gets called, zt_stdout_journald.prio_min still is ZLOG_DISABLED, causing it to skip the call to zlog_5424_open(). And when frr.conf gets read later on, my log stdout statement sets .prio_min to LOG_DEBUG and causes a call to log_stdout_apply_level() and zlog_5424_apply_meta(), but the latter doesn't reattempt the zlog_5424_open(), so logs from routing daemons end up nowhere.

This PR fixes zlog_5424_apply_dst() and causes it to retry zlog_5424_open() after setting .prio_min unless the log handler is already active.

This PR also fixes a cosmetic issue in zlog_5424_cycle() which I noticed while debugging this. When called with fd=-1, it potentially passes a bogus pointer &zlt->zt to zlog_target_replace() because zlt is NULL in that case. In practice, this didn't cause problems because zt is the first member of zlt, so &zlt->zt is NULL as well when zlt is NULL and zlog_target_replace() properly handles NULL pointers. But it's probably not such a good idea to rely on the offset of zt. The commit tries to make it a bit more obvious/explicit what's happening here.

NOTE: After applying this PR, you'll probably notice that some routing daemons crash during startup (pathd, staticd). This is due to an assertion failure here:

frr/lib/privs.c

Line 214 in 8ca4c3d

assert(zprivs_state.syscaps_p && zprivs_state.caps);

zprivs_change_caps() gets called indirectly by frr_with_privs (...) {...} in zlog_5424_open() and this triggers the assertion failure for routing daemons without special capabilities like here:
zebra_capabilities_t _caps_p[] = {

How is this supposed to work? Are functions in lib allowed to use frr_with_privs without knowing what capabilities the routing daemon they're linked to has? Should all routing daemons have a minimum set of capabilities so that the assertion doesn't fail (and zlog_5424_open() can actually perform the desired operation)? Should the assertion failure be converted to a "do nothing return"?

@eqvinox Maybe you have an idea here - you seem to be the author of most of the 5424 code?

For testing purposes, I avoided the assertion failure by patching zcaps2sys() such that it never returns NULL, but that's probably not the proper fix.

When changing the log threshold (prio_min) using zlog_5424_apply_meta(), try
to open the log destination using zlog_5424_open() if we haven't done so
before. Without this, we might never open the destination at all, because
when zlog_5424_apply_dst() gets called (which so far was the only initial
caller of zlog_5424_open()), chances are that prio_min is still at its
initial value of ZLOG_DISABLED, causing it to skip the call to
zlog_5424_open().

Signed-off-by: Martin Buck <[email protected]>
In zlog_5424_cycle(), struct zlt_5424 *zlt only points to valid memory when
the passed file descriptor is >= 0 (and passing -1 seems to be a supported
use case). So we shouldn't try to compute the address of its zt member when
zlt is NULL.

Signed-off-by: Martin Buck <[email protected]>
@gromit1811 gromit1811 marked this pull request as draft January 9, 2025 08:32
@gromit1811
Copy link
Contributor Author

Set to draft. While this PR should be ready for merging, a fix for the privs assertion failure probably should go in first to avoid crashing staticd and mgmtd. I could try a PR for that as well, but I really don't know which of the following options is the one pick (see above):

How is this supposed to work? Are functions in lib allowed to use frr_with_privs without knowing what capabilities the routing daemon they're linked to has? Should all routing daemons have a minimum set of capabilities so that the assertion doesn't fail (and zlog_5424_open() can actually perform the desired operation)? Should the assertion failure be converted to a "do nothing return"?

And since privs are security-sensitive, I'd rather not start guessing here 😉

@eqvinox eqvinox self-requested a review January 9, 2025 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant