libflux: refactor reactor/watcher implementation #6494

Merged
merged 8 commits into from
Dec 16, 2024

Member

@garlick garlick commented Dec 9, 2024

This was split off of an (aborted for now) effort to convert flux to libuv, mentioned in #6492.

This isolates the reactor and watcher implementations from each other somewhat, and also localizes libev calls to two source files.

Full disclosure: several watchers including the ones for zmq sockets, flux_t handles, and subprocess buffers are reimplemented in terms of the Flux API, which might introduce a few extra small mallocs per instance. This is because libev watchers are just structs that you initialize instead of opaque objects that you allocate, and each of these high level "composite watchers" contains three or four internal watchers that are now flux_watcher_t's. I can't imagine this is a big deal.
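The allocation trade-off described above can be illustrated with a small self-contained sketch. Every name in it (`raw_watcher`, `opaque_watcher`, `composite_embedded`, `composite_opaque`, `counted_calloc`) is a hypothetical stand-in rather than a libev or Flux type, and the choice of three internal watchers is only an assumption for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical libev-style watcher: a plain struct embedded by the caller. */
struct raw_watcher {
    int active;
};

/* Hypothetical opaque watcher: always heap allocated by a create function. */
struct opaque_watcher {
    int active;
};

static int alloc_count = 0; /* allocations made through counted_calloc() */

static void *counted_calloc (size_t size)
{
    void *p = calloc (1, size);
    if (p)
        alloc_count++;
    return p;
}

static struct opaque_watcher *opaque_watcher_create (void)
{
    return counted_calloc (sizeof (struct opaque_watcher));
}

/* Composite with embedded internal watchers: one allocation in total. */
struct composite_embedded {
    struct raw_watcher prepare;
    struct raw_watcher check;
    struct raw_watcher idle;
};

/* Composite with opaque internal watchers: container plus one each. */
struct composite_opaque {
    struct opaque_watcher *prepare;
    struct opaque_watcher *check;
    struct opaque_watcher *idle;
};

static void composite_opaque_destroy (struct composite_opaque *c)
{
    if (c) {
        free (c->prepare); /* free (NULL) is a no-op */
        free (c->check);
        free (c->idle);
        free (c);
    }
}

static struct composite_opaque *composite_opaque_create (void)
{
    struct composite_opaque *c = counted_calloc (sizeof (*c));
    if (!c)
        return NULL;
    c->prepare = opaque_watcher_create ();
    c->check = opaque_watcher_create ();
    c->idle = opaque_watcher_create ();
    if (!c->prepare || !c->check || !c->idle) {
        composite_opaque_destroy (c);
        return NULL;
    }
    return c;
}

/* Number of allocations needed to build one embedded composite. */
int count_embedded_allocs (void)
{
    alloc_count = 0;
    struct composite_embedded *c = counted_calloc (sizeof (*c));
    assert (c != NULL);
    free (c);
    return alloc_count;
}

/* Number of allocations needed to build one opaque composite. */
int count_opaque_allocs (void)
{
    alloc_count = 0;
    struct composite_opaque *c = composite_opaque_create ();
    assert (c != NULL);
    composite_opaque_destroy (c);
    return alloc_count;
}
```

In this sketch the opaque variant costs one extra small allocation per internal watcher, which is the kind of overhead the description expects to be negligible.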

@garlick garlick force-pushed the reactor_cleanup branch 3 times, most recently from f7d3cec to 9ff2e53 on December 14, 2024 02:24
Contributor

@grondo grondo left a comment

This LGTM and represents some nice cleanup. (Looks like a lot of work went on here)

I just had one comment about the new requirement to link watcher.lo whenever libsubprocess is used internally.

@@ -57,8 +57,10 @@ static struct flux_watcher_ops zmq_watcher = {
};

flux_watcher_t *zmqutil_watcher_create (flux_reactor_t *r,
void *zsock, int events,
flux_watcher_f cb, void *arg)
void *zsock,
Contributor

Suggestion: prefix commit message title with libzmqutil:

@@ -175,6 +175,7 @@ flux_start_LDADD = \

flux_job_LDADD = \
$(top_builddir)/src/common/libsubprocess/libsubprocess.la \
$(top_builddir)/src/common/libflux/watcher.lo \
Contributor

Requiring watcher.lo to be pulled in everywhere libsubprocess is linked seems a bit odd.

Does this indicate that fbuf.c and fbuf_watcher.c actually belong in libflux?

Alternately, does it work to link with libflux/libflux.la (always after libflux-core.la) to pull in the missing symbols from watcher.lo? That feels a bit less weird than linking with an individual .lo.

Contributor

As a quick experiment I tried moving fbuf.[ch] and fbuf_watcher.[ch] to libflux, but this just moves the problem, making these symbols unresolved when linking with libsubprocess.la. I did verify that this can be ameliorated by linking with libflux/libflux.la (and I think if this is placed after libflux-core.la then only missing symbols are linked instead of all flux_* symbols, though I didn't compare final executable sizes or anything).

Not sure if you see that as an improvement or not.

Member Author

Yeah that was unsatisfying :-(

I think the fbuf watcher and sdbus watcher are appropriate to keep private to libsubprocess and sdbus as they are not really designed for wider usage. Not that moving them necessarily would mean that I guess.

I'll try your suggestion of linking with libflux.la. I see we were already doing that in the broker to access private message functions in libflux, FWIW. I did notice that jobtap plugins don't explicitly link with libflux-core.la so we will have to add that I guess.

Member Author

Nothing too concerning IMHO on the executable size:

Before:

616K src/cmd/.libs/flux-exec
528K src/cmd/.libs/flux-start
1.7M src/modules/.libs/job-exec.so
1.3M src/modules/.libs/job-ingest.so
1.1M src/modules/.libs/sdbus.so

After:

616K src/cmd/.libs/flux-exec
528K src/cmd/.libs/flux-start
2.3M src/modules/.libs/job-exec.so
1.3M src/modules/.libs/job-ingest.so
1.8M src/modules/.libs/sdbus.so

I pushed this change as a separate commit. I'll squash it down if we agree this is OK.

Contributor

I find it slightly preferable, but don't feel strongly enough to argue against the previous version if you prefer that. 🤷

Contributor

Nice work! 🎉

Member Author

Thanks!

I'm confused about something. Consider the following snippet from src/cmd/Makefile.am:

fluxcmd_ldadd = \
        $(top_builddir)/src/common/libkvs/libkvs.la \
        $(top_builddir)/src/common/librlist/librlist.la \
        $(top_builddir)/src/common/libflux-internal.la \
        $(top_builddir)/src/common/libflux-core.la \
        $(top_builddir)/src/common/libflux-optparse.la \
        $(FLUX_SECURITY_LIBS) \
        $(LIBPTHREAD) \
        $(JANSSON_LIBS)

flux_exec_LDADD = \
        $(top_builddir)/src/common/libsubprocess/libsubprocess.la \
        $(fluxcmd_ldadd) \
        $(top_builddir)/src/common/libflux/libflux.la

flux-exec needs the fbuf watcher in libsubprocess.la, which in turn needs watcher.lo functions like watcher_create(). However, watcher_create() calls flux_reactor_incref().

How does flux_reactor_incref() get resolved? It could be resolved through reactor.lo in libflux.la or it could get resolved through the dynamic library libflux-core.la. Since libflux.la is at the end of the list, I assumed it could not be resolved by something earlier in the list. But somehow it works and the size of flux-exec is unchanged.

"I can't remember how the linker works, or if I ever knew how the linker works" seems to be a recurring theme for me 😢

Contributor

@grondo grondo Dec 16, 2024

I'd have to refresh my memory on this as well. I think the rule is something like "the linker reads libraries from left to right and accumulates unresolved symbols as it goes". This is why I suggested libflux.la should come last, since resolving symbols statically that are already in libflux-core.so should be the last resort.

But I also don't understand how the linker knows it needs flux_reactor_incref() in the example you pose above. Perhaps something else in the chain pulls that symbol in, so it is already there by the time libflux.la is processed? (I hope?) You could check the final flux-exec binary with objdump perhaps to see if it got pulled in statically.

Edit: D'oh I realized reading back your comment that you probably already knew everything I said in the first paragraph, so I'm sorry for being redundant!

Contributor

Oh duh, libflux-core.so is a dynamic library, so presumably all its symbols are available for anything before or after on the link line. Maybe?

Member Author

That would imply that dynamic libs should always go first to maximize their use? Hmm. Thanks. Signing off for tonight.

Problem: some older code violates RFC 7 whitespace guidelines.

Break long parameter lists to one per line.

Problem: watcher implementations are dereferencing w->fn() but
otherwise use accessors for struct watcher internals.

Add watcher_call() to replace direct access to structure member.

Problem: the zmq watcher is implemented in an ambiguously named
source file.

Use zwatcher.[ch] instead of reactor.[ch]
Rename unit test too.
Update include directives.

Problem: the zmq watcher is implemented directly on libev
which complicates changing the Flux reactor implementation
to something else.

Reimplement watcher using the Flux API.

Problem: the flux_t handle watcher is implemented directly on libev
which complicates changing the Flux reactor implementation
to something else.

Reimplement watcher using the Flux API.
Move it to a new source file since it is now fairly standalone.

Problem: the subprocess internal "buffer watchers" are implemented
directly on libev which complicates changing the Flux reactor
implementation to something else.

Reimplement watcher using the Flux API.

Problem: the internal sdbus watcher is_active() callback calls
ev_is_active() on a flux_watcher_t, which is accepted because
ev_is_active() is a macro that casts its pointer argument.

Although flux_watcher_is_active() is not currently used in the
sdbus module (the only user of the watcher), fix the code in
case it ever is.
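
The hazard described in the sdbus commit message above, a macro that casts its pointer argument and therefore accepts the wrong watcher type, can be sketched as follows. The types and the `BASE_IS_ACTIVE` macro are hypothetical stand-ins written in the spirit of the libev macro, not its actual definition, and `wrapped_watcher` is not the real `flux_watcher_t` layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical low-level watcher, standing in for a libev watcher. */
struct base_watcher {
    int active;
};

/* Hypothetical wrapper, standing in for an opaque higher-level watcher. */
struct wrapped_watcher {
    const char *name;
    struct base_watcher *impl;
};

/* The cast through void * discards the argument's real type, so this
 * macro compiles for a pointer to anything. */
#define BASE_IS_ACTIVE(w) (0 + ((struct base_watcher *)(void *)(w))->active)

/* A typed function lets the compiler reject the wrong pointer type. */
static int wrapped_is_active (struct wrapped_watcher *w)
{
    assert (w != NULL && w->impl != NULL);
    return BASE_IS_ACTIVE (w->impl); /* correct: pass the inner watcher */
}

int is_active_via_wrapper (int active)
{
    struct base_watcher impl = { .active = active };
    struct wrapped_watcher w = { .name = "demo", .impl = &impl };

    /* Misuse: passing the wrapper itself is accepted without a diagnostic.
     * It sits inside sizeof, which does not evaluate its operand, so the
     * bogus read is only compiled here and never executed. */
    (void) sizeof (BASE_IS_ACTIVE (&w));

    return wrapped_is_active (&w);
}
```

Because the misuse compiles cleanly, a bug like this only shows up if the callback is ever actually invoked, which matches the reasoning for fixing it pre-emptively.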

Problem: the reactor and watcher implementations are unnecessarily
exposed and co-mingled.

Split up reactor.c, reactor.h (public), reactor_private.h into:

reactor.c, reactor_private.h
  Only the reactor struct and implementation.
  Add an accessor for the ev_loop, to be used only in watcher_wrap.c

watcher.c, watcher_private.h
  Only the watcher struct and generic "class" implementation.
  Add accessors to avoid exposing watcher struct details to watcher
  implementations.

watcher_wrap.c
  Wrapped libev watcher implementations.
  These now use watcher accessors rather than directly accessing
  the watcher struct.

A public watcher.h header is added to split out the public watcher
interfaces from reactor.h.  This was not strictly necessary, but
reactor.h was a little busy.  Users are still expected to just include
<flux/core.h>.

Update the zmq watcher, flux_t handle watcher, fbuf watchers, and sdbus
watcher to include watcher_private.h and use watcher accessors.

Replacing the inlines in reactor_private.h with actual functions
in watcher.c necessitated linking some executables with
src/common/libflux/libflux.la (after libflux-core.la).

libev use is now localized to reactor.c and watcher_wrap.c, which should
make transitioning to a new implementation somewhat easier than before.
The code should also be easier to read and navigate.
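
As a rough illustration of the accessor approach described in the commit message above, here is a minimal single-file sketch. All `demo_` names and signatures are hypothetical; only the idea of a call helper and accessors hiding the watcher struct comes from the commit messages, and the real watcher_private.h interface may look quite different.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a private header might expose: opaque type plus accessors --- */
struct demo_watcher;
typedef void (*demo_watcher_f) (struct demo_watcher *w, int revents, void *arg);

struct demo_watcher *demo_watcher_create (size_t impl_size,
                                          demo_watcher_f fn,
                                          void *arg);
void demo_watcher_destroy (struct demo_watcher *w);
void *demo_watcher_get_data (struct demo_watcher *w);
void demo_watcher_call (struct demo_watcher *w, int revents);

/* --- the only code that knows the struct layout --- */
struct demo_watcher {
    demo_watcher_f fn;
    void *arg;
    void *data; /* implementation-private state */
};

struct demo_watcher *demo_watcher_create (size_t impl_size,
                                          demo_watcher_f fn,
                                          void *arg)
{
    struct demo_watcher *w = calloc (1, sizeof (*w));
    if (!w)
        return NULL;
    if (impl_size > 0 && !(w->data = calloc (1, impl_size))) {
        free (w);
        return NULL;
    }
    w->fn = fn;
    w->arg = arg;
    return w;
}

void demo_watcher_destroy (struct demo_watcher *w)
{
    if (w) {
        free (w->data);
        free (w);
    }
}

void *demo_watcher_get_data (struct demo_watcher *w)
{
    return w ? w->data : NULL;
}

/* Replaces direct w->fn (...) calls in watcher implementations. */
void demo_watcher_call (struct demo_watcher *w, int revents)
{
    if (w && w->fn)
        w->fn (w, revents, w->arg);
}

/* --- a watcher implementation: uses accessors only --- */
static void sum_cb (struct demo_watcher *w, int revents, void *arg)
{
    (void) w;
    int *total = arg;
    *total += revents;
}

int demo_run (void)
{
    int total = 0;
    struct demo_watcher *w = demo_watcher_create (sizeof (int), sum_cb, &total);
    assert (w != NULL);
    int *state = demo_watcher_get_data (w);
    *state = 2;
    demo_watcher_call (w, *state);
    demo_watcher_call (w, 3);
    demo_watcher_destroy (w);
    return total;
}
```

Keeping the struct definition in one place means the backing event loop could change without touching the code in the last section, which is the benefit the commit message points to.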
@mergify mergify bot merged commit 1088270 into flux-framework:master Dec 16, 2024
35 checks passed

codecov bot commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 92.80959% with 54 lines in your changes missing coverage. Please review.

Project coverage is 83.60%. Comparing base (23e0f78) to head (971eef2).
Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
src/common/libzmqutil/zwatcher.c 82.00% 18 Missing ⚠️
src/common/libsubprocess/fbuf_watcher.c 93.06% 14 Missing ⚠️
src/common/libflux/hwatcher.c 82.85% 12 Missing ⚠️
src/common/libflux/watcher_wrap.c 97.21% 9 Missing ⚠️
src/modules/sdbus/watcher.c 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6494       +/-   ##
===========================================
+ Coverage   53.67%   83.60%   +29.93%     
===========================================
  Files         475      522       +47     
  Lines       79573    87680     +8107     
===========================================
+ Hits        42709    73303    +30594     
+ Misses      36864    14377    -22487     
Files with missing lines Coverage Δ
src/broker/overlay.c 83.17% <ø> (+24.32%) ⬆️
src/common/libflux/reactor.c 95.83% <100.00%> (+0.64%) ⬆️
src/common/libflux/watcher.c 100.00% <100.00%> (ø)
src/common/libzmqutil/monitor.c 83.16% <ø> (+12.16%) ⬆️
src/common/libzmqutil/zap.c 87.15% <ø> (-2.37%) ⬇️
src/modules/sdbus/watcher.c 86.95% <0.00%> (+1.44%) ⬆️
src/common/libflux/watcher_wrap.c 97.21% <97.21%> (ø)
src/common/libflux/hwatcher.c 82.85% <82.85%> (ø)
src/common/libsubprocess/fbuf_watcher.c 92.51% <93.06%> (+4.41%) ⬆️
src/common/libzmqutil/zwatcher.c 82.00% <82.00%> (ø)

... and 437 files with indirect coverage changes

@garlick garlick deleted the reactor_cleanup branch December 16, 2024 03:09