libflux: refactor reactor/watcher implementation #6494
This LGTM and represents some nice cleanup. (Looks like a lot of work went on here)
I just had one comment about the new requirement to link watcher.lo whenever libsubprocess is used internally.
src/common/libzmqutil/reactor.c
@@ -57,8 +57,10 @@ static struct flux_watcher_ops zmq_watcher = {
};

flux_watcher_t *zmqutil_watcher_create (flux_reactor_t *r,
                                        void *zsock, int events,
                                        flux_watcher_f cb, void *arg)
                                        void *zsock,
Suggestion: prefix commit message title with libzmqutil:
src/cmd/Makefile.am
@@ -175,6 +175,7 @@ flux_start_LDADD = \

flux_job_LDADD = \
	$(top_builddir)/src/common/libsubprocess/libsubprocess.la \
	$(top_builddir)/src/common/libflux/watcher.lo \
Requiring watcher.lo to be pulled in everywhere libsubprocess is linked seems a bit odd.
Does this indicate that fbuf.c and fbuf_watcher.c actually belong in libflux?
Alternately, does it work to link with libflux/libflux.la (always after libflux-core.la) to pull in the missing symbols from watcher.lo? That feels a bit less weird than linking with an individual .lo.
As a quick experiment I tried moving fbuf.[ch] and fbuf_watcher.[ch] to libflux, but this just moves the problem, making these symbols unresolved when linking with libsubprocess.la. I did verify that this can be ameliorated by linking with libflux/libflux.la (and I think if this is placed after libflux-core.la then only missing symbols are linked instead of all flux_* symbols, though I didn't compare final executable sizes or anything).
Not sure if you see that as an improvement or not.
Yeah that was unsatisfying :-(
I think the fbuf watcher and sdbus watcher are appropriate to keep private to libsubprocess and sdbus as they are not really designed for wider usage. Not that moving them necessarily would mean that I guess.
I'll try your suggestion of linking with libflux.la. I see we were already doing that in the broker to access private message functions in libflux, FWIW. I did notice that jobtap plugins don't explicitly link with libflux-core.la so we will have to add that I guess.
Nothing too concerning IMHO on the executable size:
Before:
616K src/cmd/.libs/flux-exec
528K src/cmd/.libs/flux-start
1.7M src/modules/.libs/job-exec.so
1.3M src/modules/.libs/job-ingest.so
1.1M src/modules/.libs/sdbus.so
After:
616K src/cmd/.libs/flux-exec
528K src/cmd/.libs/flux-start
2.3M src/modules/.libs/job-exec.so
1.3M src/modules/.libs/job-ingest.so
1.8M src/modules/.libs/sdbus.so
I pushed this change as a separate commit. I'll squash it down if we agree this is OK.
I find it slightly preferable, but don't feel strongly enough to argue against the previous version if you prefer that. 🤷
Nice work! 🎉
Thanks!
I'm confused about something. Consider the following snippet from src/cmd/Makefile.am:
fluxcmd_ldadd = \
$(top_builddir)/src/common/libkvs/libkvs.la \
$(top_builddir)/src/common/librlist/librlist.la \
$(top_builddir)/src/common/libflux-internal.la \
$(top_builddir)/src/common/libflux-core.la \
$(top_builddir)/src/common/libflux-optparse.la \
$(FLUX_SECURITY_LIBS) \
$(LIBPTHREAD) \
$(JANSSON_LIBS)
flux_exec_LDADD = \
$(top_builddir)/src/common/libsubprocess/libsubprocess.la \
$(fluxcmd_ldadd) \
$(top_builddir)/src/common/libflux/libflux.la
flux-exec needs the fbuf watcher in libsubprocess.la, which in turn needs watcher.lo functions like watcher_create(). However, watcher_create() calls flux_reactor_incref().
How does flux_reactor_incref() get resolved? It could be resolved through reactor.lo in libflux.la or it could get resolved through the dynamic library libflux-core.la. Since libflux.la is at the end of the list, I assumed it could not be resolved by something earlier in the list. But somehow it works and the size of flux-exec is unchanged.
"I can't remember how the linker works, or if I ever knew how the linker works" seems to be a recurring theme for me 😢
I'd have to refresh my memory on this as well. I think the rule is something like "the linker reads libraries from left to right and accumulates unresolved symbols as it goes". This is why I suggested libflux.la should come last, since resolving symbols statically that are already in libflux-core.so should be the last resort.
But I also don't understand how the linker knows it needs flux_reactor_incref() in the example you pose above. Perhaps something else in the chain pulls that symbol in, so it is already there by the time libflux.la is processed? (I hope?) You could check the final flux-exec binary with objdump perhaps to see if it got pulled in statically.
Edit: D'oh I realized reading back your comment that you probably already knew everything I said in the first paragraph, so I'm sorry for being redundant!
Oh duh, libflux-core.so is a dynamic library, so presumably all its symbols are available for anything before or after on the link line. Maybe?
That would imply that dynamic libs should always go first to maximize their use? Hmm. Thanks. Signing off for tonight.
Problem: some older code violates RFC 7 whitespace guidelines. Break long parameter lists to one per line.
Problem: watcher implementations are dereferencing w->fn() but otherwise use accessors for struct watcher internals. Add watcher_call() to replace direct access to structure member.
Problem: the zmq watcher is implemented in an ambiguously named source file. Use zwatcher.[ch] instead of reactor.[ch]. Rename unit test too. Update include directives.
Problem: the zmq watcher is implemented directly on libev which complicates changing the Flux reactor implementation to something else. Reimplement watcher using the Flux API.
Problem: the flux_t handle watcher is implemented directly on libev which complicates changing the Flux reactor implementation to something else. Reimplement watcher using the Flux API. Move it to a new source file since it is now fairly standalone.
Problem: the subprocess internal "buffer watchers" are implemented directly on libev which complicates changing the Flux reactor implementation to something else. Reimplement watcher using the Flux API.
Problem: the internal sdbus watcher is_active() callback calls ev_is_active() on a flux_watcher_t, which is accepted because ev_is_active() is a macro that casts its pointer argument. Although flux_watcher_is_active() is not currently used in the sdbus module (the only user of the watcher), fix the code in case it ever is.
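To illustrate why the sdbus is_active() mistake described in this commit message compiled without complaint, here is a minimal sketch. All names (low_watcher, wrapper, LOW_IS_ACTIVE, low_is_active) are hypothetical stand-ins and are not the libev or flux-core definitions; the only thing taken from the commit message is that the macro casts its pointer argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real libev or flux-core types. */
struct low_watcher {
    int active;                 /* what the macro is meant to read */
};

struct wrapper {
    int refcount;               /* unrelated field that happens to come first */
    struct low_watcher *impl;   /* the low-level watcher actually lives here */
};

/* A macro that casts its argument accepts ANY pointer type silently. */
#define LOW_IS_ACTIVE(p) (((struct low_watcher *)(void *)(p))->active)

/* A typed function lets the compiler reject or warn about a wrong pointer. */
static int low_is_active (const struct low_watcher *w)
{
    return w->active;
}

/* Buggy: passes the wrapper itself, so the macro reads refcount as "active". */
int wrapper_is_active_buggy (struct wrapper *w)
{
    return LOW_IS_ACTIVE (w);
}

/* Fixed: passes the embedded low-level watcher. */
int wrapper_is_active_fixed (struct wrapper *w)
{
    return low_is_active (w->impl);
}

/* Build an inactive low-level watcher inside a wrapper with refcount 1. */
int demo_buggy (void)
{
    struct low_watcher low = { 0 };
    struct wrapper w = { 1, &low };
    return wrapper_is_active_buggy (&w);
}

int demo_fixed (void)
{
    struct low_watcher low = { 0 };
    struct wrapper w = { 1, &low };
    return wrapper_is_active_fixed (&w);
}
```

In this sketch the buggy variant reports the wrapper as active even though the underlying watcher is not, because the cast hides the type mismatch from the compiler.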
Problem: the reactor and watcher implementations are unnecessarily exposed and co-mingled.

Split up reactor.c, reactor.h (public), reactor_private.h into:

reactor.c, reactor_private.h
  Only the reactor struct and implementation. Add an accessor for the ev_loop, to be used only in watcher_wrap.c

watcher.c, watcher_private.h
  Only the watcher struct and generic "class" implementation. Add accessors to avoid exposing watcher struct details to watcher implementations.

watcher_wrap.c
  Wrapped libev watcher implementations. These now use watcher accessors rather than directly accessing the watcher struct.

A public watcher.h header is added to split out the public watcher interfaces from reactor.h. This was not strictly necessary, but reactor.h was a little busy. Users are still expected to just include <flux/core.h>.

Update the zmq watcher, flux_t handle watcher, fbuf watchers, and sdbus watcher to include watcher_private.h and use watcher accessors.

Replacing the inlines in reactor_private.h with actual functions in watcher.c necessitated linking some executables with src/common/libflux/libflux.la (after libflux-core.la).

libev use is now localized to reactor.c and watcher_wrap.c, which should make transitioning to a new implementation somewhat easier than before. The code should also be easier to read and navigate.
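The accessor approach described in this commit message can be sketched roughly as follows. This is only an illustration under assumed names (toy_watcher, toy_watcher_create, toy_watcher_get_data, toy_watcher_call, toy_watcher_destroy are all hypothetical); the actual signatures in watcher_private.h may differ. The point is that a watcher implementation touches the generic struct only through functions, so the struct layout can change without editing each implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; not the actual flux-core API. */
struct toy_watcher;
typedef void (*toy_watcher_f) (struct toy_watcher *w, int revents, void *arg);

/* In a real project this definition would be private to one .c file. */
struct toy_watcher {
    toy_watcher_f fn;   /* user callback */
    void *arg;          /* user callback argument */
    void *data;         /* implementation-specific storage */
};

/* Allocate the generic part plus data_size bytes for the implementation. */
struct toy_watcher *toy_watcher_create (size_t data_size,
                                        toy_watcher_f fn,
                                        void *arg)
{
    struct toy_watcher *w = calloc (1, sizeof (*w) + data_size);
    if (!w)
        return NULL;
    w->fn = fn;
    w->arg = arg;
    w->data = data_size > 0 ? (void *)(w + 1) : NULL;
    return w;
}

/* Accessor: implementations get their storage without seeing the struct. */
void *toy_watcher_get_data (struct toy_watcher *w)
{
    return w->data;
}

/* Accessor: replaces direct w->fn (...) calls in implementations. */
void toy_watcher_call (struct toy_watcher *w, int revents)
{
    if (w->fn)
        w->fn (w, revents, w->arg);
}

void toy_watcher_destroy (struct toy_watcher *w)
{
    free (w);
}

/* A watcher "implementation" that relies only on the accessors above. */
struct counter {
    int fired;
};

static void on_event (struct toy_watcher *w, int revents, void *arg)
{
    int *total = arg;
    (void)w;
    *total += revents;
}

/* Fire the watcher twice; return the summed revents and the fire count. */
int run_demo (int *fired)
{
    int total = 0;
    struct toy_watcher *w;
    struct counter *c;

    w = toy_watcher_create (sizeof (struct counter), on_event, &total);
    assert (w != NULL);
    c = toy_watcher_get_data (w);
    c->fired++;
    toy_watcher_call (w, 4);
    c->fired++;
    toy_watcher_call (w, 3);
    *fired = c->fired;
    toy_watcher_destroy (w);
    return total;
}
```

Calling run_demo() from a small main() exercises the create, get-data, call, and destroy path without the caller ever dereferencing the watcher struct.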
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6494 +/- ##
===========================================
+ Coverage 53.67% 83.60% +29.93%
===========================================
Files 475 522 +47
Lines 79573 87680 +8107
===========================================
+ Hits 42709 73303 +30594
+ Misses 36864 14377 -22487
This was split off of an (aborted for now) effort to convert flux to libuv, mentioned in #6492.
This isolates the reactor and watcher implementations from each other somewhat, and also localizes libev calls to two source files.
Full disclosure: several watchers, including the ones for zmq sockets, flux_t handles, and subprocess buffers, are reimplemented in terms of the Flux API, which might introduce a few extra small mallocs per instance. This is because libev watchers are just structs that you initialize instead of opaque objects that you allocate, and each of these high level "composite watchers" contains three or four internal watchers that are now flux_watcher_t's. I can't imagine this is a big deal.
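The allocation difference mentioned above can be sketched as follows. This is a toy model, not the real code: the struct names, the counted_calloc() helper, and the choice of three internal watchers named prepare, check, and idle are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Count allocations so the two styles can be compared. */
static int alloc_count;

static void *counted_calloc (size_t n, size_t size)
{
    alloc_count++;
    return calloc (n, size);
}

/* Style A: internal watchers are plain structs embedded in the composite,
 * so the whole thing is initialized in place after one allocation. */
struct plain_watcher {
    int active;
};

struct composite_embedded {
    struct plain_watcher prepare;
    struct plain_watcher check;
    struct plain_watcher idle;
};

/* Style B: internal watchers are opaque handles, each allocated on its own. */
struct opaque_watcher {
    int active;
};

static struct opaque_watcher *opaque_watcher_create (void)
{
    return counted_calloc (1, sizeof (struct opaque_watcher));
}

struct composite_handles {
    struct opaque_watcher *prepare;
    struct opaque_watcher *check;
    struct opaque_watcher *idle;
};

/* One allocation covers the composite and its embedded watchers. */
int count_allocs_embedded (void)
{
    struct composite_embedded *c;

    alloc_count = 0;
    c = counted_calloc (1, sizeof (*c));
    assert (c != NULL);
    free (c);
    return alloc_count;
}

/* One allocation for the composite plus one per internal handle. */
int count_allocs_handles (void)
{
    struct composite_handles *c;

    alloc_count = 0;
    c = counted_calloc (1, sizeof (*c));
    assert (c != NULL);
    c->prepare = opaque_watcher_create ();
    c->check = opaque_watcher_create ();
    c->idle = opaque_watcher_create ();
    assert (c->prepare && c->check && c->idle);
    free (c->prepare);
    free (c->check);
    free (c->idle);
    free (c);
    return alloc_count;
}
```

With three internal watchers, the handle-based composite in this model costs three additional small allocations per instance compared with the embedded version.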