-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move libzmq dependent functions out of libflux-core into libzmqutil #3797
Conversation
8503937
to
a25d905
Compare
Whew! Nice. Quick comment on function naming. What about:
just reducing the redundancy a bit. Same with the file names - could probably drop the |
re-pushed per comments. Comments suggested rename of functions were liked, so removed the |
I checked |
@garlick I'm not seeing it on a TOSS3 cluster. Are you ldd-ing Edit: oh , i just re-read your comment. It think you should be checking |
Ah! that was the problem. I'm not getting an
Yay! |
@garlick i think you need to run |
I think that's dependent on the libtool version? My system doesn't produce executables with the |
It is dependent on whether "fast-install" mode is enabled in libtool. I guess on some distros fast-install mode isn't needed, so libtool turns it off by default? I could have sworn it was enabled on my system (Ubuntu 18.04), but now I see it is set to
I did verify that on CentOS 7 (and thus presumably TOSS 3), fast-install is enabled by default:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally very pleased with how this turned out! Just the one question below.
src/common/libflux/message.c
Outdated
static int iovec_to_msg (flux_msg_t *msg, | ||
struct msg_iovec *iov, | ||
int iovcnt) | ||
int flux_iovec_to_msg (flux_msg_t *msg, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do these need to have the flux_
prefix? This will make them available publicly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done b/c libflux-core.so
doesn't export any non-flux_
prefixed functions, so the iovec functions weren't available in some places. But now that I think about it, any code that needs the iovec functions could ldadd src/common/libflux/libflux.la
instead of src/common/libflx-core.la
. Let me see how that works out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this experiment didn't go as well as I thought it would. There's so many symbols to resolve, and a nice chunk of "what to ldadd first" to ensure dependencies get resolved accordingly. here's an example:
diff --git a/src/common/librouter/Makefile.am b/src/common/librouter/Makefile.am
index 756f5bb..4dcdd41 100644
--- a/src/common/librouter/Makefile.am
+++ b/src/common/librouter/Makefile.am
@@ -63,9 +63,15 @@ test_ldadd = \
$(top_builddir)/src/common/librouter/librouter.la \
$(top_builddir)/src/common/libtestutil/libtestutil.la \
$(top_builddir)/src/common/libflux-internal.la \
- $(top_builddir)/src/common/libflux-core.la \
- $(top_builddir)/src/common/libtap/libtap.la \
+ $(top_builddir)/src/common/libflux/libflux.la \
+ $(top_builddir)/src/common/liblsd/liblsd.la \
+ $(top_builddir)/src/common/libev/libev.la \
+ $(top_builddir)/src/common/libutil/libutil.la \
+ $(top_builddir)/src/common/libccan/libccan.la \
+ $(top_builddir)/src/common/libtomlc99/libtomlc99.la \
$(top_builddir)/src/common/libzmqutil/libzmqutil.la \
+ $(top_builddir)/src/common/libczmqcontainers/libczmqcontainers.la \
+ $(top_builddir)/src/common/libtap/libtap.la \
$(ZMQ_LIBS)
test_cppflags = \
worse yet I'm hitting segfaults in sharness tests. I haven't been able to figure it out yet, although the backtrace of the segfault points to an unknown function. A symbol collision wouldn't surprise me given what's going on like the above.
Hmmm. Let me mull over this tonight and maybe something obviously simpler can come to mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can't think of an obviously clean solution here. We don't want to move iovec into yet another convenience lib. Would it be worthwhile to prefix the functions to something to indicate they are private and make sure they are accessible in the libflux-core.map
file? Like prefix them with fprivate_
or something instead?
Although unintended, we get away with this on the reactor side b/c we static inline those two simple functions. We'd have this problem there too if the functions were not inlined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just add libflux/libflux.la
right after libflux-core.la
in the LDADD rules? I didn't try to work through any tests, but the following patch resulted in a working broker and shmem connector for me:
diff --git a/src/broker/Makefile.am b/src/broker/Makefile.am
index f25cd2eaf..3c02fd731 100644
--- a/src/broker/Makefile.am
+++ b/src/broker/Makefile.am
@@ -67,6 +67,7 @@ flux_broker_LDADD = \
$(builddir)/libbroker.la \
$(top_builddir)/src/common/libflux-core.la \
$(top_builddir)/src/common/libzmqutil/libzmqutil.la \
+ $(top_builddir)/src/common/libflux/libflux.la \
$(top_builddir)/src/common/libpmi/libpmi_client.la \
$(top_builddir)/src/common/libflux-internal.la \
$(top_builddir)/src/common/libflux-optparse.la \
diff --git a/src/common/libflux/message.c b/src/common/libflux/message.c
index 51e402d2e..c6f0db869 100644
--- a/src/common/libflux/message.c
+++ b/src/common/libflux/message.c
@@ -384,7 +384,7 @@ static int msg_append_route (flux_msg_t *msg,
return 0;
}
-int flux_iovec_to_msg (flux_msg_t *msg,
+int iovec_to_msg (flux_msg_t *msg,
struct msg_iovec *iov,
int iovcnt)
{
@@ -507,7 +507,7 @@ flux_msg_t *flux_msg_decode (const void *buf, size_t size)
iovcnt++;
p += n;
}
- if (flux_iovec_to_msg (msg, iov, iovcnt) < 0)
+ if (iovec_to_msg (msg, iov, iovcnt) < 0)
goto error;
free (iov);
return msg;
@@ -1622,7 +1622,7 @@ void flux_msg_fprint (FILE *f, const flux_msg_t *msg)
flux_msg_fprint_ts (f, msg, -1);
}
-int flux_msg_to_iovec (const flux_msg_t *msg,
+int msg_to_iovec (const flux_msg_t *msg,
uint8_t *proto,
int proto_len,
struct msg_iovec **iovp,
diff --git a/src/common/libflux/message_private.h b/src/common/libflux/message_private.h
index 7923b8c10..8872d3082 100644
--- a/src/common/libflux/message_private.h
+++ b/src/common/libflux/message_private.h
@@ -61,11 +61,11 @@ struct msg_iovec {
void *transport_data;
};
-int flux_iovec_to_msg (flux_msg_t *msg,
+int iovec_to_msg (flux_msg_t *msg,
struct msg_iovec *iov,
int iovcnt);
-int flux_msg_to_iovec (const flux_msg_t *msg,
+int msg_to_iovec (const flux_msg_t *msg,
uint8_t *proto,
int proto_len,
struct msg_iovec **iovp,
diff --git a/src/common/libzmqutil/msg_zsock.c b/src/common/libzmqutil/msg_zsock.c
index 42e5b9c54..ad71f51fe 100644
--- a/src/common/libzmqutil/msg_zsock.c
+++ b/src/common/libzmqutil/msg_zsock.c
@@ -40,7 +40,7 @@ int zmqutil_msg_send_ex (void *sock, const flux_msg_t *msg, bool nonblock)
return -1;
}
- if (flux_msg_to_iovec (msg, proto, PROTO_SIZE, &iov, &iovcnt) < 0)
+ if (msg_to_iovec (msg, proto, PROTO_SIZE, &iov, &iovcnt) < 0)
goto error;
if (nonblock)
@@ -117,7 +117,7 @@ flux_msg_t *zmqutil_msg_recv (void *sock)
if (!(msg = flux_msg_create (FLUX_MSGTYPE_ANY)))
goto error;
- if (flux_iovec_to_msg (msg, iov, iovcnt) < 0)
+ if (iovec_to_msg (msg, iov, iovcnt) < 0)
goto error;
rv = msg;
error:
diff --git a/src/connectors/shmem/Makefile.am b/src/connectors/shmem/Makefile.am
index 82ade229b..1c64c6553 100644
--- a/src/connectors/shmem/Makefile.am
+++ b/src/connectors/shmem/Makefile.am
@@ -22,6 +22,7 @@ shmem_la_LDFLAGS = -module $(san_ld_zdef_flag) \
shmem_la_LIBADD = \
$(top_builddir)/src/common/libflux-internal.la \
$(top_builddir)/src/common/libflux-core.la \
+ $(top_builddir)/src/common/libflux/libflux.la \
$(top_builddir)/src/common/libzmqutil/libzmqutil.la \
$(ZMQ_LIBS)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know internally how a binary "looks up" a symbol when it calls a function, but I'm wondering if the binary linked symbol and the symbol in the shared object are both looked up at times. Doing another debugging run, the function address of flux_msglist_pollevents()
also changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, no further enlightenment here.
Is it possible to inline those iovec functions as a quick solution? <--EDIT: I see that's likely not the case.
Also, do the PROTO offsets and stuff need to be in that header? Seems like only PROTO_SIZE needs to be exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the proto export is just for the PROTO_SIZE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking to some DEG folks, there is some scanning stuff that is done internally to find symbols and that scanning can change over time, such as if libraries are used in a certain order / loaded in certain order. Setting LDDEBUG=all, I noticed this:
161371: binding file /g/g0/achu/chaos/git/flux-framework/flux-core/src/common/.libs/libflux-core.so.2 [0] to /g/g0/achu/chaos/git/flux-framework/flux-core/src/common/.libs/libflux-core.so.2 [0]: normal symbol `flux_msglist_create'
161452: binding file /g/g0/achu/chaos/git/flux-framework/flux-core/src/common/.libs/libflux-core.so.2 [0] to /g/g0/achu/chaos/git/flux-framework/flux-core/src/common/.libs/libflux-core.so.2 [0]: normal symbol `flux_msglist_create'
161452: binding file /g/g0/achu/chaos/git/flux-framework/flux-core/src/broker/.libs/lt-flux-broker [0] to /g/g0/achu/chaos/git/flux-framework/flux-core/src/common/.libs/libflux-core.so.2 [0]: normal symbol `flux_msglist_create'
I don't know exactly how to read this output, but it does look like flux_msglist_create()
was found in the .so file and the lt-flux-broker
at certain points in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm trying the libmessageprivate.la
approach now. Unfortunately we have to put more stuff into message_private.h
and message_private.c
for things to work, but it seems the more correct solution.
struct flux_reactor { | ||
struct ev_loop *loop; | ||
int usecount; | ||
unsigned int errflag:1; | ||
}; | ||
|
||
struct flux_watcher { | ||
flux_reactor_t *r; | ||
flux_watcher_f fn; | ||
void *arg; | ||
struct flux_watcher_ops *ops; | ||
void *data; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unfortunate that we need to expose these things outside of reactor.c, given that we theoretically are providing an API to implement custom watchers. I guess it's because ev_zmq was implemented as a pure libev watcher, and libev isn't exposed directly. We might be able to reimplement this in terms of the exported functions, but I'm thinking that could be hairy and time consuming, and this works.
I guess it's for the greater good, and it's still private to flux-core :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't think about the idea of implementing via exposed functions. But yeah, the fact libev
wasn't public did create problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not worth the effort at this time IMHO.
@@ -23,6 +23,7 @@ | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit message for c9096f7
This might be the commit that people will find in the history when they ask "why isn't libzmqutil part of libflux"? So it might be good to mention the FIPS startup penalty and its adverse effect on job throughput in the commit message problem statement, even though the issue reference does provide that background. Something like:
Problem: on systems configured for FIPS compliance, the cyrpto libs used by libzmq perform a self-test in the dso init function, which slows down library loading. When linking with libzmq is widespread, this significantly impairs job throughput.
Isolate 0MQ functions that depend on libzmq in a non-exported convenience library, to make it possible to minimize the number of components linked with libzmq.
Sorry, I'm not up to date on this PR. However, I was going to ask if it would be useful to add some kind of test that ensures the build (maybe just Just thought I throw that idea out there. |
Good idea. I just added a simple:
regression test, until I remembered our conversation above regarding the default of |
Would |
re-pushed, fixing up the commit message per comments above and adding the extra test. This re-push is mostly to see if the new test works correctly under the docker images. None of the non-prefixing of the iovec functions is in this re-push. |
re-pushed. To resolve not exporting iovec functions, I created a note I added these functions to message private.
basically I had to make a call about making |
OK, finally getting close to done I think. The |
Well FWIW I moved some things around on my The changes are just on top of this branch and do not have proper commit messages since they probably would just be squashed if you decide to include them. The changes are
Let me know what you think @chu11. I'm open to whatever - we need to get this done and move on. |
@garlick Ahhh, I didn't think about putting it into libflux-internal. Makes sense. I think it works. Let me yoink your commits and squash em where appropriate and make sure all commits rebuild/pass tests accordingly. |
Problem: The message API currently has an internal function called flux_msg_create_common(). This function would make it difficult to migrate some functions out of libflux that depend on it. Solution: Allow flux_msg_create() to take the type FLUX_MSGTYPE_ANY, indicating that the type for this message is not yet known. As a result, flux_msg_create_common() can be removed. Update all prior callers of flux_msg_create_common(). Add checks to ensure a message type has been set before the message is encoded/sent. Add unit tests.
In preparation for future refactoring, place route creation or destruction code inside wrapper functions. Have flux_msg_route_clear(), flux_msg_route_push(), and flux_msg_route_delete_last() call these new helper functions.
Problem: msg_append_route() is not named consistently to other functions. Solution: Rename to msg_route_append().
re-pushed, squashing and adjusting commits in the process, including the commit messages. |
hmmm, one builder failed with.
i'm perplexed, as nothing in the unit tests or Edit: it failed again on centos8 - py3.7 ... super confused right now. Edit2: Well I guess the absorption of |
Problem: We wish to move some messaging functions out of libflux, but there are some shared structs / functions that will be needed if we do so. Solution: Create a message convenience library within src/common/libflux and add it to libflux-internal so it can be used internally by other flux-core code. The following structs/functions were migrated out of message.c into their own individual files. struct flux_msg -> message_private.h struct msg_iovec, msg_to_iovec(), iovec_to_msg() -> message_iovec.[ch] PROTO macros, msg_proto_setup(), proto_get_u32() -> message_proto.[ch] struct route_id, msg_route_push/append/clear/delete_last() -> message_route.[ch] Update several linker dependencies to ensure all symbols are available for compilation.
Problem: We would like to remove the libzmq dependency on libflux-core. Doing so requires us to put libzmq dependent functions somewhere else. Solution: Create a new internal library libzmqutil and put flux_msg_sendzsock() and flux_msg_recvzsock() in there. Rename functions to zmqutil_msg_send() and zmqutil_msg_recv() respectively. Move related unit tests as well. Update all callers, include new headers, and link to new internal library.
Problem: In an effort to consolidate zmq dependent code into one library, ev_zmq in libutil is out of place. Solution: Move ev_zmq from libutil to libzmqutil. As a result, libutil is no longer dependent on libzmq, and thus libflux-internal is no longer dependent on libzmq. Remove libzmq dependency in libflux-internal. Any binaries/libraries that need libzmq and previously acquired it via libflux-internal, add libzmq dependency directly.
Problem: We wish to move some reactor functions out of libflux, but there are some shared structs / functions that will be needed if we do so. Solution: Create a reactor_private.h header file that can be used internally by flux-core utility libraries. Move struct flux_reactor, struct_watcher, events_to_libev, and libev_to_events into the header.
Problem: On systems configured for FIPS compliance, the cyrpto libs used by libzmq perform a self-test in the dso init function, which slows down library loading. When linking with libzmq is widespread, this significantly impairs job throughput. Solution: Isolate functions that depend on libzmq in the non-exported convenience library libzmqutil. This will make it possible to minimize the number of components linked to libzmq. The remaining functions that must be isolated are flux_zmq_watcher_create() and flux_zmq_watcher_get_zsock(). Move them into libzmqutil and rename them to zmqutil_wacher_create() and zmqutil_watcher_get_zsock(). Update callers accordingly. As a result of this change, libflux is no longer dependent on libzmq and the library dependency can be removed. Add dependency to any libraries/binaries previously dependent on libflux's dependency on libzmq. Fixes flux-framework#3617
Codecov Report
@@ Coverage Diff @@
## master #3797 +/- ##
==========================================
- Coverage 83.34% 83.33% -0.01%
==========================================
Files 342 348 +6
Lines 50997 51014 +17
==========================================
+ Hits 42502 42515 +13
- Misses 8495 8499 +4
|
re-pushed with a fix for the builder failure described above, just had to add a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're good here. Thanks for the long slog! Set MWP when you're ready.
It's been a massive journey to try and de-link
libzmq
fromlibflux-core
(#3620 #3622, #3623, #3624, #3697, #3701, #3742, #3746, #3773, #3776), but hopefully this is the end :-)This PR will move the few
libzmq
dependent functions out oflibflux-core
and into a new convenience librarylibzmqutil
. The functions are:Only those few dependent modules/binaries (
broker
,shmem
,libtestutil
) that need them will uselibzmqutil
and link tolibzmq
. We have to create a few private helper headers along the way (message_private.h
,reactor_private.h
) that exports a few macros, helper functions, structs, etc.You'll notice two
fixup
commits in this PR. They both rename functions.renaming the functions is of course completely optional. I went back and forth on it. So it's just fixup patches for now, we can squash or remove based on discussion. Or perhaps folks don't like how I renamed them, I struggled with that as well. I didn't want to prefix anything
zmq_
to avoid potential confusion with it being alibzmq
function. Butzmqutil_
is not the prettiest prefix either.Performance tests on
fluke
(multiple runs showed consistent results similar to these)current master
this PR branch
generally speaking, around a 10% improvement across many runs