Workaround issues with ld from binutils 2.29 #266

Merged
merged 7 commits into from
Dec 14, 2017

Conversation

Contributor

@jnpkrn jnpkrn commented Aug 10, 2017

See #265.

WIP, do not merge yet, though feel free to comment.
EDIT: should no longer be considered WIP, now subject to review.

@jnpkrn jnpkrn force-pushed the workaround-ld-2.29 branch 7 times, most recently from 01c4700 to c299e88 Compare August 10, 2017 23:35
lib/Makefile.am Outdated
{ echo "only applicable to SO symlink scheme"; exit 1; }; \
echo "INPUT($${t1_bn})" > "$(DESTDIR)$(libdir)/libqb.so-t"; \
cat $< >> "$(DESTDIR)$(libdir)/libqb.so-t"; \
mv -f "$(DESTDIR)$(libdir)/libqb.so-t" "$(DESTDIR)$(libdir)/libqb.so"

Contributor

This looks horrible. Can it be tidied up or put in a script somewhere?

Contributor Author

Perhaps (look at libtool itself to see some really hard-to-read shell code).

But what's worse: 12 lines of well-wrapped (and, overall, commented) code,
or having to distribute yet another single-use-only file? I am biased towards
calling the latter worse :)

Contributor Author

Also, it looks a lot tidier when you watch the make install output :)

Contributor Author

I was actually able to split off the last 2 lines, bringing the "blob" down to 10 :)

@jnpkrn jnpkrn force-pushed the workaround-ld-2.29 branch 2 times, most recently from ecde2ab to 7a953ef Compare August 30, 2017 21:11
Contributor Author

jnpkrn commented Aug 30, 2017

Still rather WIP, but the most difficult part (making libtool actually serve
our purpose as a linker script dependency injector) is tackled now.
It now almost passes "make distcheck": it fails upon an artificial relink when
installing. I will see what can be done about that; no actual relink is needed,
in fact.

@jnpkrn jnpkrn force-pushed the workaround-ld-2.29 branch from 7a953ef to 7bf1627 Compare August 31, 2017 07:10
Contributor Author

jnpkrn commented Aug 31, 2017

Ok, it now passes the CI run. Another as yet unexplored area is how all this combines
with static linking, should anybody want to use that instead.

@jnpkrn jnpkrn force-pushed the workaround-ld-2.29 branch from 7bf1627 to eb3bde0 Compare September 1, 2017 18:44
Contributor Author

jnpkrn commented Sep 1, 2017

WIP 3 had an issue of some unintended RPATH occurrences in
qb-blackbox binary or possibly libqb.so.*.

That's definitely not desired:
https://fedoraproject.org/wiki/Packaging:Guidelines#Beware_of_Rpath

Besides fixing that, WIP 4 is also better commented (or at least tries to be).

Contributor Author

jnpkrn commented Oct 6, 2017

Hopefully not many more iterations of this patchset are needed :)

Contributor Author

jnpkrn commented Oct 6, 2017

/usr/bin/ld: ../log_client.o: undefined reference to symbol 'dlclose@@GLIBC_2.2.5'

Strange, looks like -ldl disappeared from the build equations, will look into that.

Contributor Author

jnpkrn commented Oct 6, 2017

OK, the issue is mainly that older libtool and gcc are not as smart
as the recent versions :-/
(libtool for not evaluating transitive accumulative closures, gcc for
being more picky about the order of arguments).
Plus some other negligence on my side.

Contributor Author

jnpkrn commented Oct 9, 2017

Gotcha :-/
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=702737
(but encouraging commit:
corosync/corosync@0f1dc5c)

@jnpkrn jnpkrn force-pushed the workaround-ld-2.29 branch 4 times, most recently from 6e7ed7b to e683356 Compare October 9, 2017 20:55
Contributor Author

jnpkrn commented Oct 10, 2017

Testing on some other platforms: for FreeBSD, I need to at least respell the
sed command in lib/Makefile.am. Will see what else is to be observed outside
the Linux realms.

@jnpkrn jnpkrn force-pushed the workaround-ld-2.29 branch from e683356 to e393994 Compare October 10, 2017 22:36
Contributor Author

jnpkrn commented Oct 10, 2017

Fixed the sed command for portability and also bumped the -version-info
as necessitated by adding new API items -- I firmly think we should start
doing these bumps anytime the particular commit justifies/requires it.
Just consider the inter-release snapshots, they should all declare the
true status and not wait until some pre-release clarification, as otherwise
these snapshots could be flawed in terms of ABI compatibility, IMHO.

Contributor Author

jnpkrn commented Oct 10, 2017

IOW, the reviewer should be obliged to spot "public ABI" changes and
ensure they are captured in -version-info properly right away.
In case of major breakage, either a new release should be cut or
the whole change deferred till the latest possible point prior to doing so.

@jnpkrn jnpkrn force-pushed the workaround-ld-2.29 branch from d64b314 to c011b12 Compare December 12, 2017 22:24
Contributor Author

jnpkrn commented Dec 12, 2017

Chrissie, you can spot the gist of wording changes at
https://src.fedoraproject.org/rpms/libqb/c/93fffaf298477d0d4c4a0a6898a501d6f068a43a

I also demoted one of the new non-fatal warnings in lib/log.c to mere notice.

@chrissie-c chrissie-c merged commit c011b12 into ClusterLabs:master Dec 14, 2017
@chrissie-c
Contributor

Now merged. Thanks again for all your work on this issue.

Contributor Author

jnpkrn commented Dec 26, 2017

Some after the fact notes:

Member

gao-yan commented Jan 11, 2018

> And then for 2.29.1 release of binutils once again, as someone actually
> noticed something went overboard with the 2.29 changes:
>
> http://lists.gnu.org/archive/html/bug-binutils/2017-08/msg00195.html
> ...
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=487b6440dad57440939fab7afdd84a218b612796
>
> At least that change doesn't invalidate all the effort being put into
> the original version of the changeset, only the configure script check
> had to be refined so as not to miss the "orphan section magic not
> working properly out of the box, without band aid" observation
> (see the inline comment) -- the workaround arrangement needs
> to be applied in that case as well.

Does the fix in binutils-2.29.1 provide any practical help for this issue? It sounds like libqb basically has to do exactly the same for binutils-2.29.1 as for binutils-2.29?

Member

gao-yan commented Jan 11, 2018

And there seems to be some issue with this combination:

  1. libqb-1.0.3+20171226.6d62b64 built with binutils-2.26.1:
checking whether GCC supports __attribute__((section()) + ld supports orphan sections... yes
checking whether linker emits global boundary symbols for orphan sections... yes
  2. And pacemaker-1.1.18+20180104.7ba28d854 fails to build with this libqb:
PATH=/home/abuild/rpmbuild/BUILD/pacemaker-1.1.18+20180104.7ba28d854/tools:$PATH /home/abuild/rpmbuild/BUILD/pacemaker-1.1.18+20180104.7ba28d854/tools/iso8601 --help
...
iso8601: utils.c:66: common: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && __start___verbose != __stop___verbose' failed.
Makefile:1469: recipe for target 'iso8601.8' failed
gmake[1]: *** [iso8601.8] Aborted (core dumped)

-- This is on openSUSE 42.3.

With the same binutils and other package versions on SLE12 SP3/SP2, pacemaker encounters the same assertion with "crm_shadow --help" on build.

Contributor Author

jnpkrn commented Jan 11, 2018 via email

Contributor Author

jnpkrn commented Jan 11, 2018 via email

@jnpkrn jnpkrn mentioned this pull request Jan 11, 2018
Contributor Author

jnpkrn commented Jan 11, 2018 via email

Contributor Author

jnpkrn commented Jan 11, 2018 via email

Contributor Author

jnpkrn commented Jan 11, 2018 via email

Contributor Author

jnpkrn commented Jan 12, 2018 via email

jnpkrn added a commit to jnpkrn/libqb that referenced this pull request Jan 18, 2018
It turned out that while log_test_mock.sh already provided ways for
extra variations in the composite body of the program to inspect for
a working logging, "omit logging at the target client side completely,
leave it along with the respective self-check just at the intermediate
library this program uses" one hadn't apparently been exercised with
binutils < 2.29.1 (which is the only one OK from the get-go since it
arranges boundary denoting symbols for orphan sections as protected).

This made the pacemaker building fail half the way with libqb 1.0.3
and binutils < 2.29.1 because the built executables are instantly run
as to extract their help screen[1], and they represent the said pain
pattern:

- program not using libqb log subsystem directly, but
- linked to dynamic library that itself does + it also utilizes
  QB_LOG_INIT_DATA macro, which
- upon loading the executable and the linked dependencies
  prior to proper run invokes the checks, where
- one in particular tries to assess whether the direct-access
  boundary symbols are not aliasing (equal), but
- because with standard visibility, symbols from the program
  take a precedence, actually its symbols are prioritized (instead
  of the ones pertaining to the library layer as expected), and
- since previous fix to accommodate binutils 2.29+ fancy/new handling
  of the boundary symbols they are not subject to any sort of
  garbage collection (no symbols were emitted for a section that was
  not known because there were no instruction it should contain
  anything, until we stuck those symbols in place by force by the
  means of linker script to that effect), these boundary symbols
  indeed occur in this end program having no callsite data (see first
  point), meaning that start and stop address indications for
  the section equals, hence
- the said assertion triggers (despite it should not = false positive)

This clearly means that "QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP"
cannot be used unconditionally.  However, that would leave the software
using logging along with QB_LOG_INIT_DATA macro (which was already
highlighted as its vital part in order to avoid silent logging
malfunction) that doesn't utilize _GNU_SOURCE macro (unlocking some
GNU extensions on top of basic POSIX interface) short of the main
checks (because they are conditionalized for requiring said extensions),
and libqb never required the client side to define that macro, nor did it
dare to define it behind the user's back, as that can have unexpected
consequences.

Luckily, the first paragraph carries the key: protected
symbols behave exactly as required here[2]:

> Protected visibility is like default visibility except that it
> indicates that references within the defining module bind to the
> definition in that module. That is, the declared entity cannot be
> overridden by another module.

So we just move said comparison to "!defined(_GNU_SOURCE)" conditional
branch within qblog.h, and to cater binutils prior to 2.29.1 (which has
it like that by default as mentioned), we also mark the declarations
of the boundary symbols, likewise conditionally, with protected
visibility (there appears to be no way for that in the linker script).
But that would be too easy as once again, 2.29 linker begs to differ and
these two "!defined(_GNU_SOURCE)" measures will actually do more harm
than good when in the mix.  Hence, new QB_LD_2_29 macro is proclaimed
the kill-switch for that case, and the user becomes responsible to
either define it when building with this 2.29 troublemaker (as a recap,
2.29.1 is fine from this perspective), or to start using _GNU_SOURCE.

One more question mark remains, though: are not the QB_LOG_INIT_DATA's
checks in _GNU_SOURCE case weaker now than used to be?  The answer is:
no, as long as looking at the symbols through dlopen'd particular shared
object will contain no overrides from either already loaded dependencies
(if caching is so aggressive as to detect it's already loaded in
caller's context) or from recursive load.  Currently, it doesn't seem
to be the case, but it may depend on the implementation of dynamic
linking library or the toolchain.  If this observation is proved
to be wrong, the solution may be as simple as dropping !defined(_GNU_C)
condition from the guard of making the boundary symbols with protected
visibility -- it is not done now also with respect to non-gcc compilers
which may not recognize that.

Last but not least, dl* calls in the QB_LOG_INIT_DATA's checks are
themselves subject to scrutiny now: should they fail unexpectedly,
the run is terminated just as well.  This led to discovery of the
issue masked so far, boiling down to inability to dlopen an executable
(as opposed to shared object)[3].  It did not matter before, because
of rather best-effort, optimistic approach (perform the final check
only if precondition steps succeeded), but now, we have to add an
extra stipulation that this case won't lead to premature termination
-- it just happens to sometimes be the case, and there's not much
we can do to detect run down on the level of the executable proactively,
at least not based on the brief experiments.

[1] ClusterLabs#266 (comment)
[2] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-visibility-function-attribute
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=11754

Signed-off-by: Jan Pokorný <[email protected]>
jnpkrn added a commit to jnpkrn/libqb that referenced this pull request Jan 22, 2018
It turned out that while log_test_mock.sh already provided ways for
extra variations in the composite body of the program to inspect for
a working logging, "omit logging at the target client side completely,
leave it along with the respective self-check just at the intermediate
library this program uses" one hadn't apparently been exercised with
binutils < 2.29.1 (which is the only one OK from the get-go since it
arranges boundary denoting symbols for orphan sections as protected).

This made the pacemaker building fail half the way with libqb 1.0.3
and binutils < 2.29.1 because the built executables are instantly run
as to extract their help screen[1], and they represent the said pain
pattern:

- program not using libqb log subsystem directly, but
- linked to dynamic library that itself does + it also utilizes
  QB_LOG_INIT_DATA macro, which
- upon loading the executable and the linked dependencies
  prior to proper run invokes the checks, where
- one in particular tries to assess whether the direct-access
  boundary symbols are not aliasing (equal), but
- because with standard visibility, symbols from the program
  take a precedence, actually its symbols are prioritized (instead
  of the ones pertaining to the library layer as expected, remember
  that that check was intended as self-contained, targeting just
  its own participating part in the link scheme), and
- since previous fix to accommodate binutils 2.29+ fancy/new handling
  of the boundary symbols they are not subject to any sort of
  garbage collection (no symbols were emitted for a section that was
  not known because there were no instruction it should contain
  anything, until we stuck those symbols in place by force by the
  means of linker script to that effect), these boundary symbols
  indeed occur in this end program having no callsite data (see first
  point), meaning that start and stop address indications for
  the section equals, hence
- the said assertion triggers (despite it should not = false positive),
  implying breach of self-containment of the check (which naturally
  cannot be responsible for any other linked part)

This clearly means that "QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP"
cannot be used unconditionally.  However, that would leave the software
using logging along with QB_LOG_INIT_DATA macro (which was already
highlighted as its vital part in order to avoid silent logging
malfunction) that doesn't utilize _GNU_SOURCE macro (unlocking some
GNU extensions on top of basic POSIX interface) short of the main
checks (because they are conditionalized for requiring said extensions),
and libqb never required the client side to define that macro, nor did it
dare to define it behind the user's back, as that can have unexpected
consequences.

Luckily, the first paragraph carries the key: protected
symbols behave exactly as required here[2]:

> Protected visibility is like default visibility except that it
> indicates that references within the defining module bind to the
> definition in that module. That is, the declared entity cannot be
> overridden by another module.

So we just move said comparison to "!defined(_GNU_SOURCE)" conditional
branch within qblog.h, and to cater binutils prior to 2.29.1 (which has
it like that by default as mentioned), we also mark the declarations
of the boundary symbols, likewise conditionally, with protected
visibility (there appears to be no way for that in the linker script).
But that would be too easy as once again, 2.29 linker begs to differ and
these two "!defined(_GNU_SOURCE)" measures will actually do more harm
than good when in the mix.  Hence, new QB_LD_2_29 macro is proclaimed
the kill-switch for that case, and the user becomes responsible to
either define it when building with this 2.29 troublemaker (as a recap,
2.29.1 is fine from this perspective), or to start using _GNU_SOURCE.

One more question mark remains, though: are not the QB_LOG_INIT_DATA's
checks in _GNU_SOURCE case weaker now than used to be?  The answer is:
no, as long as looking at the symbols through dlopen'd particular shared
object will contain no overrides from either already loaded dependencies
(if caching is so aggressive as to detect it's already loaded in
caller's context) or from recursive load.  Currently, it doesn't seem
to be the case, but it may depend on the implementation of dynamic
linking library or the toolchain.  If this observation is proved
to be wrong, the solution may be as simple as dropping !defined(_GNU_C)
condition from the guard of making the boundary symbols with protected
visibility -- it is not done now also with respect to non-gcc compilers
which may not recognize that.

Last but not least, dl* calls in the QB_LOG_INIT_DATA's checks are
themselves subject to scrutiny now: should they fail unexpectedly,
the run is terminated just as well.  This led to discovery of the
issue masked so far, boiling down to inability to dlopen an executable
(as opposed to shared object)[3].  It did not matter before, because
of rather best-effort, optimistic approach (perform the final check
only if precondition steps succeeded), but now, we have to add an
extra stipulation that this case won't lead to premature termination
-- it just happens to sometimes be the case, and there's not much
we can do to detect run down on the level of the executable proactively,
at least not based on the brief experiments.

[1] ClusterLabs#266 (comment)
[2] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-visibility-function-attribute
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=11754

Signed-off-by: Jan Pokorný <[email protected]>
Member

gao-yan commented Feb 6, 2018

FYI, there's a thread about this topic ongoing from binutils upstream:
https://sourceware.org/ml/binutils/2018-01/msg00265.html

Contributor Author

jnpkrn commented Feb 6, 2018

Thanks for the heads up.
I am afraid I'm not qualified to contribute to the discussion about these
internal details, I am barely knowledgeable about the field from the user's
perspective. One of the biggest barriers to entry is the non-existent
documentation - it's just the code that deliberately decides what's the
right way to go.

It looks to me that the current approach, together with the removal of some
rough edges (#288), works well for us. Yes, there's the crux of a superfluous
warning being emitted by the linker when the workaround linker script
kicks in, but that was voiced just once and the explanation is handy
at places one might consequently look at.

If there's any intention to handle the mechanism at hand better on the linker
side, I think it's natural to strictly require that our workaround approach
keeps working regardless. AFAIK, this already applies to all reasonably
recent binutils versions, from pre-2.29 to 2.30, and it would be a great pity if the
"tweak for good" actually broke this precious continuity...

Member

gao-yan commented Feb 7, 2018

I can tell the difficulty of it... You did nice work fixing this though.

Apparently binutils upstream realized the situation indeed needs to be improved and they are achieving a better solution now:

https://sourceware.org/ml/binutils/2018-02/msg00056.html

Actually more users are aware of that:
https://sourceware.org/ml/binutils/2018-02/msg00038.html

I also hope the resulting solution won't break the current approach of libqb.

Contributor Author

jnpkrn commented Feb 7, 2018

Looking at

https://sourceware.org/ml/binutils/2018-01/msg00347.html

and figuring out that there have been more changes on the topic recently
in master:

$ git log --oneline --since=2017-11-01 | grep -Ee __start -e __stop
823143c6ca Check if __start/__stop symbols are referenced by shared objects
040b4a9eb8 Add --gc-sections test checking removal of __start/__stop symbols.
36b8fda5d6 Make __start/__stop symbols dynamic and add testcase
32253bb796 Define __start/__stop symbols when there is only a dynamic def

I got slightly worried.

Good news is that libqb's internal sanity check:

./autogen.sh && ./configure && make clean && make -C lib \
  && make -C tests/functional/log_internal/ check

keeps working also with this built-from-checkout binutils version,
407aa07cee - 2.30.0.20180207
(binutils compilation was choking on some incompatible function
signatures with GCC 8 so had to resort to downgrade, sigh).

Contributor

wferi commented Apr 20, 2018

@jnpkrn, while uploading Corosync 2.4.4 (built with libqb 1.0.3) to Debian unstable, I noticed that the size of the corosync binary grew from 314632 bytes (in 2.4.2, built with libqb 1.0.1) to 1048376 bytes. According to readelf -l the sizes of the ELF segments didn't change much, but the __verbose section moved into a new loadable segment with file offset 0xfa820. Such a large offset seems unreasonable, the file is pure zeros between 0x48600 and 0xfa820 and no segment uses that range. Can you perhaps offer any insight?
Thanks,
Feri.

Member

gao-yan commented Apr 20, 2018

BTW, of course we already have this linker script to keep these symbols "GLOBAL DEFAULT", which is relied on by the current code I believe. Otherwise the resulting "GLOBAL PROTECTED" symbols generated by the new binutils won't work.

My colleague, Michael Matz, who works on binutils reviewed the relevant code of libqb, and is pointing out:

No idea, why this should have worked in the past but I think this is a genuine
bug in the init function of the logger.

Basically it wants to go through all loaded shared objects, looks for
__start___verbose/__stop___verbose symbols in each of them and then initializes
the descriptors between both addresses.  The symbol lookup is done via dlsym.

Now, that is all fine and dandy in principle, but the callback that collects
all loaded objects (via dl_iterate_phdr) is:

static int32_t
_log_so_walk_callback(struct dl_phdr_info *info, size_t size, void *data)
{
        struct dlname *dlname;

        if (strlen(info->dlpi_name) > 0) {
                dlname = calloc(1, sizeof(struct dlname));
                if (!dlname)
                        return 0;
                dlname->dln_name = strdup(info->dlpi_name);
                qb_list_add_tail(&dlname->list, &dlnames);
        }

        return 0;
}

(in libqb/lib/log.c).  Note the check for strlen(info->dlpi_name) > 0.
That ignores the shared object created for the executable itself (it always
has an empty name).  I also find no other code which would try to enumerate
the __verbose section from the executable itself.

That matches the effects of the testsuite: some log messages (from libqb itself
and some other shared libs) are emitted just fine.  In addition the messages from crm_debug and crm_trace are also registered because they use a different
mechanism (not via the __verbose sections).  But the normal log messages
from the executable (e.g. from pacemaker/fencing/commands.c:926) are missing
as they aren't registered.

What needs to happen in the log initializer is that also the symbols of the main
object are handled.  For this you'd normally use "dlopen(NULL, flags)".
I bet extending qb_log_fini (or its subroutine _log_so_walk_dlnames) to not
forget the main object will make everything work.

Could this be an approach that we'd still like to investigate?

Member

gao-yan commented Apr 23, 2018

Nice Michael Matz [email protected] kindly offered this patch, which passes the tests with or without this pull request #266 by using binutils with the patches back-ported from binutils-2.30.

diff --git a/lib/log.c b/lib/log.c
index 1339f91..16b54e8 100644
--- a/lib/log.c
+++ b/lib/log.c
@@ -793,44 +793,50 @@ _log_so_walk_callback(struct dl_phdr_info *info, size_t size, void *data)
 }
 
 static void
+_log_register_one(const char *dlname)
+{
+	void *handle, *start, *stop;
+	const char *error;
+
+	handle = dlopen(dlname, RTLD_LAZY);
+	error = dlerror();
+	if (!handle || error) {
+		qb_log(LOG_ERR, "%s", error);
+		goto done;
+	}
+
+	start = dlsym(handle, QB_ATTR_SECTION_START_STR);
+	error = dlerror();
+	if (error) {
+		goto done;
+	}
+
+	stop = dlsym(handle, QB_ATTR_SECTION_STOP_STR);
+	error = dlerror();
+	if (error) {
+		goto done;
+
+	} else {
+		qb_log_callsites_register(start, stop);
+	}
+done:
+	if (handle)
+		dlclose(handle);
+}
+
+static void
 _log_so_walk_dlnames(void)
 {
 	struct dlname *dlname;
 	struct qb_list_head *iter;
 	struct qb_list_head *next;
 
-	void *handle;
-	void *start;
-	void *stop;
-	const char *error;
-
+	_log_register_one(NULL);
 	qb_list_for_each_safe(iter, next, &dlnames) {
 		dlname = qb_list_entry(iter, struct dlname, list);
 
-		handle = dlopen(dlname->dln_name, RTLD_LAZY);
-		error = dlerror();
-		if (!handle || error) {
-			qb_log(LOG_ERR, "%s", error);
-			goto done;
-		}
+		_log_register_one(dlname->dln_name);
 
-		start = dlsym(handle, QB_ATTR_SECTION_START_STR);
-		error = dlerror();
-		if (error) {
-			goto done;
-		}
-
-		stop = dlsym(handle, QB_ATTR_SECTION_STOP_STR);
-		error = dlerror();
-		if (error) {
-			goto done;
-
-		} else {
-			qb_log_callsites_register(start, stop);
-		}
-done:
-		if (handle)
-			dlclose(handle);
 		qb_list_del(iter);
 		if (dlname->dln_name)
 			free(dlname->dln_name);

Member

gao-yan commented Apr 23, 2018

The revisions of binutils that matter:

commit 8dfb7cbf8401be97077f5919ac7473bdbfa8b692
Author: H.J. Lu <[email protected]>
Date:   Tue Aug 22 09:41:21 2017 -0700

    Update PR ld/21964 tests

commit 32253bb7963ac7caa166ec41e336372f2ffc03d4
Author: Alan Modra <[email protected]>
Date:   Tue Jan 23 10:50:02 2018 +1030

    Define __start/__stop symbols when there is only a dynamic def

commit 36b8fda5d614cb5aaf701a92befa9919bd0b195a
Author: Alan Modra <[email protected]>
Date:   Mon Jan 29 21:45:09 2018 +1030

    Make __start/__stop symbols dynamic and add testcase

commit 823143c6ca8ef4267e67ba03771991e08d09fabd
Author: H.J. Lu <[email protected]>
Date:   Wed Jan 31 05:10:40 2018 -0800

    Check if __start/__stop symbols are referenced by shared objects

commit bf3077a6c3c9ff21c072a6f42c91bffefd35bc15
Author: Michael Matz <[email protected]>
Date:   Wed Jan 31 14:26:46 2018 +0100

    bfd_elf_define_start_stop: Fix check

Contributor Author

jnpkrn commented Apr 23, 2018

Thanks for the feedback, I need to dive into this once more.

I paid no attention to the size effect on the resulting binaries,
perhaps my bad.

I had an impression that the executable's own section will
be fed into the mechanism implicitly. Are you able to
reproduce the counterexample solely with
tests/functional/log_test_mock.sh run from within
Fedora (27 or so) VM with libqb's checkout?

Member

gao-yan commented Apr 23, 2018

BTW, this is the comment that Michael Matz made when proposing the patch:

My theory why it worked before
at all is that the __start___verbose symbols were not PROTECTED but DEFAULT (i.e. global), so looking up that symbol for a shared lib that didn't contain
those symbols (there are many) actually found the syms from the main object.
Now that they are PROTECTED that doesn't happen anymore.  So the old
binutils behaviour hid this bug, but now doesn't anymore.

Contributor

wferi commented Apr 23, 2018

@jnpkrn, I've got no Fedora VMs handy. Furthermore, I haven't done any testing yet, just noticed the changed section layout which made me think you might have relevant info to share before I start rabbit-holing that fatty binary... I'm not even sure libqb has anything to do with this, it's just a guess at the moment. But I'll start collecting hard data eventually.

Member

gao-yan commented Apr 25, 2018

Oh, this reminds me that Michael Matz once mentioned:

I'll also note that the trick you're using to work-around the binutils problem
(i.e. the linker script enforcing a __verbose section with boundary symbols) has
the side-effect of allocating a new segment for that section, which due to
alignment reasons is placed very far away from the existing ones.  That's not
a problem in itself, it just costs a bit of memory, but I thought to mention it
anyway.  It might be possible to play with the linker script to fold that
__verbose section into the normal data segment.

Contributor

wferi commented Apr 25, 2018

@gao-yan, that must be it, thanks for sharing! Michael Matz obviously knows his stuff.

5 participants