-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workaround issues with ld from binutils 2.29 #266
Conversation
01c4700
to
c299e88
Compare
lib/Makefile.am
Outdated
{ echo "only applicable to SO symlink scheme"; exit 1; }; \ | ||
echo "INPUT($${t1_bn})" > "$(DESTDIR)$(libdir)/libqb.so-t"; \ | ||
cat $< >> "$(DESTDIR)$(libdir)/libqb.so-t"; \ | ||
mv -f "$(DESTDIR)$(libdir)/libqb.so-t" "$(DESTDIR)$(libdir)/libqb.so" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks horrible. Can it be tidied up or put in a script somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps (look at libtool itself to see some really hard to read shell code).
But what's worse, 12 lines of well-wrapped (and overall commented code),
or having to distribute yet another single-use-only file? I am biased towards
the latter :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it gets a lot tidier when you observe make install
screen :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually able to separate last 2 lines, leaving the "blob" down to 10 :)
ecde2ab
to
7a953ef
Compare
Still rather WIP, but the most difficult part (making libtool actually serve |
7a953ef
to
7bf1627
Compare
Ok, it now passes the CI run. Another yet unexplored field is how this all combine |
7bf1627
to
eb3bde0
Compare
That's definitely not desired: Beside fixing that, |
eb3bde0
to
10cd5db
Compare
10cd5db
to
b126500
Compare
b126500
to
1854db8
Compare
Hopefully not many more iterations of this patchset is needed :) |
Strange, looks like |
OK, the issue is mainly that older libtool and gcc are not so smart |
Gotcha :-/ |
6e7ed7b
to
e683356
Compare
Testing on some other platforms, for FreeBSD, I need to at least respell the |
e683356
to
e393994
Compare
Fixed the sed command for portability and also bumped the |
IOW, the reviewer should be obliged to spot "public ABI" changes and |
d64b314
to
c011b12
Compare
Chrissie, you can spot the gist of wording changes at I also demoted one of the new non-fatal warnings in |
Now merged. Thanks again for all your work on this issue |
Some after the fact notes:
|
Does the fix in binutils-2.29.1 provide any practical help for this issue? It sounds like basically libqb has to do the exactly same for binutils-2.29.1 as for binutils-2.29? |
And there seems to be some issue with this combination:
-- This is on openSUSE 42.3. With the same binutils and other package versions on SLE12 SP3/SP2, pacemaker encounters the same assertion with "crm_shadow --help" on build. |
On 11/01/18 07:29 +0000, Gao,Yan wrote:
Does the fix in binutils-2.29.1 provide any practical help for this
issue? It sounds like basically libqb has to do the exactly same for
binutils-2.29.1 as for binutils-2.29?
I was also interested back when 2.29.1 arrived as I was curious
whether the time invested to deal with 2.29 was worth it would it be
just for the only linker versions, but it turned up that nope,
private->protected shift in symbols visibility between 2.29 and 2.29.1
is not enough, as original libqb counted on public one (it's my
understanding only this models situation with static linking the best,
though never tested, because such practice is dedicated to very
special cases nowadays).
And yes, part of the point is that for "build-time link with libqb",
there needs to be rather unified handling for whatever (supported)
linker will be used (otherwise LDFLAGS would need to look like
$(qbconfig --libs) for the dependent components, where qbconfig would
internally guestimate [unreliably!] which linker will be used and
arranged the actual flags accordingly, which looks pretty cumbersome),
which is exactly the current state -- the overlay linker script will
not hurt linker from binutils < 2.29 either, and keeps log module
working well with >= 2.29.
As I mentioned in the recent Fedora conversation:
https://lists.fedoraproject.org/archives/list/[email protected]/message/UIACQTZZYIS627AUREOOOPU5MFGC3HSW/
I was originally unaware about the separate upstream discussion,
outcome of which was that I had to be more strict in configure checks
to capture 2.29.1 in addition to 2.29 where the former iteration of
them worked well.
Hope this helps.
…--
Poki
|
On 11/01/18 07:58 +0000, Gao,Yan wrote:
And there seems to be some issue with this combination:
1) libqb-1.0.3+20171226.6d62b64 built with binutils-2.26.1:
checking whether GCC supports __attribute__((section()) + ld supports orphan sections... yes
checking whether linker emits global boundary symbols for orphan sections... yes
2) And pacemaker-1.1.18+20180104.7ba28d854 fails to build with this libqb:
```
PATH=/home/abuild/rpmbuild/BUILD/pacemaker-1.1.18+20180104.7ba28d854/tools:$PATH /home/abuild/rpmbuild/BUILD/pacemaker-1.1.18+20180104.7ba28d854/tools/iso8601 --help
...
iso8601: utils.c:66: common: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && __start___verbose != __stop___verbose' failed.
Makefile:1469: recipe for target 'iso8601.8' failed
gmake[1]: *** [iso8601.8] Aborted (core dumped)
```
-- This is on openSUSE 42.3.
With the same binutils and other package versions on SLE12 SP3/SP2, "crm_shadow --help" triggers the same assertion on build.
I've observed various issues in the past including something like
this (failures when building pacemaker), let me mimic your combination
and I'll get back to you.
…--
Poki
|
On 11/01/18 09:55 +0100, Jan Pokorný wrote:
On 11/01/18 07:58 +0000, Gao,Yan wrote:
> And there seems to be some issue with this combination:
>
> 1) libqb-1.0.3+20171226.6d62b64 built with binutils-2.26.1:
> checking whether GCC supports __attribute__((section()) + ld supports orphan sections... yes
> checking whether linker emits global boundary symbols for orphan sections... yes
>
> 2) And pacemaker-1.1.18+20180104.7ba28d854 fails to build with this libqb:
> ```
> PATH=/home/abuild/rpmbuild/BUILD/pacemaker-1.1.18+20180104.7ba28d854/tools:$PATH /home/abuild/rpmbuild/BUILD/pacemaker-1.1.18+20180104.7ba28d854/tools/iso8601 --help
> ...
> iso8601: utils.c:66: common: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && __start___verbose != __stop___verbose' failed.
> Makefile:1469: recipe for target 'iso8601.8' failed
> gmake[1]: *** [iso8601.8] Aborted (core dumped)
> ```
> -- This is on openSUSE 42.3.
>
> With the same binutils and other package versions on SLE12 SP3/SP2, "crm_shadow --help" triggers the same assertion on build.
I've observed various issues in the past including something like
this (failures when building pacemaker), let me mimic your combination
and I'll get back to you.
Reproduced with (and with both 2.28 and 2.29, actually):
./log_test_mock.sh -ncl -tqb+_il+_c+ ../../libqb-1.0.3-1.1.6d62.*.src.rpm
(see also #288 where I hope
to eventually develop a solution to this fallout).
…--
Jan (Poki)
|
On 11/01/18 21:12 +0100, Jan Pokorný wrote:
On 11/01/18 09:55 +0100, Jan Pokorný wrote:
> On 11/01/18 07:58 +0000, Gao,Yan wrote:
>> And there seems to be some issue with this combination:
>>
>> 1) libqb-1.0.3+20171226.6d62b64 built with binutils-2.26.1:
>> checking whether GCC supports __attribute__((section()) + ld supports orphan sections... yes
>> checking whether linker emits global boundary symbols for orphan sections... yes
>>
>> 2) And pacemaker-1.1.18+20180104.7ba28d854 fails to build with this libqb:
>> ```
>> PATH=/home/abuild/rpmbuild/BUILD/pacemaker-1.1.18+20180104.7ba28d854/tools:$PATH /home/abuild/rpmbuild/BUILD/pacemaker-1.1.18+20180104.7ba28d854/tools/iso8601 --help
>> ...
>> iso8601: utils.c:66: common: Assertion `"implicit callsite section is populated, otherwise target's build is at fault, preventing reliable logging" && __start___verbose != __stop___verbose' failed.
>> Makefile:1469: recipe for target 'iso8601.8' failed
>> gmake[1]: *** [iso8601.8] Aborted (core dumped)
>> ```
>> -- This is on openSUSE 42.3.
>>
>> With the same binutils and other package versions on SLE12 SP3/SP2, "crm_shadow --help" triggers the same assertion on build.
>
> I've observed various issues in the past including something like
> this (failures when building pacemaker), let me mimic your combination
> and I'll get back to you.
Reproduced with (and with both 2.28 and 2.29, actually):
./log_test_mock.sh -ncl -tqb+_il+_c+ ../../libqb-1.0.3-1.1.6d62.*.src.rpm
(see also #288 where I hope
to eventually develop a solution to this fallout).
Ok, I figured why I haven't observed any issue like that e.g. when
building pacemaker just like you, except I did it for Fedora.
As mentioned, I observe the problem manifested when mimicking the same
with "./log_test_mock.sh -ncl" test runner, unless the advice bellow
is applied.
The crucial point Fedora does is that it injects "-Wl,-z,relro" to
%__global_ldflags RPM macro which is then used in %configure one,
which is in turn used in pacemaker.spec.
So as an interim solution, try a line like this at the top of
pacemaker.spec (untested, just that you grok the idea):
%define __global_ldflags %{expand:%__global_ldflags} -Wl,-z,relro
Build process should then pass OK and the resulting binaries should
work without an issue.
…--
Jan (Poki)
|
Anyway, another solution would be fix the terrifying mess representing
current shape of pacemaker's linking declarations, another technical
debt impeding the code base sanity. For instance, it's a nonsense to
link test.iso8601.o with libqb directly, to begin with.
Note that to avoid a transitive transfer of libraries to link with
from immediate libtool libraries (*.la), which would apparently be
needed so as to avoid CLI tools that don't use libqb directly not
be linked with it like that (good), one would also need to define
"link_all_deplibs=no" in configure script (this is also Debian/Ubuntu
default).
However, it would not solve the problem completely, as there may be
cases, e.g., amongst daemons, where libqb is used directly but no
logging message is arranged in the immediate code of that program.
Hence, using above link flag as an interim solution is more appealing,
till some more systemic tweak is discovered.
…--
Jan (Poki)
|
On 12/01/18 00:31 +0100, Jan Pokorný wrote:
On 11/01/18 21:12 +0100, Jan Pokorný wrote:
> Reproduced with (and with both 2.28 and 2.29, actually):
> ./log_test_mock.sh -ncl -tqb+_il+_c+ ../../libqb-1.0.3-1.1.6d62.*.src.rpm
(note, for 2.29* versions, I am using qb-_il-_c- variant depending on
how pkg_binutils_229 variable is set)
So actually the only binutils version with which the test passes is
the newest 2.29.1 it seems.
> (see also #288 where I hope
> to eventually develop a solution to this fallout).
Ok, I figured why I haven't observed any issue like that e.g. when
building pacemaker just like you, except I did it for Fedora.
Figured out that the libqb build present in the buildroot for this
pacemaker build I had in mind didn't have the offending assertion
present in qblog.h:
https://koji.fedoraproject.org/koji/buildinfo?buildID=986447
which explains the former success.
As mentioned, I observe the problem manifested when mimicking the same
with "./log_test_mock.sh -ncl" test runner, unless the advice bellow
is applied.
Scratch the following incorrect conclusion based on losing -DNLOG CFLAG
when rebuilding log_interlib_client directly in the chroot (it's so
easy to get something wrong in these multi-side scenarios when each
side may need a special treatment):
The crucial point Fedora does is that it injects "-Wl,-z,relro" to
%__global_ldflags RPM macro which is then used in %configure one,
which is in turn used in pacemaker.spec.
So as an interim solution, try a line like this at the top of
pacemaker.spec (untested, just that you grok the idea):
%define __global_ldflags %{expand:%__global_ldflags} -Wl,-z,relro
Build process should then pass OK and the resulting binaries should
work without an issue.
^ that was simply wrong
I'll investigate further whether dropping the offending assertion
again is the only way forward (still may have and added value for
2.29.1+ linked cases if we are able to detect that by any means).
Other interim solution would be to make non-logging executables
linked directly (as in the link command) rather linked with
-l:libqb.so.0 (bypassing the linker script in libqb.so) but that
seems quite fragile (I intentionally dropped the idea we would
have extra library name like libqb-nolog.so [-lqb-nolog] as that
would possibly add needless maintenance burden going forward)
and would collide with automatic libtool's link flags propagation
mentioned in the former comment.
…--
Jan (Poki)
|
It turned out that while log_test_mock.sh already provided ways for extra variations in the composite body of the program to inspect for a working logging, "omit logging at the target client side completely, leave it along with the respective self-check just at the intermediate library this program uses" one hadn't apparently been exercised with binutils < 2.29.1 (which is the only one OK from the get-go since it arranges boundary denoting symbols for orphan sections as protected). This made the pacemaker building fail half the way with libqb 1.0.3 and binutils < 2.29.1 because the built executables are instantly run as to extract their help screen[1], and they represent the said pain pattern: - program not using libqb log subsystem directly, but - linked to dynamic library that itself does + it also utilizes QB_LOG_INIT_DATA macro, which - upon loading the executable and the the linked dependencies prior to proper run invokes the checks, where - one in particular tries to assess whether the direct-access boundary symbols are not aliasing (equal), but - because with standard visibility, symbols from the program take a precedence, actually its symbols are prioritized (instead of the ones pertaining to the library layer as expected), and - since previous fix to accommodate binutils 2.29+ fancy/new handling of the boundary symbols they are not subject to any sort of garbage collection (no symbols were emitted for a section that was not known because there were no instruction it should contain anything, until we stuck those symbols in place by force by the means of linker script to that effect), these boundary symbols indeed occur in this end program having no callsite data (see first point), meaning that start and stop address indications for the section equals, hence - the said assertion triggers (despite it should not = false positive) This clearly means that "QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP" cannot be used unconditionally. However, it would left the software using logging along with QB_LOG_INIT_DATA macro (which was already highlighted as its vital part in order to avoid silent logging malfunction) that doesn't utilize _GNU_SOURCE macro (unlocking some GNU extensions on top of basic POSIX interface) short of the main checks (because they are conditionalized for requiring said extensions), and libqb never required the client side to define that macro nor it dared to define it behind user's back as it can have unexpected consequences. Luckily, the first paragraph carries the key: protected symbols behave exactly as required here[2]: > Protected visibility is like default visibility except that it > indicates that references within the defining module bind to the > definition in that module. That is, the declared entity cannot be > overridden by another module. So we just move said comparison to "!defined(_GNU_SOURCE)" conditional branch within qblog.h, and to cater binutils prior to 2.29.1 (which has it like that by default as mentioned), we also mark the declarations of the boundary symbols, likewise conditionally, with protected visilibity (there appears to be no way for that in the linker script). But that would be too easy as once again, 2.29 linker begs to differ and these two "!defined(_GNU_SOURCE)" measures will actually do more harm than good when in the mix. Hence, new QB_LD_2_29 macro is proclaimed the kill-switch for that case, and the user becomes responsible to either define it when building with this 2.29 troublemaker (as a recap, 2.29.1 is fine from this perspective), or to start using _GNU_SOURCE. One more question mark remains, though: are not the QB_LOG_INIT_DATA's checks in _GNU_SOURCE case weaker now than used to be? The anwer is: no, as long as looking at the symbols through dlopen'd particular shared object will contain no overrides from either already loaded dependencies (if caching is so aggressive to detect it's already in loaded in caller's context) or from recursive load. Currently, it doesn't seem to be the case, but it may depend on the implementation of dynamic linking library or the toolchain. If this observation is proved to be wrong, the solution may be as simple as dropping !defined(_GNU_C) condition from the guard of making the boundary symbols with protected visibility -- it is not done now also with respect to non-gcc compilers which may not recognize that. Last but not least, dl* calls in the QB_LOG_INIT_DATA's checks are themselves subject to scrutiny now: should they fail unexpectedly, the run is terminated just as well. This led to discovery of the issue masked so far, boiling down to unability do dlopen executable (as opposed to shared object)[3]. It did not matter before, because of rather best-effort, optimistic approach (perform the final check only if precondition steps succeeded), but now, we have to add an extra stipulation that this case won't lead to premature termination -- it just happens to sometimes be the case, and there's not much we can do to detect run down on the level of the executable proactively, at least not based on the brief experiments. [1] ClusterLabs#266 (comment) [2] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-visibility-function-attribute [3] https://sourceware.org/bugzilla/show_bug.cgi?id=11754 Signed-off-by: Jan Pokorný <[email protected]>
It turned out that while log_test_mock.sh already provided ways for extra variations in the composite body of the program to inspect for a working logging, "omit logging at the target client side completely, leave it along with the respective self-check just at the intermediate library this program uses" one hadn't apparently been exercised with binutils < 2.29.1 (which is the only one OK from the get-go since it arranges boundary denoting symbols for orphan sections as protected). This made the pacemaker building fail half the way with libqb 1.0.3 and binutils < 2.29.1 because the built executables are instantly run as to extract their help screen[1], and they represent the said pain pattern: - program not using libqb log subsystem directly, but - linked to dynamic library that itself does + it also utilizes QB_LOG_INIT_DATA macro, which - upon loading the executable and the the linked dependencies prior to proper run invokes the checks, where - one in particular tries to assess whether the direct-access boundary symbols are not aliasing (equal), but - because with standard visibility, symbols from the program take a precedence, actually its symbols are prioritized (instead of the ones pertaining to the library layer as expected, remember that that check was intended as self-contained, targeting just it's own participating part in the link scheme), and - since previous fix to accommodate binutils 2.29+ fancy/new handling of the boundary symbols they are not subject to any sort of garbage collection (no symbols were emitted for a section that was not known because there were no instruction it should contain anything, until we stuck those symbols in place by force by the means of linker script to that effect), these boundary symbols indeed occur in this end program having no callsite data (see first point), meaning that start and stop address indications for the section equals, hence - the said assertion triggers (despite it should not = false positive), implying breach of self-containment of the check (which naturally cannot be responsible for any other linked part) This clearly means that "QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP" cannot be used unconditionally. However, it would left the software using logging along with QB_LOG_INIT_DATA macro (which was already highlighted as its vital part in order to avoid silent logging malfunction) that doesn't utilize _GNU_SOURCE macro (unlocking some GNU extensions on top of basic POSIX interface) short of the main checks (because they are conditionalized for requiring said extensions), and libqb never required the client side to define that macro nor it dared to define it behind user's back as it can have unexpected consequences. Luckily, the first paragraph carries the key: protected symbols behave exactly as required here[2]: > Protected visibility is like default visibility except that it > indicates that references within the defining module bind to the > definition in that module. That is, the declared entity cannot be > overridden by another module. So we just move said comparison to "!defined(_GNU_SOURCE)" conditional branch within qblog.h, and to cater binutils prior to 2.29.1 (which has it like that by default as mentioned), we also mark the declarations of the boundary symbols, likewise conditionally, with protected visilibity (there appears to be no way for that in the linker script). But that would be too easy as once again, 2.29 linker begs to differ and these two "!defined(_GNU_SOURCE)" measures will actually do more harm than good when in the mix. Hence, new QB_LD_2_29 macro is proclaimed the kill-switch for that case, and the user becomes responsible to either define it when building with this 2.29 troublemaker (as a recap, 2.29.1 is fine from this perspective), or to start using _GNU_SOURCE. One more question mark remains, though: are not the QB_LOG_INIT_DATA's checks in _GNU_SOURCE case weaker now than used to be? The anwer is: no, as long as looking at the symbols through dlopen'd particular shared object will contain no overrides from either already loaded dependencies (if caching is so aggressive to detect it's already in loaded in caller's context) or from recursive load. Currently, it doesn't seem to be the case, but it may depend on the implementation of dynamic linking library or the toolchain. If this observation is proved to be wrong, the solution may be as simple as dropping !defined(_GNU_C) condition from the guard of making the boundary symbols with protected visibility -- it is not done now also with respect to non-gcc compilers which may not recognize that. Last but not least, dl* calls in the QB_LOG_INIT_DATA's checks are themselves subject to scrutiny now: should they fail unexpectedly, the run is terminated just as well. This led to discovery of the issue masked so far, boiling down to unability to dlopen executable (as opposed to shared object)[3]. It did not matter before, because of rather best-effort, optimistic approach (perform the final check only if precondition steps succeeded), but now, we have to add an extra stipulation that this case won't lead to premature termination -- it just happens to sometimes be the case, and there's not much we can do to detect run down on the level of the executable proactively, at least not based on the brief experiments. [1] ClusterLabs#266 (comment) [2] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-visibility-function-attribute [3] https://sourceware.org/bugzilla/show_bug.cgi?id=11754 Signed-off-by: Jan Pokorný <[email protected]>
FYI, there's a thread about this topic ongoing from binutils upstream: |
Thanks for the heads up. It looks to me that current approach, together with some rough edges If there's any intention to handle the mechanism at hand better on linker |
I can tell the difficulty of it... You did nice work fixing this though. Apparently binutils upstream realized the situation indeed needs to be improved and they are achieving a better solution now: https://sourceware.org/ml/binutils/2018-02/msg00056.html Actually more users are aware of that: I also hope the resulting solution won't break the current approach of libqb. |
Looking at https://sourceware.org/ml/binutils/2018-01/msg00347.html and figuring out that there have been more changes on the topic recently
I got slightly worried. Good news is that libqb's internal sanity check:
keeps working also with this built-from-checkout binutils version, |
@jnpkrn, while uploading Corosync 2.4.4 (built with libqb 1.0.3) to Debian unstable, I noticed that the size of the |
BTW, of course we already have this linker script to keep these symbols "GLOBAL DEFAULT" , which is relied on by the current code I believe. Otherwise the resulting "GLOBAL PROTECTED" symbols generated by the new binutils won't work. My colleague, Michael Matz, who works on binutils reviewed the relevant code of libqb, and is pointing out:
Could this be an approach that we'd still like to investigate? |
Nice Michael Matz [email protected] kindly offered this patch, which passes the tests with or without this pull request #266 by using binutils with the patches back-ported from binutils-2.30.
|
The revisions of binutils that matter:
|
Thanks for the feedback, I need to dive into this once more. I paid no attention to the size effect on the resulting binaries, I had an impression that the executable's own section will |
BTW, this is the comment that Michael Matz made when proposing the patch:
|
@jnpkrn, I've got no Fedora VMs handy. Furthermore, I haven't done any testing yet, just noticed the changed section layout which made me think you might have relevant info to share before I start rabbit-holing that fatty binary... I'm not even sure libqb has anything to do with this, it's just a guess at the moment. But I'll start collecting hard data eventually. |
Oh, this reminds me that Machael Matz once mentioned:
|
@gao-yan, that must be it, thanks for sharing! Michael Matz obviously knows his stuff. |
See #265.
WIP, do not merge yet, though feel free to comment.EDIT: should no longer be considered WIP, now subject to review.