The v1.10.0 release package can't build on Power8 LE #922

Closed
Zhiming-Wang opened this issue Sep 22, 2015 · 11 comments

@Zhiming-Wang
Member

bot:milestone:v1.10.0
bot:label:bug
bot:assign: @jsquyres

I downloaded the v1.10.0 release package from http://www.open-mpi.org/software/ompi/v1.10/downloads/openmpi-1.10.0.tar.gz and built it with xlc.
The build failed on RHEL 7.1 Power8 LE.
Steps to reproduce:

$ ./configure --prefix=/home/wangzm/ompi-110rel-xlc-opt-p8le ... && make all install
...
make[2]: Entering directory `/tmp/tmp-dir/openmpi-1.10.0/ompi/mpi/fortran/use-mpi-ignore-tkr'
  FCLD     libmpi_usempi_ignore_tkr.la
/lib64/librt.so: could not read symbols: File in wrong format
make[2]: *** [libmpi_usempi_ignore_tkr.la] Error 1
make[2]: Leaving directory `/tmp/tmp-dir/openmpi-1.10.0/ompi/mpi/fortran/use-mpi-ignore-tkr'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/tmp/tmp-dir/openmpi-1.10.0/ompi'
make: *** [all-recursive] Error 1
$ cd /tmp/tmp-dir/openmpi-1.10.0/ompi/mpi/fortran/use-mpi-ignore-tkr
$ make -n
echo " FCLD " libmpi_usempi_ignore_tkr.la;/bin/sh ../../../../libtool --silent --tag=FC --mode=link xlf -I../../../../ompi/include -I../../../../ompi/include -I../../../.. -I../../../.. -q64 -qarch=pwr8 -qtune=pwr8 -O3 -qxflag=nseq -version-info 6:0:0 -o libmpi_usempi_ignore_tkr.la -rpath /usr/local/lib mpi-ignore-tkr.lo mpi-ignore-tkr-sizeof.lo -lrt -lutil
$ /bin/sh -x ../../../../libtool --silent --tag=FC --mode=link xlf -I../../../../ompi/include -I../../../../ompi/include -I../../../.. -I../../../.. -q64 -qarch=pwr8 -qtune=pwr8 -O3 -qxflag=nseq -version-info 6:0:0 -o libmpi_usempi_ignore_tkr.la -rpath /usr/local/lib mpi-ignore-tkr.lo mpi-ignore-tkr-sizeof.lo -lrt -lutil
...
++ cmd='/usr/bin/ld -m elf64ppc -shared .libs/mpi-ignore-tkr.o .libs/mpi-ignore-tkr-sizeof.o -lrt -lutil -soname libmpi_usempi_ignore_tkr.so.6 -o .libs/libmpi_usempi_ignore_tkr.so.6.0.0'
+ :
+ false
+ eval '/usr/bin/ld -m elf64ppc -shared .libs/mpi-ignore-tkr.o .libs/mpi-ignore-tkr-sizeof.o -lrt -lutil -soname libmpi_usempi_ignore_tkr.so.6 -o .libs/libmpi_usempi_ignore_tkr.so.6.0.0'
++ /usr/bin/ld -m elf64ppc -shared .libs/mpi-ignore-tkr.o .libs/mpi-ignore-tkr-sizeof.o -lrt -lutil -soname libmpi_usempi_ignore_tkr.so.6 -o .libs/libmpi_usempi_ignore_tkr.so.6.0.0
/lib64/librt.so: could not read symbols: File in wrong format
+ lt_exit=1
+ test link = relink
+ exit 1

As shown above, "/usr/bin/ld" was invoked with the "elf64ppc" emulation. On Power8 LE, "elf64lppc" should be used instead. After I reran "./autogen.pl" on Power8 LE, the build passed.
I don't know how the "configure" file in the release package was generated. I suspect that Power BE and LE are not distinguished somewhere, so the "elf64ppc" option for Power BE is misapplied on the Power LE platform.
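
The mismatch described above can be checked by hand. The following is a minimal sketch, not part of the original report: `elf_byte_order` and `ld_emulation_for` are hypothetical helper names, and the mapping covers only the two emulations mentioned in this issue. It reads the EI_DATA byte of an ELF header (offset 5: 1 is little-endian, 2 is big-endian) and maps a `uname -m` machine name to the matching GNU ld emulation.

```shell
# Hypothetical helper: report the byte order recorded in an ELF file header.
elf_byte_order() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    [ "$(od -An -tx1 -N4 "$1" | tr -d ' \n')" = "7f454c46" ] || { echo "not-elf"; return 1; }
    # EI_DATA is the byte at offset 5: 1 = little-endian, 2 = big-endian.
    case "$(od -An -tu1 -j5 -N1 "$1" | tr -d ' \n')" in
        1) echo little ;;
        2) echo big ;;
        *) echo unknown; return 1 ;;
    esac
}

# Hypothetical helper: map a machine name (as printed by `uname -m`) to the
# ld emulation discussed in this issue. Other machines are not covered.
ld_emulation_for() {
    case "$1" in
        ppc64le) echo elf64lppc ;;
        ppc64)   echo elf64ppc ;;
        *)       echo unknown; return 1 ;;
    esac
}

# Usage sketch: the default path is the library named in the error message.
lib=${1:-/lib64/librt.so}
if [ -f "$lib" ]; then
    echo "$lib: $(elf_byte_order "$lib")"
fi
echo "$(uname -m): $(ld_emulation_for "$(uname -m)")"
```

If the library is little-endian while the link command uses "-m elf64ppc", the "File in wrong format" error is the expected outcome.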

@ggouaillardet
Contributor

iirc, v1.10 tarballs are generated using libtool 2.4.2, and tarballs are generated using libtool 2.4.6, and libtool 2.4.2 does not (fully ?) support Power LE.
we discussed a similar topic in the past and concluded it is a bad idea to change libtool and friends in the middle of a stable series (short story, v1.10 is more a fork of v1.8 than a new branch)
so i am afraid this might not be fixed until v2.0.0.
@jsquyres @rhc54 could you please comment on that ?

if you have some time, could you download and build openmpi from a v2.x nightly snapshot and confirm there is no such issue on Power LE ?

@nysal
Member

nysal commented Sep 22, 2015

@Zhiming-Wang see #396
Fixing this will require patching the configure script once it is generated. Do you want to take a shot at it?

@ggouaillardet
Contributor

@nysal we already patch the configure script to fix some libtool bugs, so this is something we can consider.
btw, is the patch for libtool 2.4.2 only ? 2.4.6 only ? any version ?
thanks in advance for attaching the patch

@nysal
Member

nysal commented Sep 22, 2015

@ggouaillardet The patch is for libtool < 2.4.3 (that's when the fix landed in libtool). I have only really tested it on 2.4.2, though.
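
The "libtool < 2.4.3" condition could be tested with a small version comparison. This is only a sketch: `version_lt` is a hypothetical helper, it assumes plain dotted numeric versions with at most three components, and the example version is hard-coded rather than detected.

```shell
# Hypothetical helper: succeed when $1 is strictly older than $2.
# Assumes dotted numeric versions with up to three components.
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$1" ]
}

# Assumed example value; a real script would detect this from the toolchain.
libtool_version=2.4.2
if version_lt "$libtool_version" 2.4.3; then
    echo "libtool $libtool_version predates 2.4.3: the configure patch applies"
else
    echo "libtool $libtool_version already contains the fix"
fi
```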

@jsquyres
Member

Have a look in autogen.pl to see where we apply patches after configure is generated.

@Zhiming-Wang added this to the Open MPI v1.10.1 milestone Sep 23, 2015
@Zhiming-Wang
Member Author

Both the nightly v2.x and master packages built successfully.
But I have a question: why did the build pass after I reran "./autogen.pl"?

@ggouaillardet
Contributor

configure and friends for v1.10.0 were generated with an old libtool chain that does not support power LE.
if you run autogen.pl, you will regenerate configure and friends with your local libtool chain that does support power LE.
makes sense ?
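
One way to see which libtool produced a given script is to read the version it records. The sketch below assumes the generated `configure` and `libtool` scripts contain a line of the form `macro_version='X.Y.Z'` (quoted or unquoted); `libtool_macro_version` is a hypothetical helper name, and the file paths are assumptions about the current directory.

```shell
# Hypothetical helper: print the libtool version recorded in a generated
# configure or libtool script. Assumes a line starting with "macro_version=".
libtool_macro_version() {
    sed -n "s/^macro_version=['\"]\{0,1\}\([0-9][0-9.]*\).*/\1/p" "$1" | head -n 1
}

# Usage sketch: compare the script shipped in the tarball with the one
# regenerated by autogen.pl by running this before and after.
for f in ./configure ./libtool; do
    if [ -f "$f" ]; then
        echo "$f: libtool $(libtool_macro_version "$f")"
    fi
done
```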

@jsquyres
Member

@Zhiming-Wang What @ggouaillardet said is correct. See also the table on http://www.open-mpi.org/source/building.php for what versions of the GNU Autotools we use to build each release series. We fix these versions when we create a series and try very very hard not to update them throughout the life of the series.

@Zhiming-Wang
Member Author

I am back from a long long vacation.
Thanks, @ggouaillardet and @jsquyres .

I reran "./autogen.pl" on an x86_64 machine with RHEL 7.1, where the libtool version is 2.4.2, the same version used to build v1.10.x. The issue still remains.
But on a Power8 LE machine with the same OS and libtool versions, the issue disappeared.
Is the libtool bug fixed in the RHEL 7.1 installation package for PPC LE? Maybe; I don't think we need to care about the details.
Thanks again.

@jsquyres
Member

@Zhiming-Wang Welcome back. So I'm not quite clear -- does the v1.10.x distribution tarball still have the issue or not? If so, do you want/need a fix for the distribution tarball? If so, you'll need to apply a patch to configure -- see autogen.pl for where we add patches for this kind of thing.
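
A quick way to answer the tarball question is to look for the little-endian emulation name in the shipped script. This is a rough sketch under one assumption: a configure generated by a libtool with Power LE support contains the string "elf64lppc" somewhere. `configure_mentions_lppc` is a hypothetical helper, and the default path assumes the command runs from the unpacked source tree.

```shell
# Hypothetical check: does a generated configure script mention the
# little-endian ld emulation name at all?
configure_mentions_lppc() {
    grep -q 'elf64lppc' "$1"
}

# Usage sketch; the default path is an assumption about the working directory.
cfg=${1:-./configure}
if [ -f "$cfg" ]; then
    if configure_mentions_lppc "$cfg"; then
        echo "$cfg: mentions elf64lppc"
    else
        echo "$cfg: no elf64lppc found; it may need the Power LE patch"
    fi
fi
```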

@gpaulsen
Member

This is a duplicate of Issue #396. @nysal has a fix that just needs to be applied to the 1.10 branch. I will ask @Zhiming-Wang if he feels comfortable applying that fix to the 1.10 stream.
