Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module libraries and libcore built as shared libs #191

Merged
merged 16 commits into from
May 19, 2015

Conversation

trws
Copy link
Member

@trws trws commented May 18, 2015

This is a partial fix for #179, dependency resolution should be fixed up more reliably in the build system, and installation paths need to be dealt with for the shared libraries. Some more testing is also in order.

What this does include is that the libflux-core library no longer includes any module libraries, and is itself built as a shared library that the modules link in.

@garlick garlick added the review label May 18, 2015
@trws
Copy link
Member Author

trws commented May 18, 2015

This last commit, assuming travis agrees with my tests, should have things in a mergeable state. It could use further testing, installed vs non for example, and a thorough review, but everything that should be linking via shared library is now I think. The odd bit that remains is dealing with modules that need more of the API than is exposed by flux-core. There isn't a lot of that, but the kvs for example depends on several symbols from libutil.

trws and others added 6 commits May 18, 2015 15:20
libcore is now built in src/common, the header has also moved
libmrpc is now built in src/modules/libmrpc, it can not be built anywhere
else, at all.  It must be built between the kvs and the modctl modules.  How
precisely we avoided this before... all I can think is that the modctl lib
must have been assuming the symbol would be available at runtime, and it
was, so it worked.  Now that it's building shared only, it checks, and if
libmrpc is built in any other order, it fails.

This seems almost ready for merge, pending review, except for one problem:
PROBLEM: flux keygen fails

make check passes all tests, but fails on keygen with either:
    flux-keygen: mkdir
    /g/g12/scogland/projects/flux/build/src/cmd/.libs/../../etc/flux: No such file
    or directory
or:
    flux: `keygen' is not a flux command.  See 'flux --help'

The first happens when an install path is set with configure, the second when
it isn't.
Both the kvs and core libraries are now built with flags that require all of
their symbols to resolve at link time for them rather than load time.  This
should probably be a requirement on every shared library we build, if only for
debugging purposes.  The other main change is that the flux command now
detects when it is being invoked by a libtool run script, by checking the path
for containing a .libs component, and adds an extra '../' in that case.  This
was the cause of the keygen error from the previous commit.
…g up on travis that are being missed on hype
@trws trws force-pushed the shared-library-core branch from b17d072 to c00ee06 Compare May 18, 2015 22:21
@grondo
Copy link
Contributor

grondo commented May 18, 2015

This looks like a really great direction to me. Not sure why tests are failing in Travis... If you want I can help look into that tomorrow.

Also, I might have missed this but are flux- commands to be built against the libflux-*.so shared libraries, or are they still using static convenience libs for now?

All module libraries now require all symbols at link time, or they will
immediately error out and tell you what's missing.

KAP now builds correctly under check again, the new flux_event_send and recv
prototypes broke it somehow, despite passing their own distcheck.
@trws
Copy link
Member Author

trws commented May 18, 2015

I found out what some of the issues are, they just keep cascading into
new ones. Most recent one is that the message update PR that just went
in actually breaks KAP, despite passing its own travis tests..., which
kills check here. Also several of the modules were not linking all
necessary symbols at .so build time, passing on check on hype then
failing on travis. All of that is resolved in the most recent commit,
so hopefully that addresses it.

On 18 May 2015, at 15:53, Mark Grondona wrote:

This looks like a really great direction to me. Not sure why tests are
failing in Travis... If you want I can help look into that tomorrow.

Also, I might have missed this but are flux- commands to be built
against the libflux-*.so shared libraries, or are they still using
static convenience libs for now?


Reply to this email directly or view it on GitHub:
#191 (comment)

@garlick
Copy link
Member

garlick commented May 18, 2015

@trws, sorry about breaking KAP! It builds only if HAVE_MPI is set, and it's not set in my personal environment, and likely isn't set for travis builds either.

@trws
Copy link
Member Author

trws commented May 18, 2015

@garlick No worries, it was a simple enough fix at least. Getting everything to link on the travis machine is proving not to be however. Do you happen to know what gcc version we're running on there? Something is different in the way symbols are getting resolved on hype and on the build bot. All of the errors I'm getting are legitimate, but having it only fail on travis makes the debug loop pretty long.

@garlick
Copy link
Member

garlick commented May 18, 2015

It's displayed early in the build log:

$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

@garlick
Copy link
Member

garlick commented May 18, 2015

If you can get a Ubuntu vm going (virtualbox on your mac?), you may be able to work in an environment closer to the travis one. See .travis.yml for a recipe of sorts for building a similar environment.

Thomas R. W. Scogland added 4 commits May 18, 2015 16:29
There must be a better way to propagate these in automake.   CMake does this
automatically.
…ransitive dependencies for this to be practical
@grondo
Copy link
Contributor

grondo commented May 19, 2015

I used a Dockerfile to build the original .travis.yml in order to avoid long debug loop.
(Unfortunately we don't have anywhere convenient to run docker on our systems, but maybe we can set something up somewhere.)

This Dockerfile might need some slight tweaking as it was from awhile ago:

FROM ubuntu:14.10
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get -y install \
        lua5.2 liblua5.2-dev lua-posix munge \
        libmunge-dev libjson-c-dev \
        libtool automake autoconf \
        aspell wget g++ make git uuid-dev
RUN useradd -m ubuntu
RUN for url in \
        https://download.libsodium.org/libsodium/releases/libsodium-1.0.0.tar.gz \
        http://download.zeromq.org/zeromq-4.1.0-rc1.tar.gz \
        http://download.zeromq.org/czmq-3.0.0-rc1.tar.gz ; do \
                f=`basename $url .tar.gz` && \
                wget ${url} && \
                tar xvfz ${f}.tar.gz && \
                (cd $(basename $f -rc1) && ./configure && make install); \
        done

RUN git clone https://github.com/grondo/lua-hostlist && \
     cd lua-hostlist && \
     make LUA_VER=5.2 && \
     make install LUA_VER=5.2
RUN apt-get -y install vim strace
RUN ldconfig
USER ubuntu
WORKDIR /home/ubuntu
RUN git clone --depth=50 --branch=master https://github.com/flux-framework/flux-core && cd flux-core && ./autogen.sh && ./configure && make check

@grondo
Copy link
Contributor

grondo commented May 19, 2015

Oops, the above Dockerfile isn't going to work -- it was used for testing with lua-5.2. The travis CI environment is actually ubuntu 12.04LTS. Let me try to dig up the right Dockerfile this time.

@trws
Copy link
Member Author

trws commented May 19, 2015

At least it's encouraging that this last version works correctly with clang. The only holdout for the gcc build seems to be missing symbols in the lua bindings, so hopefully this will be the last round anyway.

The only thing I'm wondering is if it might be worth turning all of this into a flux-internal shared library, so it will pull in the symbols it needs at linking rather than manually propagating them everywhere. Or maybe there's a better way to do this in Automake, I don't know of one, but CMake for example will track the library dependencies of a static library and apply them where that library is used. Do either of you know a good way to do that across directories for Automake?

@grondo
Copy link
Contributor

grondo commented May 19, 2015

Libtool does support some sort of inter-library dependencies. I thought automake was smart enough to do this if you linked a static convenience library with another?

https://sourceware.org/autobook/autobook/autobook_77.html#SEC77

Thomas R. W. Scogland added 4 commits May 19, 2015 09:58
As Mark pointed out, using the libtool setup will track the necessary extra
link options, so I have switched to that.  Some of the builds now specify
significantly more than they need to work, but it seems to have taken care of
the issue.
This attempts to fix the parallel make check, but does not entirely succeed.
Even with the .NOTPARALLEL: on the bottom dirs, automake still manages to try
and build the KVS module before common is done building.
@grondo
Copy link
Contributor

grondo commented May 19, 2015

@trws, not sure if this is helpful but here's a Dockerfile that more closely matches travis-ci.org build environment:
(EDIT: LuaBitOp was unnecessary)

FROM ubuntu:12.04
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get -y install \
    lua5.1 liblua5.1-dev munge \
    libmunge-dev \
    libtool automake autoconf \
        aspell wget g++ make git uuid-dev \
    luarocks
RUN useradd -m ubuntu
RUN for url in \
    https://download.libsodium.org/libsodium/releases/libsodium-1.0.0.tar.gz \
    http://download.zeromq.org/zeromq-4.1.0-rc1.tar.gz \
        https://s3.amazonaws.com/json-c_releases/releases/json-c-0.11.tar.gz \
    http://download.zeromq.org/czmq-3.0.0-rc1.tar.gz ; do \
        f=`basename $url .tar.gz` && \
        wget ${url} && \
        tar xvfz ${f}.tar.gz && \
        (cd $(basename $f -rc1) && ./configure && make install); \
    done
RUN luarocks install luaposix
RUN git clone https://github.com/grondo/lua-hostlist && \
     cd lua-hostlist && \
     make LUA_VER=5.1 && \
     make install LUA_VER=5.1
RUN apt-get -y install vim strace
RUN ldconfig
USER ubuntu
WORKDIR /home/ubuntu
RUN git clone --depth=50 --branch=master https://github.com/flux-framework/flux-core && cd flux-core && ./autogen.sh && ./configure && make clean

@grondo
Copy link
Contributor

grondo commented May 19, 2015

Nicely done, looks like this PR is passing now.
Was there anymore TODO here besides fixing parallel build? IMO the work here is good enough to merge now and fixup any other items later.

@garlick
Copy link
Member

garlick commented May 19, 2015

Yes looks good!

grondo added a commit that referenced this pull request May 19, 2015
module libraries and libcore built as shared libs
@grondo grondo merged commit acd42ce into flux-framework:master May 19, 2015
@grondo grondo removed the review label May 19, 2015
@trws
Copy link
Member Author

trws commented May 19, 2015

Thanks for the review, and the dockerfile, may look at getting boot2docker working... anyway, I have one more update for this that splits the exports such that the flux_* symbols only exist in the libflux-core library, should help make sure that at least the API symbols are only included in the binaries once. I'll put in another pull request for that momentarily.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants