MPI_Pack_external_size is returning the wrong values in 64-bit applications #10

Closed
ompiteam opened this issue Oct 1, 2014 · 12 comments

@ompiteam
Contributor

ompiteam commented Oct 1, 2014

We have a program that tests the size returned by MPI_Pack_external_size with the external32 data representation. Since external32 is a fixed, platform-independent format, the size should be the same in 32-bit and 64-bit applications, but the two builds return different values.

 burl-ct-v40z-0 65 =>mpicc ext32.c -o ext32
"ext32.c", line 105: warning: shift count negative or too big: << 32
 burl-ct-v40z-0 66 =>mpirun -np 2 ext32
First test passed
Second test passed
Third test passed
ext32: PASSED
 burl-ct-v40z-0 67 =>mpicc -xarch=amd64 ext32.c -o ext32_amd64
 burl-ct-v40z-0 68 =>mpirun -np 2 ext32_amd64 
First test passed
Second test failed. Got size of 80, expected 40
Third test failed. Got size of 6400, expected 3200
[burl-ct-v40z-0:13864] *** An error occurred in MPI_Pack_external
[burl-ct-v40z-0:13864] *** on communicator MPI_COMM_WORLD
[burl-ct-v40z-0:13864] *** MPI_ERR_TRUNCATE: message truncated
[burl-ct-v40z-0:13864] *** MPI_ERRORS_ARE_FATAL (goodbye)
 burl-ct-v40z-0 69 =>
@ompiteam ompiteam added this to the Open MPI 1.6.6 milestone Oct 1, 2014
@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Imported from trac issue 628. Created by rolfv on 2006-11-22T11:16:34, last modified: 2010-02-19T09:49:18

  • rolfv attached ext32.c on 2006-11-22 11:17:17

@ompiteam ompiteam added the bug label Oct 1, 2014
@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by rolfv on 2008-06-25 09:01:18:

I just retested this with the latest trunk and this still fails for us.

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by rolfv on 2008-09-09 14:39:08:

I retested with the trunk and with 1.3, and it still fails in 64-bit builds; 32-bit builds pass. Here are the 1.3 results (r19400).

 burl-ct-v440-0 47 =>mpirun -V
mpirun (Open MPI) 1.3r19400-ct8.0-b31c-r29

Report bugs to http://www.open-mpi.org/community/help/
 burl-ct-v440-0 48 =>
 burl-ct-v440-0 43 =>mpicc ext32.c -o ext32
"ext32.c", line 105: warning: shift count negative or too big: << 32
 burl-ct-v440-0 44 =>mpirun -np 2 ext32
First test passed
Second test passed
Third test passed
ext32: PASSED
 burl-ct-v440-0 46 =>mpirun -np 2 ext32_64
First test passed
Second test failed. Got size of 80, expected 40
Third test failed. Got size of 6400, expected 3200
[burl-ct-v440-0:04800] *** An error occurred in MPI_Pack_external
[burl-ct-v440-0:04800] *** on communicator MPI_COMM_WORLD
[burl-ct-v440-0:04800] *** MPI_ERR_TRUNCATE: message truncated
[burl-ct-v440-0:04800] *** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 4800 on
node burl-ct-v440-0 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
 burl-ct-v440-0 47 =>

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by bosilca on 2008-09-17 10:05:17:

I did find some time to look at this one, and it's really funny... I guess that your build of Open MPI does not support heterogeneous systems. Unfortunately, there are a lot of shortcuts when we know we are in a homogeneous environment, and one of them completely ignores the external32 type (on 64-bit machines, where the native and external32 types differ) when we create communicators...

I'll try to figure out a clean solution, as MPI_Pack_external with external32 should work as expected even in heterogeneous systems.

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jsquyres on 2010-01-25 20:08:18:

Sun -- is this still happening (i.e., even after the changes in the DDT engine within the last few months)?

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by rolfv on 2010-01-26 16:59:55:

Yes, this still happens. I tried it on Solaris SPARC and a Linux machine using the Sun Studio compilers in both cases. Here is the Solaris SPARC failure.

 burl-ct-280r-0 55 =>mpicc -m64 ext32.c -o ext32_64
 burl-ct-280r-0 56 =>mpirun -V
mpirun (Open MPI) 1.4r22359-ct8.3-b08a-r13

Report bugs to http://www.open-mpi.org/community/help/
 burl-ct-280r-0 57 =>mpirun -np 2 ext32_64
First test passed
Second test failed. Got size of 80, expected 40
Third test failed. Got size of 6400, expected 3200
[burl-ct-280r-0:3402] *** An error occurred in MPI_Pack_external
[burl-ct-280r-0:3402] *** on communicator MPI_COMM_WORLD
[burl-ct-280r-0:3402] *** MPI_ERR_TRUNCATE: message truncated
[burl-ct-280r-0:3402] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 3402 on
node burl-ct-280r-0 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
 burl-ct-280r-0 58 =>

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jsquyres on 2010-01-26 17:18:07:

George -- any hope of this getting fixed?

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by bosilca on 2010-02-08 15:43:17:

The root of the problem is that if heterogeneous support is not enabled at build time, Open MPI supports only homogeneous environments. In this particular case (i.e., a 64-bit build and the external32 representation), we are supposed to convert from the local type into the external one. As heterogeneous support is disabled, we report the same size as the native type...

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by jsquyres on 2010-02-08 15:59:54:

Ah, fair enough.

Should we just raise an MPI exception in this case (MPI_ERR_UNSUPPORTED or somesuch)?

@ompiteam
Copy link
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by rolfv on 2010-02-08 16:16:38:

I am not sure I follow all this. We configure our build with heterogeneous enabled.

 burl-ct-280r-0 58 =>ompi_info -all | grep Hetero
   Heterogeneous support: yes
 burl-ct-280r-0 60 =>mpirun -V
mpirun (Open MPI) 1.4r22359-ct8.3-b08a-r13

Report bugs to http://www.open-mpi.org/community/help/
 burl-ct-280r-0 61 =>

I also tried the MCA parameter that forces heterogeneous mode.

 burl-ct-280r-0 61 =>mpicc -m64 ext32.c -o ext32
 burl-ct-280r-0 62 =>mpirun -mca orte_hetero_apps 1 -np 1 ext32
First test passed
Second test failed. Got size of 80, expected 40
Third test failed. Got size of 6400, expected 3200
[burl-ct-280r-0:2958] *** An error occurred in MPI_Pack_external
[burl-ct-280r-0:2958] *** on communicator MPI_COMM_WORLD
[burl-ct-280r-0:2958] *** MPI_ERR_TRUNCATE: message truncated
[burl-ct-280r-0:2958] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 2958 on
node burl-ct-280r-0 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
 burl-ct-280r-0 63 =>

@ompiteam
Contributor Author

ompiteam commented Oct 1, 2014

Trac comment by rusraink on 2010-02-19 09:49:18:

After inspecting and going through opal_convertor, this will most likely not make it to v1.5.

rhc54 pushed a commit that referenced this issue Oct 12, 2014
OSHMEM: fix memheap/sshmem/mmap to use MAP_PRIVATE instead of SHARED to speedup registration
RM-approved
yosefe pushed a commit to yosefe/ompi that referenced this issue Mar 5, 2015
anandhis pushed a commit to anandhis/ompi that referenced this issue Jul 15, 2016
Addressed pull-request comments from jfsquyres->
artpol84 referenced this issue in artpol84/ompi Nov 20, 2018
@bosilca
Member

bosilca commented May 8, 2019

A few years old and no reproducer.

@bosilca bosilca closed this as completed May 8, 2019
hppritcha referenced this issue in hppritcha/ompi Sep 11, 2019
add definition of MPI_MAX_PSET_NAME_LEN
abouteiller added a commit to abouteiller/ompi-aurelien that referenced this issue May 11, 2020
F08 and PMPI for the ftmpi bindings

Approved-by: Aurelien Bouteiller <[email protected]>
abouteiller added a commit to abouteiller/ompi-aurelien that referenced this issue May 15, 2020
The historical repository with full history and attribution is available
at https://bitbucket.org/icldistcomp/ulfm2/src/ulfm/.

Squashed commit of the following:

commit 73b6fa48c8af40bfa28e24f6c79176a254c449be
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu May 14 19:21:20 2020 -0400

    Typo in comment for non-blocking error check

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 3a9fd329e35564af826c81aae18d4df4eebbd275
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu May 14 19:19:08 2020 -0400

    Do not iface_check in non-blocking and never set MPI_ERROR in single
    status functions

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit a9913a4777e0d7d78ff9ead0a51e807316f01d2f
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu May 14 18:12:08 2020 -0400

    Remove iface_create_check on intercomm creations

commit 99ea1398127c51ada0179ab1737f2134ee0de8ff
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu May 14 17:43:41 2020 -0400

    Update README to denote supported/unsupported components and default
    settings

commit 59110aa35fa465cddf65e2937066928e45a685c0
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu May 14 13:22:14 2020 -0400

    Do not disable compile-time components
    with_ft is on by default at configure time
    enable_ft is off by default at runtime
    have a --tune file to control the behavior of loaded components
      disable runtime loading of MTL and PML components and hcoll when FT is
      on.

commit 66566b63f1dd9eae633d57c1f3cca57c78978a22
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed May 13 04:34:50 2020 -0400

    Correct error path in comm_spawn

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 9a5cb3cb79ab4321a14425a422f68d336b4681ab
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed May 13 02:39:43 2020 -0400

    Remove extra ompi_request_t fields (tag, peer, any_src_pending)

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit b1dda7c8d51c66f10dadcea676d7e5622b549a18
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue May 12 14:32:39 2020 -0400

    Cleanup ftagree (FAILURE_PROB)

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit c05bf3ef14ac8d5b55f936bd2ff7680575a1d019
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue May 12 12:27:30 2020 -0400

    Remove the need to modify every coll component to add agree
    Rename coll_agreement to coll_agree (to match existing practice of
    matching the MPI name)

    Signed-off-by: Aurelien Bouteiller <[email protected]>

    Copyright cleanup in unchanged files

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit cf0461886a9318ac0b87c73f2c2a1868b9481be6
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue May 12 02:30:27 2020 -0400

    Copyright cleanup

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 61eb3b3163011769a020d2a714085380e8b6d8b3
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue May 12 01:46:39 2020 -0400

    Round 1 of review comments

commit 64d956017415bf40397a12f039e62211e57c5c56
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon May 11 00:34:23 2020 -0400

    Revert changes to version and README for standalone ULFM packaging.

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit cd5c5ed41b3dcd4162632c47ac500daf4cc5216f
Author: Aurelien Bouteiller <[email protected]>
Date:   Sun May 10 23:58:33 2020 -0400

    Revert "Restore ulfm specific changes to openib btl cancelled by merge 4ce1669a"

    This reverts commit f2b7da5d488f1b1d27c6a8643128a10eadd86f67.

    Revert "Revert "platform: Remove "with_verbs" from all the platform files.""

    This reverts commit 74d9c41e32e5b0c7fdb720156091a1eb49c03537.

    Revert "Revert "README: Remove all references to --with-verbs[*]""

    This reverts commit 385dbd0dad512245e9197af98244ac970f3d956e.

    Revert "Revert "opal/common: remove stale common components""

    This reverts commit 0c3a306c695eb12d489b9fdbfa4ec6262935e7c1.

    Revert "Revert "m4: remove all configury related to libibverbs""

    This reverts commit f8f1b8537fd929a4fc1432936a71d7f2def41bbd.

    Revert "Revert "btl/openib: So long / farewell / it's time to say goodnight""

    This reverts commit 4a82cca865ac043e8aab75356ed78786115b52ef.

commit f627b1c53de171dd6551e8b00fb5907715364939
Merge: fb3507a1 9996b9f5
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu May 14 21:20:59 2020 -0400

    Merge branch 'master' into ulfm-prrte

commit fb3507a19183fe4293dad1d0d432641a11640a89
Merge: 0823ee3e 0dc23252
Author: Aurelien Bouteiller <[email protected]>
Date:   Sun May 10 23:07:42 2020 -0400

    Merge branch 'master' into ulfm (orte removal)

commit 0823ee3e57d24d11ee1c8ba232c601707645a7a8
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Jan 31 16:26:00 2020 -0500

    An error in readme about Agree: it does a AND

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 322684e42d99e28964678c9f54a0de570dd47f39
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Jan 31 15:02:23 2020 -0500

    Change verbosity in agree to help track split-decision bugs

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 67ca89e04d4b5452fc5871d823e00ae5f6e247bb
Merge: d4ff45bd cf4398e2
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Jan 31 18:54:20 2020 +0000

    Merged in abouteiller/ulfm2/bugfix/era_thread_safe2 (pull request #21)

    Thread safe access to era_incomplete_msg and passed_agreement hash-tables

    Approved-by: Aurelien Bouteiller <[email protected]>
    Approved-by: George Bosilca <[email protected]>

commit cf4398e2a0431386b2216ae73e4251c0978143bc
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jan 30 15:38:08 2020 -0500

    Thread safe access to era_incomplete_msg and passed_agreement
    hash-tables

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit d4ff45bdf4aad071d3f1abddda9ac3576a83741e
Merge: cdd2f6b4 12757660
Author: George Bosilca <[email protected]>
Date:   Thu Jan 30 18:47:34 2020 -0500

    Merge remote-tracking branch 'upstream/master' into ulfm

    Signed-off-by: George Bosilca <[email protected]>

    Conflicts:
    	ompi/include/mpif-values.pl
    	ompi/mca/coll/libnbc/nbc.c
    	ompi/mca/pml/ob1/pml_ob1.c
    	ompi/tools/ompi_info/param.c
    	opal/mca/btl/tcp/btl_tcp_endpoint.c
    	opal/mca/btl/tcp/btl_tcp_frag.c
    	opal/mca/hwloc/hwloc2/configure.m4
    	orte/mca/odls/base/odls_base_default_fns.c

commit cdd2f6b43961857cf4c84c27de608c7462e37919
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Jan 30 14:14:45 2020 -0500

    Update VERSION to the new numbering scheme :v4.1.0u1a1: alpha 1 of the
    first release of ULFM based on (unreleased, devel) v4.1.0

    Signed-off-by: Aurélien Bouteiller <[email protected]>

commit b8da0edf73b446cc2aa59f0f86b48c925d3add37
Merge: e5c6c5e6 c6ade8fa
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jan 30 04:32:15 2020 +0000

    Merged in abouteiller/ulfm2/bugfix/concurrent-tcp-close (pull request #16)

    Do not close the socket meanwhile the opal_progress loop is adding events to the event base

commit e5c6c5e6f240260514e08e130177e7f86f2246ee
Merge: c2212cb0 227a6779
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jan 30 04:31:29 2020 +0000

    Merged in abouteiller/ulfm2/bugfix/openib-noproc-error (pull request #20)

    An error without an errproc is always promoted to fatal, which causes pandemic failures when openIB credits to a dead peer exhaust.

    Approved-by: George Bosilca <[email protected]>

commit c2212cb0fd4ed8a54b36f02f9cb234cd1df2ac69
Merge: 43c1d324 2510df24
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jan 30 04:30:42 2020 +0000

    Merged in abouteiller/ulfm2/bugfix/recursive-era-mark-failed (pull request #19)

    Resolve recursive and multithreaded access to the era

    Approved-by: George Bosilca <[email protected]>

commit 2510df24a73ba5a563537e0c44b6249f163679cd
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Jan 22 15:33:48 2020 -0500

    Resolve recursive and multithreaded access to the era_parent and
    next_child functions causing inconsistent agreements

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 227a67797859e176336a7033b1bf9cb0f94584c7
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon May 20 12:08:18 2019 -0400

    An error without an errproc is always promoted to fatal, which causes
    pandemic failures when openIB credits to a dead peer exhaust.

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 43c1d32448e64ff2bd322b206d82b27e75033fd8
Merge: cf8dc43f a36f138a
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Jan 22 20:55:33 2020 +0000

    Merged in bugfix/sync-mt-waitall-any-some (pull request #18)

    bug fix SYNC_WAIT with threads in WAITALL and friends

commit cf8dc43f907353b40b42aaf7318e05b49e7243a5
Author: Aurelien Bouteiller <[email protected]>
Date:   Sun Nov 17 12:00:54 2019 -0500

    Close the detector before removing the bsend system, but after deleting Self attr

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 6e386e4d66288f68e8c12a81ede76b7cceb86471
Author: Aurelien Bouteiller <[email protected]>
Date:   Sat Nov 16 22:13:05 2019 -0500

    Cleanup asserts and add some more debug messages

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 593db6aca8dd89997eb6787cad409778e11ef0b8
Author: Aurelien Bouteiller <[email protected]>
Date:   Sat Nov 16 22:08:50 2019 -0500

    Return a revoke error only when comm is revoked

commit a36f138a911a457fae57366bbbb501eb1efe77ee
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Nov 13 17:36:30 2019 -0500

    Fix a case were the SYNC_WAIT would be rearmed while it was unsafe
    w.r.t. a progress thread, and cases were the SYNC would be released
    before being SIGNALED.

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 791214b118570df301c6cbe47ad291a54bc21ab8
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Nov 13 17:25:14 2019 -0500

    Be more verbose about having a progress thread in the detector.

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 0e249ca1ae5cb27a3f3d907173b65db188380ce5
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Nov 12 17:28:57 2019 -0500

    Remove the pending event when socket is TCP_FAILED

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit f363e250686cc299631fa26a2cc92e3f2dc9e5d6
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Nov 12 17:20:52 2019 -0500

    Fix a set of issues with  Agree

commit c7473b5d227a74f28a7fa4a6019f498e06d20b34
Merge: 897b87a0 88c18329
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Nov 11 19:54:55 2019 +0000

    Merged in abouteiller/ulfm2/sanity/dont-mark-myself-failed (pull request #15)

    Do not mark myself as failed, this is never normal

    Approved-by: Aurelien Bouteiller <[email protected]>

commit 88c18329e525ad7cf5648c10e20a59add0073c11
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Nov 8 16:16:30 2019 -0500

    Do not mark myself as failed, this is never normal

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit c6ade8fa34d8545f17afde35eda67ab4ceedc3f2
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Nov 6 14:01:01 2019 -0500

    Do not close the socket meanwhile the opal_progress loop is adding
    events to the event base

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 897b87a0d680c3604756309ef78c368675eb884c
Merge: 94391d9e 82c9b479
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Nov 11 19:20:07 2019 +0000

    Merged in abouteiller/ulfm2/bugfix/mt-sync-revoked (pull request #17)

    Bugfix/mt sync revoked

    Approved-by: George Bosilca <[email protected]>

commit 82c9b479ed4656696e3a1217405847c68ddc2575
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Nov 8 18:03:26 2019 -0500

    Do not add more requests to the matching queue after the comm is revoked

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 4b01a5764869dff4f922903283053784f5a42301
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Nov 8 17:56:37 2019 -0500

    Bugfix: we need to check if the request if ok before entering the first
    waitsync_mt

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 94391d9e38ad53ce55bc2764ed910b329ef4b92f
Merge: eb275c65 f7b5b637
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Nov 5 14:05:08 2019 +0000

    Merged in abouteiller/ulfm2/bugfix/fd-drift (pull request #14)

    reduce the sensitivity fo the detector to noise and drift

    Approved-by: George Bosilca <[email protected]>

commit f7b5b63763b974cf06372645cff3e044a4a53165
Author: Aurélien Bouteiller <[email protected]>
Date:   Mon Nov 4 10:58:06 2019 -0500

    reduce the sensitivity fo the detector to noise and drift

    Signed-off-by: Aurélien Bouteiller <[email protected]>

commit eb275c655dee7ee7d18fe24004a3d37bfd25a8c2
Author: Aurélien Bouteiller <[email protected]>
Date:   Fri Oct 18 17:02:09 2019 -0400

    Document why an assert may trigger in false-detection scenarios

commit 52c2a5d710f80c0d26bf1cd7c42f7cbd58cc1e24
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Oct 16 15:06:02 2019 -0400

    Use the correct option to force internal pmix/event

commit bc69fd1bd1acf3a778b11c13f667ca3b972f1610
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Oct 15 14:44:20 2019 -0400

    We have modifications in pmix and libevent, prefer the internal ones

commit 617e2b4c9ce27c24d3c8eb6c8aa539884904a65c
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Oct 15 14:43:16 2019 -0400

    Bugfix a case where the FD would keep observing a dead process forever
    if reported from inline (rather than by the detector itself)

commit b54585d832588258277ea4d16d519c6a46439260
Author: Nuria Losada <[email protected]>
Date:   Tue Aug 6 10:18:52 2019 -0400

    Avoid cleanup of job_session_dir and orte proc_session_dir upon application process failure

commit f8d536027988500abb87adc22fa147be6d3eda7e
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jul 11 14:44:05 2019 -0400

    Cleanup rdma_frags and registrations in revoked/error sendreqs

    Free up rdma_frag in sendreqs when the request is cancelled in error or
    revoked.

    Return registrations for cancelled/revoked sendreqs

    Remove dead/useless code

commit 6c76e287178d42d7dfd1e50e6be4ba18a86a06a1
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jun 13 04:19:44 2019 -0400

    Missing semicolon appears only when fotran logical needs conversion

commit 92e108f9ae1e4ffb129086ada8d4a7643ee8c708
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jun 13 03:29:28 2019 -0400

    A bug in PMIx disables node-local detection, use the OMPI detector
    instead

commit 4dcf700e1a49479d1df4693b32cdc5cd187ec056
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri May 24 14:28:15 2019 -0400

    Do not send rbcast to known dead processes to avoid paying the
    send-detection penalty

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 6f375bff8e2d893343064e51bc01b6806d166d1c
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed May 22 13:59:22 2019 -0400

    When receiving a wrong heartbeat, ignore it rather than rearming

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 027afa741bf99481e7b1c2ad66579fd611190489
Merge: 08122763 b7806672
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue May 21 17:32:53 2019 -0400

    Merge branch 'master' into ulfm

commit 081227637a652b7b82103697c0b7c353ad58e220
Merge: 6f002936 aa5e5a65
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Apr 23 20:57:35 2019 +0000

    Merged in abouteiller/ulfm2/merge/postopenib (pull request #12)

    Merge/postopenib

    Approved-by: Aurelien Bouteiller <[email protected]>

commit aa5e5a65e4e02931b6239749a1d1671bd407f655
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Apr 3 17:33:32 2019 -0400

    Let errors flow through spawn/connect accept in order to make sure we do
    not end-up in unmatched mpi calls in error cases

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit edf0086d55ad955b26336bd96d131482dbb88ef4
Merge: 0fe172d9 97b7fab8
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Mar 25 11:14:23 2019 -0400

    Merge branch 'master' into merge/postopenib

commit 0fe172d9bf5cf7e9f82c951004ce32ffd8cc2955
Merge: f2b7da5d 53cd31ed
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Mar 22 00:46:18 2019 -0400

    Merge branch 'master' into merge/postopenib

commit f2b7da5d488f1b1d27c6a8643128a10eadd86f67
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Mar 13 15:08:06 2019 -0400

    Restore ulfm specific changes to openib btl cancelled by merge 4ce1669a

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 74d9c41e32e5b0c7fdb720156091a1eb49c03537
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Mar 13 14:57:55 2019 -0400

    Revert "platform: Remove "with_verbs" from all the platform files."

    This reverts commit 99553eb1b9b2a6300525e06114b38c1c091f23e8.

commit 385dbd0dad512245e9197af98244ac970f3d956e
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Mar 13 14:57:47 2019 -0400

    Revert "README: Remove all references to --with-verbs[*]"

    This reverts commit 48a33ee6db06df1426d3ab9fa4adb2c6d182f8d3.

commit 0c3a306c695eb12d489b9fdbfa4ec6262935e7c1
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Mar 13 15:25:21 2019 -0400

    Revert "opal/common: remove stale common components"

    This reverts commit 3f4af8e51ca70f7ca0e46b734f3e11e513b858dc.

commit f8f1b8537fd929a4fc1432936a71d7f2def41bbd
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Mar 13 14:56:52 2019 -0400

    Revert "m4: remove all configury related to libibverbs"

    This reverts commit 59c8ab6da4276ff398453a54910c6c0fb67a153c.

commit 4a82cca865ac043e8aab75356ed78786115b52ef
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Mar 13 14:56:10 2019 -0400

    Revert "btl/openib: So long / farewell / it's time to say goodnight"

    This reverts commit 8de786f5a40ab96069b9c661d6ea8bb892688cac.

commit 4ce1669a7463280528473eeb69e59dc360f75a31
Merge: 6f002936 01737960
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Mar 13 14:54:24 2019 -0400

    Merge branch 'master' into merge/postopenib

commit 6f002936fc1d08dc3d82190c6997a910b655b59d
Author: Aurélien Bouteiller <[email protected]>
Date:   Sat Mar 9 10:02:59 2019 -0500

    Suppress the not useful gotos for error cases that cannot happen
    issue #40

    Signed-off-by: Aurélien Bouteiller <[email protected]>

commit 67ae93928ebac0eafd0948cdd5602854fa2d6f07
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Mar 7 14:36:28 2019 -0500

    Resolve deadlock in MT wait-sync rearming post-error

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 804bb69340ca1500828a78f91917a2ea155f256e
Author: Thananon Patinyasakdikul <[email protected]>
Date:   Tue Jan 29 13:34:44 2019 -0500

    opal/threads: reverted #6199

    This commit reverted pr #6199 as it introduced deadlock in some cases.
    Also removed the assert as the condition is obsoleted.

    Signed-off-by: Thananon Patinyasakdikul <[email protected]>

commit b7f8c6ffc361d7753abc9b76093582f6f98b52e3
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Mar 6 14:33:10 2019 -0500

    Rename ftbasic to ftagree

    Signed-off-by: Aurélien Bouteiller <[email protected]>

commit 8b057449f1950e3ff79fd8592a82db78e533948b
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Feb 22 19:10:35 2019 -0500

    Simplify generation of PMPI_xxx_f

    Fixup ompix_xxx in fortran pmpi interface

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 6979f860d08c27aa6dc6a7c6f1ade171bc0c01bf
Author: George Bosilca <[email protected]>
Date:   Thu Feb 21 22:03:22 2019 -0500

    Fix the warnings in the Fortran API.

    Signed-off-by: George Bosilca <[email protected]>

commit 11deb93207d786488789811f6641cb68003a9e40
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Feb 21 19:56:33 2019 -0500

    Erroneous modification in typedef for rdma heartbeats

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit c0f544b690e850ff8ec164ee90ab0dd006f0e941
Author: George Bosilca <[email protected]>
Date:   Thu Feb 21 19:56:09 2019 -0500

    Prevent EPIPE on OSX.

    Signed-off-by: George Bosilca <[email protected]>

commit 96be67d66ff6e7656c879ddf0c2605a86f45cf3c
Author: George Bosilca <[email protected]>
Date:   Thu Feb 21 19:52:52 2019 -0500

    Address a race condition in libevent select.
    This is not really a fix for the race condition because I could not
    figure out how it happen, but it does address the problem generated by
    the race. If we do not remove a bad fd from the select list we keep
    getting the same error from select, and we stop doing any progress on
    the communication side. Thus, we forcefully disable all bad fd as soon
    as select fails, and we are back in track, progress ensure and
    everything seems to work as expected (no leftover events in the event
    base).

    Signed-off-by: George Bosilca <[email protected]>

commit eab20ba06442936293d21cae78e03c7c68f500b3
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Feb 21 19:33:54 2019 -0500

    resolve pedantic warnings in PMPI fortran ulfm bindings

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit eb55ffb189cbb77a52f38943ab44427752f4af39
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Feb 21 19:00:32 2019 -0500

    Remove pedantic warnings in ERA agreement

commit eb85245b30f5cb885a87da40b8f671d56cc6236b
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Feb 21 17:58:36 2019 -0500

    OPAL_ENABLE_MULTI_THREADS does not exist anymore
    also fix a number of warning in enable-picky in detector/propagators

commit 04b0a92b540b2163b37f840bc3f35b2992567de4
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Jan 4 15:44:40 2019 -0500

    The order of the attribute creation is important

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit c87d9483ad9799b6d3b7a6d48770ee2fd74b7855
Merge: edf88350 8a18a831
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Jan 4 13:35:19 2019 -0500

    Merge remote-tracking branch 'ulfm2/ulfm' into ulfm

commit 8a18a831dab6161e19b64f17c7640b8eb3a03188
Merge: d19c4a82 8ad77b66
Author: Nathan Weeks <[email protected]>
Date:   Fri Jan 4 18:23:06 2019 +0000

    Merged in nathanweeks/ulfm2/issue/use-mpi (pull request #11)

    Fix INTENT of flag argument to MPIX_Comm_[i]agree

    Approved-by: Aurelien Bouteiller <[email protected]>

commit 8ad77b66a9d45dc8c73c25e0a321725d8e8b0689
Author: Nathan Weeks <[email protected]>
Date:   Fri Jan 4 10:18:36 2019 -0600

    Fix INTENT of flag argument to MPIX_Comm_[i]agree

    Signed-off-by: Nathan Weeks <[email protected]>

commit edf88350a8b46fe92cf40a72266685ecbbeccad3
Merge: d19c4a82 0dc0d77b
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jan 3 13:51:47 2019 -0500

    Merge branch 'master' into ulfm

commit d19c4a82df7d79285aa5d39cbb2ea1507898f65f
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jan 3 12:17:59 2019 -0500

    Handle the case where the bridge comm is revoked in get_rprocs

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 383b889df896e5059c2542b439bfb7f6846c4422
Merge: 2c536936 ce61988c
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Dec 21 21:40:37 2018 +0000

    Merged in abouteiller/ulfm2/feature/isrevoked (pull request #9)

    Adding 'is_revoked' functions for communicators

commit ce61988ca8ed085ae999fa6866b5459d8952c756
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Dec 21 16:34:05 2018 -0500

    Correct F08 and other bindings for is_revoked

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 6c7f413ad17c3232c811b14ffa00ddeb3d2dd1c4
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Mar 26 12:28:30 2018 -0400

    Adding 'is_revoked' functions for communicators

commit 2c536936a337d2e7508213a95724bf8f9c9c6239
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Dec 21 15:26:44 2018 -0500

    Rename README to README.ompi

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 9f2d068ee078fa2aaba725010d0cb70b4c5ddb3c
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Dec 21 15:24:32 2018 -0500

    More README renaming for Bitbucket

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 9861c014cb5f19b356a67982c22295fd1da7fc8d
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Dec 21 15:14:01 2018 -0500

    Move the Open MPI README so the ULFM readme gets rendered on the
    Bitbucket page

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit e8127fc61c0ed677c1061e3e788623e61299992c
Merge: ec5675fc cc16badc
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Dec 21 19:57:21 2018 +0000

    Merged in abouteiller/ulfm2/topic/usepmpi (pull request #10)

    F08 and PMPI for the ftmpi bindings

    Approved-by: Aurelien Bouteiller <[email protected]>

commit cc16badc25a81f05c7e9c0dd646d5b1dd1599d8c
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Dec 21 01:47:40 2018 -0500

    Add PMPI F08 ftmpi bindings

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit cd2850fdadb1a0c36dc370f7991ea8f86e1c626a
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Dec 21 01:15:13 2018 -0500

    Correct fortran ftmpi bindings w/o weak symbols

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 866d91f2b7cf9a58c2740dcfb3d884451756965d
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Dec 21 00:19:37 2018 -0500

    Upgrade mpiext ftmpi to the new PMPI generation system.

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit ec5675fc533cc921a4565e8bde28238dcbfdc6ce
Merge: dbcfc7a9 14eec9a3
Author: Nathan T. Weeks <[email protected]>
Date:   Fri Dec 21 07:10:49 2018 +0000

    Merged in nathanweeks/ulfm2/feature/mpi_f08 (pull request #6)

    Add mpi_f08 bindings for ULFM routines

    Approved-by: George Bosilca <[email protected]>

commit dbcfc7a986eba5dbc6ce7c590b232697739567b2
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Dec 18 13:54:08 2018 -0500

    Upgrade the ftmpi extension to the new naming scheme; restore pcollreq
    since it no longer causes problems

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 5170d9cb7f12ca882790c22544ef18448ceb3860
Merge: f00c5732 6f5f3110
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Dec 18 11:26:36 2018 -0500

    Merge branch 'master' into ulfm

commit f00c5732902e2d8cbd033083248b1b9cca992d5b
Author: Aurelien Bouteiller <[email protected]>
Date:   Sat Nov 3 11:29:03 2018 -0400

    Disable pcoll for the time being; it breaks the Fortran bindings

commit e24ddc24977e91a44fbcf352dd3156cc7eb35e0c
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Nov 2 00:44:47 2018 -0400

    update version string and changelog

commit 6304043d40daf6759960814975e0f964f3c117bb
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Nov 2 00:43:27 2018 -0400

    Set sane default components

commit bbb19203bda985f96ec608b9e24178e74926b540
Merge: 77f9157e 37954b5f
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Nov 1 15:18:45 2018 -0400

    Merge branch 'master' into ulfm

commit 77f9157ea7dcb5c2b517455c9e249b6b8068fa5d
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Oct 31 12:51:11 2018 -0400

    Resolve a recursive destruct on the iof proct in finalize

    Signed-off-by: Aurélien Bouteiller <[email protected]>

commit 3ef11c7d09adaa47d76db72dc58a661b89e571fd
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Oct 24 02:03:24 2018 -0400

    Prevent errmgr invocation from crashing in finalize

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 86985a5b61e2ccc60bbe938e81d947684d12c8f2
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Jan 26 15:23:19 2018 -0500

    Re-add "Handle error cases in TCP BTL", which was rejected upstream

    When an error is returned by the socket operations, trigger the
    appropriate error path in the PML to give an opportunity for
    rerouting/error handling.

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 33b8fce232b233a3b0ed519802eb15eb7e5995ab
Merge: 6566fc4c a1e85b03
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Oct 30 17:04:11 2018 -0400

    Merge branch 'master' into ulfm

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 6566fc4c68ff0d89d68abdfd8382b411104b47d6
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Oct 23 22:42:35 2018 -0400

    Correctly propagate the oversubscribe flag to the spawnees

    Signed-off-by: Aurélien Bouteiller <[email protected]>

commit 07df428c2f82718133d707c5f017f417c07e3bd8
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 22 15:38:31 2018 -0400

    The error field of requests needs to be rearmed at start, not at create

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 359f044b4d2cac87fcbb55411c642bb108dcf720
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 22 11:25:01 2018 -0400

    Correctly bubble up errors in NBC collective operations

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 9579efaeca2ccdfb553cbf122755571e8af970fe
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 22 11:17:00 2018 -0400

    Bugfix a debug statement calling pml dump

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 428f3506927497ed09f7ad1d97c0e5fbfb4adf67
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Oct 18 10:56:44 2018 -0400

    Disable inband PML error reporting during MPI Finalize as it interferes
    with the Finalize process. A better fix is being worked on upstream, but
    let's have it work in the meantime.

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit ce72ffb4a76e6d33f4e12f8aa4cba93115009c2f
Merge: d9284a60 69f9da91
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Oct 4 12:38:26 2018 -0400

    Merge branch 'master' into ulfm

commit d9284a6005c2e2c615d19903a6d819f126d735c7
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Sep 26 10:52:29 2018 -0400

    A pmix_3x constant was still present.

commit bc26604d3ed16b73ff8f1f756adf965d194272fe
Merge: 908eead4 3f598e9e
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Sep 24 17:40:15 2018 -0400

    Merge branch 'master' into ulfm

commit 908eead4aedf95a5e565bf4f9af5ac2ccd2494f9
Merge: 70ee1f45 1ca6f38e
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Aug 7 13:30:54 2018 -0400

    Merge remote-tracking branch 'ulfm2/ulfm' into ulfm

commit 70ee1f452b40f0ac7e2b319cfc478859a3fffe21
Merge: e87f595e ae030146
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 6 14:01:18 2018 -0400

    Merge branch 'master' into ulfm
    Heavy modifications in nbc error management and coll tags

commit 1ca6f38ea8a3d0d26efd4a7e755c7edc17bc8e47
Merge: e87f595e 4d129617
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue May 1 14:08:46 2018 +0000

    Merged in abouteiller/ulfm2/feature/pubsub (pull request #5)

    Do not disable publish/subscribe for no good reason: these are local operations.

    Approved-by: George Bosilca <[email protected]>

commit 14eec9a3d164cc68d92844fc219f0664aa36fd90
Author: Nathan T. Weeks <[email protected]>
Date:   Tue Feb 27 18:56:56 2018 -0800

    Add mpi_f08 bindings for ULFM routines

    Signed-off-by: Nathan T. Weeks <[email protected]>

commit e87f595e6bf1ab2366c10f05d3aac0217079d68c
Merge: 63e0514d df0ccbee
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Mar 1 11:07:05 2018 +0000

    Merged in abouteiller/ulfm2 (pull request #8)

    Ulfm

commit df0ccbeee3727663a9ddb1a39ca670343f004bb9
Merge: 63e0514d 9944d63d
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Mar 1 05:38:53 2018 -0500

    Merge branch 'master' into ulfm

commit 4d12961757171b1aa28b67efc9a40d24266d9998
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Feb 21 19:02:42 2018 -0500

    Do not disable publish/subscribe for no good reason: these are local operations.

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 63e0514db046de8665f2f3510fab7e739a93a7c2
Author: George Bosilca <[email protected]>
Date:   Fri Feb 16 01:55:29 2018 -0500

    Fix usage of OPAL_ENABLE_FT_MPI.

    Signed-off-by: George Bosilca <[email protected]>

commit cec02d4408489cc24ae5d4dd69476d6e33c5fab9
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Feb 14 16:18:57 2018 -0500

    bugfix: missing declarations for *ft_register_params

commit 6006795e842354b2bbf9308ee119e2dcaf1848a7
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Feb 14 16:18:16 2018 -0500

    NBC_Error does not have an int as first param

commit ac6bb3ea190e3f441d025d398a711dbd22e2a4b3
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Feb 13 17:45:58 2018 -0500

    Further tuning of the timeout default value for the thread detector

commit 577c61693c4d10dded6c5d4e4f909caf9794bad3
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Feb 12 14:53:15 2018 -0500

    Wrong number of params to NBC_DEBUG

commit e6cf7dc044a9f84aaab4c41ebfab27029f12972e
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Feb 12 14:52:59 2018 -0500

    wrong encoding

commit 228c12add80446de2220f8f9761ff260a3cd2034
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Feb 12 13:11:16 2018 -0500

    Expose the FT and detector controls to the end user in ompi_info

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 713c94e85a141772fad8a4cb2842e643b9f22716
Author: George Bosilca <[email protected]>
Date:   Sun Feb 11 22:23:38 2018 -0500

    Fix ULFM profiling.

    Signed-off-by: George Bosilca <[email protected]>

commit 7a42d912261b62082b9e8d8e6586ba4f3dac8ee9
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Feb 1 01:43:02 2018 -0500

    Erroneous merge in comm_cid: uninitialized epoch

commit 8e940d2938e4dc236bd4acfae4e3678de9a71810
Author: George Bosilca <[email protected]>
Date:   Mon Jan 29 13:48:13 2018 -0500

    Minor fixes to make clang happy.

    Signed-off-by: George Bosilca <[email protected]>

commit 11e6355b5a4aeacdb19d9b3dd6c4bd7863834cb2
Merge: 17d0158a 5b0df815
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jan 25 11:03:42 2018 -0500

    Merge branch 'master' into ulfm

commit 17d0158a45fb08fcad202a9352729fae829f68d1
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Jan 17 16:35:28 2018 -0500

    bugfix: an any-source request that completed while it was being reported as PROC_FAILED_PENDING needs its status rechecked

commit 51bbd220c75ca59f230e9729836dcc33a20313a6
Merge: 199f5f0d f3a096dd
Author: Nathan T. Weeks <[email protected]>
Date:   Wed Dec 20 00:31:59 2017 +0000

    Merged in nathanweeks/ulfm2/issue/comm_failure_get_acked-f90 (pull request #3)

    Correct type of MPI_Comm_failure_get_acked failedgrp argument in Fortran USE mpi interface

    Approved-by: George Bosilca <[email protected]>

commit f3a096dda733cbdd3f91524fd9973af5ba41e7d1
Author: Nathan T. Weeks <[email protected]>
Date:   Tue Dec 12 19:18:41 2017 -0800

    Correct type of MPI_Comm_failure_get_acked failedgrp argument in Fortran USE mpi interface

commit 199f5f0d2d6139460d0461cbf4b374d117dac4f6
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Nov 20 15:19:48 2017 -0500

    Make sure we mark the proc as WAITPID status in signalled and non-zero exit cases

commit e3006cafe4f9e4e55774679199b94b1e3d24ca5d
Author: George Bosilca <[email protected]>
Date:   Fri Nov 3 23:47:16 2017 +0000

    No accents in the names

commit 2e75c73cc620eceb7396e9aac77a13e235c2a77b
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Nov 3 18:59:52 2017 -0400

    Tweak default FD and update readme notes

commit f4bd88c98f1936a609e9145cd506b22a5722fa90
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Nov 3 18:59:22 2017 -0400

    Pass correct arguments to pmix cb when out of memory

commit 87d50db1d34695a97de094977f7fa9163c35b14e
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Nov 2 10:48:10 2017 -0400

    Changing the default IB retry timeouts is not a good idea.
    We'll need to find another way to speedup credit recovery in failure cases.

commit 2fb5440a589baf8666f6cf30992b3a3bd04a6aca
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Nov 1 10:07:39 2017 -0400

    Mark the IB endpoint as failed when invoking an error; this resolves UDCM connection deadlocks

commit 79aca0bb799f90f53c949e161b9f173c1fca2996
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Oct 31 23:20:31 2017 -0400

    Make it compile in non-debug builds

commit 04f61d22769f13adcfec822f83bc5ec079501a62
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Oct 31 22:55:51 2017 -0400

    bugfix: major: openib send credits are returned correctly after a fault for pending frags to dead processes; also tweak the default IB retry timeouts to make this happen faster

commit 942b0ab8bd8fc5f9e0b39312553c3a42228720c4
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Oct 31 22:02:24 2017 -0400

    Bugfix: leaking frags after failure in TCP btl

commit 6db29438a0299f779b59972ae6528a035ff56348
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 30 21:34:04 2017 -0400

    Copyrights since 1624f1f5

commit 5dd7d6fc35e1398e12338ecc49eadf30aa818a8d
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 30 21:21:37 2017 -0400

    bugfix: returning ERR_PROC_FAILED from iSend violates ULFM spec.

commit 9bf3923d51dcf876f1c20a01757cd94dbde9022a
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 30 17:01:33 2017 -0400

    Bugfix to upstream: do not return ERR_IN_STATUS from collectives

commit 954cd2f53e9c2985a21bbb1fc374b83678df8f8c
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 30 16:28:38 2017 -0400

    Bugfix: capture cases where ERR_UNREACH is returned instead of PROC_FAILED when the BTL finds the failure first

commit 61c5954fc1aff273a40c213d38e850862e9bf7e7
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Oct 27 17:30:23 2017 -0400

    Fix error cases in TCP connect_ack

commit 0237a70791b7b9d6f8b657e1a647b3b0dfab935f
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Oct 27 17:27:48 2017 -0400

    Various fixes to orte/pmix so that late notifications do not crash during finalize

commit afe72afab6f873a66c9f257ac8d1e36f32627882
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Oct 27 17:24:29 2017 -0400

    Turn off ftmpi_enabled after the FD is turned off.

commit 9712330b37fb8d5b7f1f77e79efe0a0f6c695ade
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Oct 27 17:07:55 2017 -0400

    Fall back to abort when pml finds an error and ftmpi_enable is false

commit a9ec68580d3fddd436b5df3b31e0621ba5d11f77
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Oct 25 16:46:42 2017 -0400

    Bugfix: interrupt operations on localcomm in failed/revoked intercomms

commit 8bacc1491355d4369251d45fa2e9db0e7647d05e
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Oct 19 12:58:11 2017 -0400

    Adjust init slack

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit 1624f1f521bcd24978370ce614889fb01841ea8c
Merge: 768e6f5c 689f1be9
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Oct 19 12:23:44 2017 -0400

    Merge branch 'master' into ulfm

commit 768e6f5c563bc4575fc3dd50313d0136958dd863
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Oct 19 12:19:28 2017 -0400

    Resolve a case where the detector creates an event with infinite period

commit 252544f8e4493ac5c2478f6d5322757168a67869
Merge: e3fff257 27eb401a
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 16 15:54:06 2017 -0400

    Merge branch 'master' into ulfm

commit e3fff257517996f5758cedb7c6f6082f9e18a6da
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 16 13:37:59 2017 -0400

    Disable XPmem as it doesn't work with recovery

commit d105a9f951a27bde804f3b9398e1e97acf894763
Author: George Bosilca <[email protected]>
Date:   Wed Oct 4 19:41:09 2017 -0400

    Pass OMPI CFLAGS to libevent.

    Signed-off-by: George Bosilca <[email protected]>

commit 250892aaa815e4f5b2e9692dd51f81fc4f47b733
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Oct 3 19:55:13 2017 -0400

    Bugfix: permit detection of multiple failures on the same node

commit 914fcbda90ac1b00d47dea7808e7cdfb48e73bba
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Oct 3 11:16:36 2017 -0400

    File had been added by mistake

commit 9540a2c7ccb901119bebbc0be6edc9b0e6b86c76
Merge: 16221bf5 a3ac67be
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Oct 3 10:16:20 2017 -0400

    Merge branch 'master' into ulfm

commit 16221bf5d7c312532230b2fabb891791327c5118
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Oct 2 13:33:30 2017 -0400

    Bugfix: cleanup half created comms when failures strike in comm_dup and friends

commit d04eb935478fa3afc1975aa7de0119d398e9772d
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Sep 29 00:20:58 2017 -0400

    Silence overly verbose messages in libnbc

commit 3ab5df55dbd087423acb7c87ba34ada99a6752b6
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Sep 28 23:40:12 2017 -0400

    Interrupt the getnextcid_nb when a failure disrupts it.

commit 2609388abeaadcaf6095130499c60bfc46ba4a00
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Sep 28 23:39:21 2017 -0400

    Propagate error codes from NBC to upper layers.

commit f679439e032eb3f03dc9afcdd62c2eae686bdb46
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Sep 27 17:52:30 2017 -0400

    Start from known failures rather than acked failures in comm_free agree

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit b63b7c15a1139395ff56f3fb448efea56dc7de91
Author: George Bosilca <[email protected]>
Date:   Wed Sep 27 01:08:05 2017 -0400

    Use the correct header.

    Signed-off-by: George Bosilca <[email protected]>

commit b4535b770b197e6c278340ffbff5891401e294c0
Merge: d888d603 7cb22e1b
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Sep 25 23:18:14 2017 -0400

    Merged perf/shrink_remembers into ulfm

commit 7cb22e1b6bec7b3fd71aeff0bc7d737a5838dabe
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Sep 25 23:07:46 2017 -0400

    Perf: start shrink from known failures

commit fecf5707a2882701f9435b25a487e1cb1aa8be9b
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Sep 25 23:07:02 2017 -0400

    Bugfix: revoke should not revoke NBCs pertaining to shrink

commit d888d6035f5b9e41ef39b76a1709522b3652f890
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Sep 22 17:50:42 2017 -0400

    Perf: decrease fd_finalize duration

    Signed-off-by: Aurelien Bouteiller <[email protected]>

commit b064faf15c6349ffd5e4bf51b72960a77a7cfbf7
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Sep 22 11:56:19 2017 -0400

    Bugfix: deadlock in finalize may happen if the fault detector is turned off while the last ERA is ongoing

commit 024b90109cec452a249b0e2abee8b1c947141650
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Sep 22 11:54:46 2017 -0400

    Bugfix: thread safety needs to reload and recheck the proc when observer changes

commit 9eb779f8c8e9d3a53c9c159944fa83613be9e0e0
Author: George Bosilca <[email protected]>
Date:   Fri Sep 22 12:41:08 2017 -0400

    Support barriers with 1 proc communicators.
    Make sure the barrier supports being called with a
    communicator of size 1.

    Signed-off-by: George Bosilca <[email protected]>

commit f403bef6c2ec9f757881b13deaaed4c790b6bcf7
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Sep 21 16:15:46 2017 -0400

    Bugfix: reset the req_complete field when redoing a wait_sync after a failure (Issue #19)

commit 06bb8ed210288a0554897b872ee9a31c1766464a
Merge: 79efd24f ab68aced
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Sep 21 16:11:41 2017 -0400

    Merge remote-tracking branch 'origin/heads/master' into ulfm

commit 79efd24fe8f975f39b0d4bd61ee3e4dc2a99dd6d
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Sep 12 20:42:32 2017 -0400

    Bugfix: compilation problems with --without-ft

commit 88bae3699c36b5e9aec90b36ca313ed9ca6a3f74
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Sep 12 14:05:00 2017 -0400

    Bugfix: simplified handling of --with-ft options

commit 9ec76f804313215fe8d43c73579d8e06f501cc20
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Sep 11 18:04:10 2017 -0400

    Remove the agreement in finalize.

commit e856ed3b54e93384d756fb791866ea8a55b8c68d
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Sep 7 17:22:59 2017 -0400

    Removing finalize deadlocks from known problems

commit 9d9aa8808500e3192633887c65f73d4d7e789abb
Author: George Bosilca <[email protected]>
Date:   Thu Sep 7 21:13:53 2017 +0000

    Update the README.

commit ea42a96e2a84814d9d8f35b285ff6479e7a87db9
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Sep 6 16:40:43 2017 -0400

    Fix: post-failure deadlocks in Finalize, and control FT with --disable-recovery rather than esoteric mca params.

commit d37ac65a2acedb70e55176267c1586a39baf62fd
Author: Aurelien Bouteiller <[email protected]>
Date:   Fri Sep 1 19:08:55 2017 -0400

    bugfix: finalize detector after all but 1 rank died.

commit 3eb197625d2f49d7da0fe268d044b0a6997e09f9
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Aug 31 16:53:50 2017 -0400

    cleanup: remove dead code in finalize

commit da229614428d6646ca5da3e91a93ba45f2be45f2
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Aug 31 16:20:39 2017 -0400

    bugfix: redo the wait_sync_mt when a global sync interrupts another request

commit 8285f9d3466919f8838609e3f054df229baa16c9
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Aug 31 16:14:36 2017 -0400

    Bugfix: prevent updating the failed_grp from multiple threads

commit 5a565247c83a20dfd684876acba1fa7633629ad0
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Aug 31 13:49:20 2017 -0400

    Bugfix in detector finalization

commit 6fcd853ff8b45ae599883d7bf76675ac969db52e
Merge: 42a3858d d06b989d
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Aug 29 02:37:54 2017 +0000

    Merged in abouteiller/ulfm2/feature/README (pull request #2)

    Put README.ULFM in markdown and make it a self-contained install/getting started

commit d06b989d277925f98a5575cf629b3c8c53c705ff
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 28 22:33:40 2017 -0400

    Put README.ULFM in markdown and make it a self-contained install/getting started

commit 42a3858df24fc3b2047e20b95797a3f2b80fef3b
Merge: 97070faf 1434c0e6
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 28 22:12:11 2017 +0000

    Merged in abouteiller/ulfm2/feature/README (pull request #1)

    Feature/README

commit 1434c0e61793f5b3e543fe6b0151e665c6e525f5
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 28 17:02:26 2017 -0400

    Update the README

commit 23798cf84e35a99735a237226bab5fd811809bfd
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Aug 10 11:25:04 2017 -0400

    Update README

commit d685eba8805da4617b604cdd7a1f72584537c7c4
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue Aug 8 13:43:41 2017 -0400

    Adding a README that's specific to ULFM.
    It combines the old NEWS-ulfm from ULFM1.
    INSTALL from Open MPI applies directly, so there is no need for one.

commit 97070faf87190faf6c50ea0a0a8557e94ec51775
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 28 15:47:51 2017 -0400

    topo aware FD does not observe same-node sibling

commit 938e0174959a3187037e1ac6356a9f6236fbc8ff
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 14 23:31:36 2017 -0400

    Reduce noise and some finalize conditions in comm_detector

commit 08c6f2d6e97ffe36389261edcbdd99f9a4ed38eb
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 14 23:29:27 2017 -0400

    Reduce verbosity for events that are "normal" in FT with CMA

commit 4fbd4d36933f2401330862429d420f8b179470ed
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 14 18:05:11 2017 -0400

    Fall back to pmix abort if ompi abort cannot be issued

commit baf523d73922b9e00c4c9f44b2de34283e0d2ebb
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 14 17:24:40 2017 -0400

    Orte reports ERR_UNREACH or ERR_PROC_ABORTED when it detects local failures, take both into account.

commit f4513c3458e44fcd0aa6db8dbd77c553572bbe2d
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon Aug 14 14:54:51 2017 -0400

    Bug in upstream: cannot call ompi_abort from a pmix cb

commit 8af800522eaf727c7f8ca8726cb7285765019483
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Aug 9 19:29:31 2017 -0400

    Re-enable the TOPO graph operations, and trigger an appropriate warning when FT is enabled at the same time

commit 1fc9c039585983eddf0c9cafc9176e253a82a26e
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Aug 9 19:14:41 2017 -0400

    Re-enable the RMA OSC operations, and trigger an appropriate warning when FT is enabled at the same time

commit 0214c850587a9dc4c1f18d086c6ae76c9c5fef3d
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Aug 9 18:34:08 2017 -0400

    Re-enable files for non-FT runs, and generate an appropriate warning about what happens when using files and failures happen

commit b20bd7c70eee582a93428e810e625bda829e975b
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Aug 9 11:48:48 2017 -0400

    Enable --with-ft=mpi by default on this fork

commit 59fca1bc961668069f79f14baffc708c65b80869
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jul 27 21:15:26 2017 -0400

    make the sync_wakeup work in multithreaded runs

commit 4f917d9863037e3522637350ebda4109a37a5c46
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu Jul 27 20:46:31 2017 -0400

    Proper cleanup of rdma registrations

commit ad86f26cb16fcd530d7a4f265d60a4f5dedb7f64
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed Jul 26 11:50:02 2017 -0400

    Restore --with-ft option and enable vader BTL from changes in upstream

commit 6f9abef3d444e05be2f664e4659f7fb4422e8350
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu May 25 09:03:57 2017 -0400

    Move proc_failed checks outside of the conditional check_args block

commit c9783c52ade667362194da90b1132ca5afbb58a5
Author: Aurelien Bouteiller <[email protected]>
Date:   Thu May 25 09:40:29 2017 -0400

    Add support for neighborhood colls and other MPI 3.1 stuff

    cart/graph create

commit 35cb76303963ec83aeb27c2109b374163d57c0f6
Author: Aurelien Bouteiller <[email protected]>
Date:   Wed May 24 13:04:00 2017 -0400

    Make sure we do not initialize the ERA and failure detector if FT is not requested, and fix a number of bugs that occur when FT is not requested.

commit 12de8f950596b9f0d93d4aa301dbdbb0f0179b7c
Author: Aurelien Bouteiller <[email protected]>
Date:   Tue May 23 13:25:57 2017 -0400

    An error introduced during rebase

commit dbb86cb9cd68e7953a92728e2a9ee9fa15df3cd5
Author: Aurelien Bouteiller <[email protected]>
Date:   Mon May 22 13:31:55 2017 -0400

    Remove the opal_array comm_epoch as it is not needed anymore

commit 777b04cd67c8da1bbe95551ddd62b3bc1afd9a18
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Apr 18 15:05:22 2017 -0400

    Missing an extern

commit 471f3121ce75a4405d11349789c9c356ebe7b5c5
Author: Aurélien Bouteiller <[email protected]>
Date:   Mon Apr 17 23:54:42 2017 -0400

    The epoch overflow check must happen after the cid overflow check

commit ec2286eb1463f0486ad062b62ff505904e25a236
Author: Aurélien Bouteiller <[email protected]>
Date:   Mon Mar 27 16:35:16 2017 -0400

    Reconcile the FT coll components with the new coll initialization (coll. become coll->)

commit 9727e60ec6b9554f52532042fed928f389d6ac3c
Author: Aurélien Bouteiller <[email protected]>
Date:   Mon Mar 27 14:58:58 2017 -0400

    Update the nobuild list

commit f787b5d78cec90fd73e4fba888297fd936f9ae75
Author: Aurélien Bouteiller <[email protected]>
Date:   Fri Feb 24 17:42:56 2017 -0500

    Adding a default no-build list for known problematic components.

commit ae557eedf91760e30bfbdd919156a756509c600d
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Feb 15 17:54:33 2017 -0500

    coll_base_module has been updated to 2_2_0

commit 5fd144f316023c37562fe4b09e8d266ebab613b0
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Jan 26 14:56:55 2017 -0500

    Importing change from ULFM1 94f1fb9 (malloc(0) in ERA)

commit 8d49d0ac9b20d8b519493aece25a61153ff275a2
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Jan 26 14:53:36 2017 -0500

    The pmix-errhandler integration is not completely ready for prime time yet

commit 022897b5b589011b7c076759ca2a6b2b51c8ec86
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Jan 26 14:52:47 2017 -0500

    Convert the agreement in finalize to the new signature and use a stronger sync before turning off the detector

commit c67edf45b10b6fbae68d4bafc2c95a079932c703
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Nov 10 15:11:05 2016 -0500

    Permit interruption of the wait_sync in case of errors

commit c631c5599c2ec87588b2f3d3f059d83bf77b4f35
Author: Aurélien Bouteiller <[email protected]>
Date:   Mon Nov 7 15:30:12 2016 -0500

    Fix iagree by making the need to update the failed_group a parameter

commit ebc714d3284aa99b94e65558e62bfa6ba01ac068
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Nov 3 14:42:08 2016 -0400

    Restoring the errhandler/errmgr interaction to capture errors

commit 4fa09b9780c94b5766cf0d522fdc974352926da3
Author: Aurélien Bouteiller <[email protected]>
Date:   Mon Oct 31 15:37:22 2016 -0400

    cid_ft functions are operational again, shrink fixed.

commit 15b5a4d1b13730929c46b23da442f26a4b88cc48
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Oct 20 16:44:54 2016 -0400

    Make sure that the rbcast/detector tags are initialized before progressing the engine.

commit da21280dffb6ba0252de0d04e02566e0b96e7000
Author: Aurélien Bouteiller <[email protected]>
Date:   Fri Oct 14 19:20:20 2016 -0400

    We can save an agreement in finalize if we take care of ignoring stray rbcast at this time

commit 355b4b0b2796bbcb0d2d4b6d07f16980346d4b0b
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Oct 12 21:19:38 2016 -0400

    Make errors detected in NBC collectives complete the operation, and stop COMM_COLL requests

commit 8a88a81a83027f4aee446d77e0c074657f37a4b3
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Oct 12 21:19:04 2016 -0400

    Some more REQUEST_COMPLETE fixes

commit 461d209343f51021557c1f6f11d05911c4134d5a
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Oct 12 18:34:39 2016 -0400

    request_testall/some returns ERR_PROC_FAILED and REVOKED just like request_waitall/some (the mpi layer takes care of setting it to IN_STATUS again later).

commit 7481f709f8d360f07f9082d13fae1c67c7b7219b
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Oct 12 18:33:45 2016 -0400

    use REQUEST_COMPLETE in send_cancel

commit 79fd44e4f550076f57e4b101e7f347a47ac013dc
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Oct 12 17:42:55 2016 -0400

    free_reqs does cancel the requests, so its replacement code should too.

commit f710a951e1c36a9663575533f05cd59b43f85a33
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Oct 12 01:31:09 2016 -0400

    Put back epochs in cid allocation

commit 95e86ae7fb3fef16b0e5fdf2eff7b98eb4af28f1
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Oct 5 18:08:16 2016 -0400

    gen_cid must set req_mpi_object.comm

commit dfd07582e1c131aacc14ab3b81bde1f5745fce07
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Oct 4 22:44:45 2016 -0400

    rebase on master

commit 2da1b6bceb6ec52ac496f5a25101358e13db0892
Author: George Bosilca <[email protected]>
Date:   Mon May 9 12:25:01 2016 -0400

    Add FT to summary.

commit fce79759f7dcb58efa19b4948e6b66ada9807bb1
Author: Aurélien Bouteiller <[email protected]>
Date:   Fri May 6 17:03:20 2016 -0400

    This hack has been committed by mistake

commit e507d83a66155df1fe5196228068f90a4132387f
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu May 5 10:37:50 2016 -0400

    Do the finalize in abort only if there were actual failures during the run

commit c1eb96a997625129ccc4a690892f2f9e742ac245
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue May 3 17:34:58 2016 -0400

    Using the new OPAL_ENABLE_THREAD_MULTI where applicable and removing some useless rmb()

    Using OPAL_ENABLE_MULTI_THREADS and removing some useless rmb()

commit d4a91a0a9607c10c52ee7a739754e10a16035a47
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue May 3 17:34:14 2016 -0400

    Fix a bug where the ranks of immediate neighbors in the BMG were incorrectly computed

commit 4ed5a3779c8295b501cf589bf520843b8fcdc7c8
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue May 3 16:10:05 2016 -0400

    reinstate the abort in finalize, as the fix pushed by ralph is not always working

commit b31892772e4518e26603d4289fc3f0a57af2ef5f
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue May 3 16:06:44 2016 -0400

    We need to synchronize before removing the FD callbacks

commit 9dfe5dfc7200692330b939fcdb8965aac25b50fb
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue May 3 16:05:30 2016 -0400

    Keep searching for the next hop in the ring of the BMG when it is found dead during a comm

commit 9dad817b402de6c5a725de582cffb20f12f1ae54
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Apr 5 16:55:23 2016 -0400

    Various Cray XK fixes

commit 3921fbc38a9a913367f5c8c94682f4198aec6e06
Author: Aurélien Bouteiller <[email protected]>
Date:   Fri Apr 1 13:53:32 2016 -0400

    Make the revoke ring more reliable
    Still not perfect, as we do not re-emit for failures detected after the initial post

commit de1a5ce9b13f87aa5367ca5305a389ae56f8822b
Author: Aurélien Bouteiller <[email protected]>
Date:   Fri Apr 1 13:52:53 2016 -0400

    Adding a small injection facility to the interface (non-standard, for testing only)

commit b92d0997b567ef8e14abd4e76124568e049b6589
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Mar 31 14:50:17 2016 -0400

    Do not do extra stuff in Finalize when disable_ftmpi

commit 6cd0d7cbe3e1ed2c436c4f98ece4ca57e9242da3
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Mar 31 01:39:37 2016 -0400

    More thread safety in error reporting paths

commit 8b8b3c2c8d120e9aa5141338bcc8c95e43d79397
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Mar 29 08:59:19 2016 -0400

    various debugging stuff

commit b6e6156d0026bf54897c20477887738462f5fbfd
Author: Aurélien Bouteiller <[email protected]>
Date:   Mon Mar 28 15:59:27 2016 -0400

    Move back these things in finalize to make sure they happen before we tear down BTL etc.

commit 6b12383734735130b6a31ee2d5af6b63bf8ae6bd
Author: Aurélien Bouteiller <[email protected]>
Date:   Fri Mar 25 08:40:12 2016 -0400

    fix the FD thread sync variable being optimized out in -O3

commit c76d0e968d2c61a3f630680f615293217f48b015
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Mar 22 17:55:53 2016 -0400

    rdma based heartbeat now works

commit a3b35cc4d946cc7cf9a2af7afcfcb934d9a47a35
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Mar 22 09:52:24 2016 -0400

    Adding RDMA based heartbeats

commit 7a2603b2fe55f8e69ce80c6dbaace5ac3d37f7b8
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Mar 15 23:59:53 2016 -0400

    Adding a thread to the FD. This causes a race in add_procs.

commit 97f59ac7666d7430f2b50c16b411f7455552c3ba
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Mar 15 17:41:36 2016 -0400

    Detector is complete w/o progress thread. The timer resolution is a bit too coarse and false suspicions are common...

commit 809fc3d6300b8333a99178290392ab6fe3b96116
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Mar 10 16:28:07 2016 -0500

    Adding the fd to this repo. missing the thread and libevent timeout triggers

commit e5932ef24746229ea0e2422e91aca3707bff9f32
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue May 3 11:52:43 2016 -0400

    use-mpi extensions should not have a lib.la

commit 93699550538bc8800ffcc1fcddd1f6de9d71839c
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Mar 17 13:45:26 2016 -0400

    Fixing some issues in MPI_THREAD_MULTIPLE enabled builds
    Reinstate the pmix_fence in finalize
    Remove some duplicate debug messages

commit ab485e166f60dccc233a13a4529a3e1012b4f7da
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Mar 10 16:58:52 2016 -0500

    Move the initialization/finalization of the revoke/rbcast etc in comm_init

    This initialization done elsewhere

commit c1744c01ad087a97cd6f03bf3fe6fa669acb049e
Author: Aurélien Bouteiller <[email protected]>
Date:   Thu Mar 10 16:28:51 2016 -0500

    Fix the global variable warning with the failed_group

commit 0822fe56dbe9dfcff02d5d784be033067332bf88
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Mar 9 14:57:03 2016 -0500

    Silence warnings about failed TCP connections, which is a normal situation w/FT

commit 84596b8aba2972047d8afeef0e1c334df2b02e63
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Mar 9 11:32:30 2016 -0500

    Make sure we do not try to cancel completed requests

commit d93e6289cc6f55a88649c77d1d9d4ffd581a6404
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Mar 9 10:59:54 2016 -0500

    Make the CID collective tags part of the collective tag namespace

commit 840cc828916574a2ab8b051bc688efeb7d6c27fc
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Mar 9 10:58:14 2016 -0500

    Correctly promote ERR_PROC_FAILED_PENDING to PROC_FAILED for blocking operations and complete the request

commit 0b5fcf45acf860bd3bc74eb1503b37d85cc33aff
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Mar 8 17:22:23 2016 -0500

    Fix a bug in intercomm_create and enable error returning from low level comms in all cases

commit 71c0c65699f095c5a0aa7cc4982e113c40075769
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Mar 2 09:17:38 2016 -0500

    cumulative copyright update

commit b0138ff5141506a3ac1b6b060849bc9ba6b91df4
Author: Aurélien Bouteiller <[email protected]>
Date:   Wed Mar 2 01:45:56 2016 -0500

    Disable auto-cleanup in orte to better test survivability of MPI layer.
    orte finalize is broken.

commit 16dff489177807122a83f1e8b0004bbc7abf8ff5
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Mar 1 23:11:54 2016 -0500

    The base logic for shrink_inter is there. As soon as cid_reduce_inter_ft is implemented it should work.

commit db2a955f388916e61321cb9bcf683750d5191a01
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Mar 1 23:04:49 2016 -0500

    Fix a bug in shrink where the failed group was used partially uninitialized

commit 60469079c3bc2fdb08129174a5a2188b711a2418
Author: Aurélien Bouteiller <[email protected]>
Date:   Tue Mar 1 18:46:31 2016 -0500

    Cleanup cruft from jjh original prototype

commit 8187bcbd9139f9d52c27d14d3f86175b5edf9338
Author: Aurélien Bouteiller <bouteill…
shizhibao pushed a commit to shizhibao/ompi that referenced this issue Jan 17, 2021
Support allreduce non-contiguous datatype