Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opal/mca/threads: fix Argobots support #8767

Merged
merged 5 commits into from
Apr 8, 2021

Conversation

shintaro-iwasaki
Copy link
Contributor

@shintaro-iwasaki shintaro-iwasaki commented Apr 5, 2021

This PR improves the MCA/OPAL/threads/argobots support.

Except for the first commit, this PR does not even touch anything related to the default thread (=Pthreads).

The first commit fixes the include header in opal/mca/pmix/pmix_internal.h. Pthreads' opal/mca/threads/threads.h happens to indirectly include opal_list.h while other threading versions (e.g., Argobots) do not. Since opal/mca/pmix/pmix_internal.h uses opal_list_item_t, pmix_internal.h should directly include opal_list.h.

The second commit introduces Argobots 1.1 static mutex/cond initializers, which has been discussed in #6578.
The third commit implements a missing function.
The fourth commit removes unused types and variables that are left somehow.
[edit] The fifth commit adds configure-time Argobots 1.1 detection.


This patch is related #8479, but more comprehensive. If this PR is merged first, #8479 might need to be rebased. If #8479 is merged first, I will rebase this PR again.

This patch passed git clang-format. I have tested the Argobots version on my local machine.

@ompiteam-bot
Copy link

Can one of the admins verify this patch?

@shintaro-iwasaki shintaro-iwasaki changed the title fix MCA/OPAL/threads/argobts support. opal/mca/threads: fix Argobots support Apr 5, 2021
@jjhursey
Copy link
Member

jjhursey commented Apr 6, 2021

ok to test

Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few comments/questions inline.

@devreal
Copy link
Contributor

devreal commented Apr 6, 2021

Ahh, also: since you are relying on Argobots 1.1 now would it be possible to check for the Argobots version during configure?

"opal_list.h" happens to be included by "tsd.h" in mca/threads when a
thread type is POSIX, but "mca/threads/threads.h" of other thread types
do indirectly not include tsd.h and thus no "opal_list.h".

"pmix_internal.h" directly uses opal_list_item_t and thus should include
"opal_list.h" in its header.

Signed-off-by: Shintaro Iwasaki <[email protected]>
Argobots 1.1 started to support static mutex/cond initialization.  This
patch introduces them to clean up the current implementation.

Signed-off-by: Shintaro Iwasaki <[email protected]>
This patch removes unused variables and types.

Signed-off-by: Shintaro Iwasaki <[email protected]>
@shintaro-iwasaki
Copy link
Contributor Author

@devreal Thank you. In addition to your comments on individual files, I also added a configure-time Argobots 1.1 check (please see the last commit). I'd appreciate it if you could review this PR again.

@shintaro-iwasaki
Copy link
Contributor Author

Regarding @devreal's comment:

This code seems identical to the code in pthreads/threads_pthreads_wait_sync.c. Is it possible to move this code into a common subdirectory?

Yes I think it's a good idea. I would like to, and at the same time, I'd like to restructure the current mca/threads code if okay. The current design I am following is "do not modify anything related to pthreads", but as @devreal pointed out, the current one has obviously a lot of duplications. I'd like to refactor all including a bit weird abstractions.

Idea of refactoring

For example, I don't understand why ompi_sync_wait_mt() is neither opal_thread_sync_wait_mt() nor opal_sync_wait_mt(). In my understanding, the namespace should be:

  1. Everything public (except for conventional opal_mutex or opal_cond) should have a prefix of opal_thread_
  2. Everything public should be "declared" in threads/*.h, not in individual threads/{pthreads, argobots, ...}/*.h` except for macros.
    • Even macros, should I do #ifndef-#define in threads/*.h and #undef-#define in threads/pthreads/*.h?
  3. Everything that is not public but must be in headers (e.g., inline function) should have a special prefix to avoid global namespace pollution.

I will also remove the duplication as much as possible.

However, it also totally makes sense to stick to the current code as much as possible rather than refactoring since the current code works. This PR is not the first time, but I'm quite new to the Open MPI development and cannot make any design decisions. I'm happy to work on this issue once there is a consensus; I will create another PR for this. I don't want to open a PR that is known to be rejected for the maintenance reason.


It depends on how much refactoring is allowed, but if we should be conservative, I would like to keep the duplications to prevent any potential interference with the pthreads layer.

Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@devreal
Copy link
Contributor

devreal commented Apr 7, 2021

@shintaro-iwasaki From what I can see, the wait_sync implementations in all three implementations rely on three building blocks the threading implementations already provide: mutexes, conditional variables, and yield. We could just move the implementation of wait_sync into base and make use of the basic operations there. Since the thread package is selected at compile-time there should be no overhead incurred. For now, I wouldn't meddle with the naming, that could be something for later.

@shintaro-iwasaki
Copy link
Contributor Author

@devreal Thanks for your review and approval! Can we create another PR for duplication removal? I mean, can I ask you to merge this PR before that PR? I will create that PR this week (after this PR is merged to avoid conflicts).

I understand your idea. I agree. Yes, the current wait_sync implementation is almost the same and we should move them to threads/base. If naming is not a big problem now, I will keep it.


I'd like to separate the PR since this change is not very trivial. Specifically, those implementations are not exactly the same, for example, the following struct directly uses ABT_mutex.
https://github.com/open-mpi/ompi/blob/master/opal/mca/threads/argobots/threads_argobots_wait_sync.h#L34-L35

We might need other abstractions for those raw mutex/condition variable, but it would raise concerns about performance, portability, or maintenance...

@devreal
Copy link
Contributor

devreal commented Apr 7, 2021

Absolutely, let's get this one merged and the n work on the duplication removal 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants