
Python/concurrency defaults #1979

Merged

Conversation

@brenthuisman (Contributor) commented Sep 15, 2022

  • Rename Python's create_context to make_context_shim, to align with arb::make_context.
  • A special no-arg make_context_shim initializes a context with no MPI and an arbenv::default_allocation(). This is a proc_allocation with arbenv::thread_concurrency() threads and an arbenv::default_gpu(). This no-arg constructor is not exposed and is used only by the no-arg arbor.context() constructor and the single-arg (= recipe) arbor.simulation() constructor (see the sketch after this list).
    • This removes any setup effort for the default (local) use case, while keeping proper defaults when users request MPI.
  • Bind a few more helper functions from arbenv in arbor.env, e.g. arbor.env.thread_concurrency() binds arbenv::thread_concurrency().
  • Update tests and tutorial texts.
  • [x] Align the pybind11 minimum version in python/CMakeLists.txt with the one used elsewhere.
  • Docs updated and markup corrected to reflect the changes in Feature: External Connectivity #2001:
    • doc/cpp/hardware.rst
    • doc/concept/interconnectivity.rst
  • Update the proc_allocation printer for 📊 Pin threads #2094.
  • Expand arbor.config() with timestamp and python_lib_path.
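A rough sketch of the resulting user-facing defaults (illustration only, not code from this PR; it assumes arbor.env is importable as shown and uses the thread_concurrency name given above):

    import arbor

    # No arguments: use all locally available threads and the default GPU (if any), no MPI.
    ctx = arbor.context()

    # Helper bound from arbenv; this is the thread count the no-arg context picks up.
    print(arbor.env.thread_concurrency())

    # A recipe-only simulation builds the same default context internally, e.g.
    # sim = arbor.simulation(recipe)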

@brenthuisman brenthuisman marked this pull request as draft September 15, 2022 14:40
@brenthuisman brenthuisman marked this pull request as ready for review September 15, 2022 14:55
@thorstenhater (Contributor)

Hi @brenthuisman,

thanks, that might save a few people a question on the help board. However, the tests run into a timeout.
I'll restart the job for now, but if it persists, please check what's going on.

@brenthuisman (Contributor, Author) commented Sep 21, 2022

I restarted the job a few times now; it's the Python examples running on the pip source install that do not terminate. The logs are not very helpful (we need to print script filenames to see which one is actually hanging). I'll try to take a look today or tomorrow.

Update: the offending script is plasticity.py. It doesn't do much but eat cycles if you give it more than one thread.

@thorstenhater (Contributor)

Interesting ... can you figure out where exactly it spins?

@thorstenhater (Contributor) commented Sep 22, 2022

Ok, this appears in exactly this constellation:

  • Python
  • threads > 1
  • calling sim.update after sim.__init__

and results in an uninterruptible hang in update. My knee-jerk reaction is to blame the threading implementation, but it's a weird one.

Further: spike recording and the number of connections are not relevant.

Yes, threading seems to be the cause, in particular communicator.cpp:78

    for (int ix = 0; ix < gids.size(); ++ix) {
        auto gid = gids[ix];
        gid_infos[ix] = gid_info(gid, ix, rec.connections_on(gid));
    }

instead of

    threading::parallel_for::apply(0, gids.size(), thread_pool_.get(),
        [&](cell_size_type i) {
            auto gid = gids[i];
            gid_infos[i] = gid_info(gid, i, rec.connections_on(gid));
        });

Other threaded blocks here are ok.

@thorstenhater (Contributor)

Ok, found the perpetrator. See #1984.

@thorstenhater (Contributor)

This should be unblocked now.

@boeschf (Contributor) commented Oct 5, 2022

Seems to hang in the Python examples again.

Merge remote-tracking branch 'upstream/master' into python/concurrency_defaults
@brenthuisman (Contributor, Author)

Runs OK now locally!

@thorstenhater (Contributor) left a review:

Nice! A few changes I think we should have before merging:

  • Please pull out the pybind version bump.
  • Make C++ behave like we'd like Python to.
  • Have 'avail_threads' even in the presence of a GPU.

If you like, you can also make this even fancier by making it play nice with MPI, but that's not necessary to merge.

Resolved review threads on:
  • doc/python/hardware.rst
  • doc/tutorial/network_ring.rst
  • doc/tutorial/network_ring_mpi.rst
  • python/CMakeLists.txt
  • python/context.cpp (2)
@@ -210,7 +212,7 @@ void register_simulation(pybind11::module& m, pyarb_global_ptr global_ptr) {
             const std::optional<arb::domain_decomposition>& decomp,
             std::uint64_t seed) {
             try {
-                auto ctx = ctx_ ? ctx_ : std::make_shared<context_shim>(arb::make_context());
+                auto ctx = ctx_ ? ctx_ : std::make_shared<context_shim>(arb::make_context({arbenv::thread_concurrency(),-1}));
Contributor:

So, under the idea that Python and C++ APIs should co-evolve, I am against this. Keep it at the default here and change the default in C++. But still keep the option to ask for 'avail_threads' explicitly.

Contributor Author:

So, you mean also make avail_threads an option in C++?

Contributor:

Well, currently it is, right, via thread_concurrency?

Contributor Author:

Ah, sure, but it isn't a default (yet). I thought that was your request?

I was thinking that the best place to set this default (once) is in proc_allocation, but that won't work, because it does not know about MPI. So, the default context constructors should set num_threads to arbenv::thread_concurrency() if MPI is not enabled.

In the interest of keeping things similar, should the variable be called thread_concurrency in Python too, and be a member of arbor? E.g. arbor.context(arbor.thread_concurrency(), None)? I like it!

@thorstenhater (Contributor) commented Oct 24, 2022:

Should be in arbor.env, yes, but that's the vision.
The C++ c'tor could be this:

context(std::optional<int> threads={},
        std::optional<int> gpu_id={},
        std::optional<MPI_Comm> comm={}) {
  if (!threads && !comm) {
  }
  else if (!threads && comm) {
  }
  else if ...

I know that gpu_id=-1 is used instead of an optional, and that we cannot mention the type MPI_Comm without MPI=ON, but this is the type we should have in mind.

Contributor Author:

Unlike in Python, there is no explicit constructor in C++, it all goes through proc_allocation: execution_context(const proc_allocation& resources = proc_allocation{}). Also, having this in C++ would make (all of) Arbor dependent on arbenv.

Since there is currently no way to do something like auto ctx = arbor::context{gpu_id=1}, we could set proc_allocation.num_threads by changing the non-MPI constructor to execution_context(const proc_allocation& resources = proc_allocation{arbenv::thread_concurrency(),-1});. To the extent that execution_context construction maps to Python, it would set num_threads under the same circumstances.

Contributor:

> Unlike in Python, there is no explicit constructor in C++, it all goes through proc_allocation: execution_context(const proc_allocation& resources = proc_allocation{}).

You could call proc_allocation a set of kwargs. Once we have achieved C++20 in its full glory, this becomes a thing:

    execution_context({.threads=4, .comm=MPI_COMM_WORLD})

> Also, having this in C++ would make (all of) Arbor dependent on arbenv.

That's fine to a certain extent; arbenv is a library made for Arbor to interact with the environment, as long as we do not pull it into the deepest layers.

Also, there is nothing stopping us from making new, purpose-built c'tors, e.g.

context from_mpi_one_thread_per_logical_core(MPI_Comm& comm, int gpu_id=-1);
context from_mpi_one_thread_per_physical_core(MPI_Comm& comm, int gpu_id=-1);
// ...

Also, if proc_allocation is really too much of a hassle to have around, feel free to remove it.

Contributor Author:

I've updated the code as per my earlier suggestion, in the interest of merging this now.

  • Removal of proc_allocation is fine by me, but I'm not a user. The C++ tests do use it, which is why I have not suggested removing it.
  • MPI cleverness: to me that can be seen as a separate issue (the PR was initially written to get a better default for local use). I'll ask in the channel whether users are missing something.

Resolved review thread on python/example/network_ring_gpu.py
-The number of threads available locally for execution. Must be set to 1 at minimum. 1 by default.
-Passing ``"avail_threads"`` (as string) will query and use the maximum number of threads the system makes available.
+The number of threads available locally for execution. Defaults to the maximum number of threads the system makes available if gpu_id and mpi are not set, else defaults to 1.
Contributor:

Considering the rest of the change set, how about digging in a bit deeper and coming up with sensible defaults for all cases? It's not a requirement to merge, but if we are going to put it in eventually, why not in 0.8?
I don't know whether arbenv has the proper calls, but:

  • MPI: figure out how many threads a task is allowed to use (using hwloc, maybe) and set that when handed 'avail_threads'. Other than that there are OS-specific calls (which hwloc uses); we would need to consider macOS and Linux.
  • GPU: can just use 'avail_threads', it speeds up construction a bit. Independent of the change above. If 'avail_threads' receives super-charged capabilities w.r.t. MPI, it'll benefit; if not, then not.
  • MPI+GPU: do whatever MPI does. Likely we are using one task per GPU.

Contributor Author:

My initial idea was to only touch the default constructor, to provide the best experience for novice users.

If you'd like to touch more than the default constructor: we could initialize threads to thread_concurrency. It's straightforward for the user to understand, but, in particular when MPI is used, it may result in surprising behavior.

Contributor:

Well, the experienced folks appreciate less typing, too. So, yes, thread_concurrency is fine, but its interaction with MPI must be understood and documented. A sensible approach could be to try and use thread_concurrency/MPI_size. BUT if we are on an HPC cluster, which should be 90% of cases when MPI is in play, someone might have set up a thread mask accordingly already. Some time ago we looked at OMP_NUM_THREADS, which is used and exported by SLURM and its ilk.
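A hedged sketch of the thread_concurrency/MPI_size idea discussed here (assumptions: an MPI-enabled build that accepts an mpi4py communicator, and the arbor.env.thread_concurrency() binding from this PR; the naive division by the global rank count ignores node placement, cpusets, and OMP_NUM_THREADS):

    import arbor
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    # Naive split: divide the locally detected concurrency by the number of ranks.
    threads = max(1, arbor.env.thread_concurrency() // comm.Get_size())
    ctx = arbor.context(threads=threads, mpi=comm)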

@brenthuisman (Contributor, Author) commented Oct 24, 2022:

MPI config is its own world of expertise, and I wager normal users stick to 1 thread per rank (I have never seen or done differently, unless it was you trying to squeeze out every last drop).

So, my preference is to default threads to thread_concurrency unless mpi is set, in which case it stays 1. If cleverer people than I come up with a thought-out proposal for MPI, I'm happy to consider it. Agree?

Contributor:

As I said, I'd be fine with merging this part as is, but I consider the extra effort of making it extra nice for the user interesting. I think the common patterns are:

  1. No MPI, 1 thread per (logical) core
  • with GPU
  • without GPU
  2. MPI, still 1 thread per (logical) core, but now each task gets handed 1/Nth, where N is the number of tasks according to
  • 1 task per socket
  • 1 task per GPU

So, the approach is to compute

  • the (logical) core count: arbenv::cores(logical=true|false)
  • the number of tasks: handed to you via Comm_size, defaults to 1

and context or thread_concurrency could be instructed whether or not to distribute threads over tasks.

Contributor Author:

See the note on MPI above.

Contributor:

As I said, this would be nice to have rather than critical. I am personally fine with typing these out, but as we were sanding off some rougher edges, it offered an easy win.

Contributor Author:

I'd want to test this out, but I don't think I'll have the time this week. Making the release will almost certainly eat up the rest of my hours, and I prefer not to postpone it further. Maybe a small discussion in the team meeting about removing proc_alloc and adding these new constructors?

Contributor:

As I said: optional, and only if proc_alloc blocks you from achieving more.

@brenthuisman (Contributor, Author)

C++ was left out, since I assumed (some people in the past certainly argued so) that there should be no magic. I'm OK with keeping them aligned; I just want to make sure this has changed.

@thorstenhater (Contributor)

For me, alignment beats the no-magic paradigm, so if there must be magic, it should always work the same. One might argue that under this requirement it is not magic anymore.

@brenthuisman (Contributor, Author)

Hmm, I see a CI build error pop up that I didn't see locally:

CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "arbor" of type SHARED_LIBRARY
    depends on "arborenv" (weak)
  "arborenv" of type SHARED_LIBRARY
    depends on "arbor" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.

So, how do we handle this for the shared variant? Do we need that variant for anything (I don't recall noticing it before)?
The Python build test does not show the error.

@brenthuisman brenthuisman marked this pull request as ready for review April 20, 2023 15:08
@brenthuisman (Contributor, Author)

Updated and changed; the PR message reflects the major changes.

@brenthuisman brenthuisman requested review from thorstenhater and boeschf and removed request for thorstenhater and boeschf April 20, 2023 15:10
@thorstenhater (Contributor) left a review:

Shaping up to be a nice addition; some comments.

Contributor:

These are largely unrelated formatting changes? Fine, but confusing to review.

Contributor Author:

True-ish, I had to touch hardware.rst and discovered some things neglected/incorrect in #2001.

@@ -214,6 +214,7 @@ TEST(SPIKES_TEST_CLASS, threshold_watcher_interpolation) {
     dict.set("mid", arb::ls::on_branches(0.5));

     arb::proc_allocation resources;
+    resources.num_threads = 1;
Contributor:

Why one?

@brenthuisman (Contributor, Author) commented May 4, 2023:

Unless we are testing the defaults, I like tests to be explicit. default_concurrency() would also be fine.

@@ -152,6 +152,7 @@ namespace {
 TEST(domain_decomposition, homogenous_population)
 {
     proc_allocation resources;
+    resources.num_threads = 1;
Contributor:

Same as below.

@@ -219,6 +220,7 @@ TEST(domain_decomposition, homogenous_population)
 TEST(domain_decomposition, heterogenous_population)
 {
     proc_allocation resources;
+    resources.num_threads = 1;
Contributor:

Same as below.

-# (12) Create an execution context using all locally available threads and simulation
-ctx = arbor.context(threads="avail_threads")
-sim = arbor.simulation(recipe, ctx)
+# (12) Create a simulation using all locally available threads
Contributor:

Yeah, it's short, yet less clear what's going on.

Suggested change:
-# (12) Create a simulation using all locally available threads
+# (12) Create a simulation using the default settings:
+# - Use all threads available
+# - Use round-robin distribution of cells across groups with one cell per group
+# - No GPU
+# - No MPI
+# - Empty label dictionary
+# Other constructors of simulation can be used to change all of these.

Contributor Author:

Good point. Slightly modified in the upcoming commit. The label dict is left out; it's set elsewhere.

@@ -66,6 +66,17 @@ std::optional<T> py2optional(pybind11::object o, const char* msg) {
     return o.is_none()? std::nullopt: std::optional<T>(std::move(value));
 }

+template <typename T, typename F>
Contributor:

template <typename T, typename F>
std::optional<std::decay_t<T>> optional_when(T&& o, F&& pred) {
    if (pred(o)) {
        return {std::forward<T>(o)};
    } else {
        return {};
    }
}
  • A slightly more descriptive name
  • Do not construct the optional if not needed
  • Play nicely with rvalue refs, lvalue ref, and values.

@@ -136,8 +124,10 @@ void register_contexts(pybind11::module& m) {
"bind_procs"_a=false,
"bind_threads"_a=false,
"Construct an allocation with arguments:\n"
" threads: The number of threads available locally for execution. Must be set to 1 at minimum. 1 by default.\n"
" gpu_id: The identifier of the GPU to use, None by default.\n")
" threads: The number of threads available locally for execution. Must be set to 1 at minimum. 1 by default.\n"
Contributor:

Might this be a symbolic constant or string? Such as 'avail_threads'?

Contributor Author:

Available as arbor.env.default_concurrency().
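For example, to request the same value explicitly (a sketch using the binding named above):

    import arbor

    ctx = arbor.context(threads=arbor.env.default_concurrency())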

arb::proc_allocation allocation() const {
return arb::proc_allocation(num_threads, gpu_id.value_or(-1), bind_procs, bind_threads);
void proc_allocation_shim::set_num_threads(unsigned threads) {
if (0==threads) {
Contributor:

Don't we throw this from C++?

@brenthuisman (Contributor, Author) commented May 4, 2023:

arb::proc_allocation does not do any checking. Nor does anything else, for that matter. Only the pyobject ever did.

set_property(SOURCE config.cpp PROPERTY COMPILE_DEFINITIONS ARB_DATA="${CMAKE_INSTALL_DATAROOTDIR}" APPEND)
set_property(SOURCE config.cpp PROPERTY COMPILE_DEFINITIONS ARB_CXX_COMPILER="${CMAKE_CXX_COMPILER}" APPEND)
set_property(SOURCE config.cpp PROPERTY COMPILE_DEFINITIONS ARB_PREFIX="${CMAKE_INSTALL_PREFIX}" APPEND)
set_property(SOURCE config.cpp PROPERTY COMPILE_DEFINITIONS ARB_BINARY="${CMAKE_INSTALL_BINDIR}" APPEND)
Contributor:

What changed here?

@brenthuisman (Contributor, Author) commented May 4, 2023:

I forgot to list it in the PR message (added now): arbor.config() now returns timestamp and python_lib_path. The prefix need not be where CMake installed the package (and for pip installs it isn't; it will typically be a venv).

@@ -25,7 +25,7 @@ struct ARB_ARBOR_API execution_context {
     task_system_handle thread_pool;
     gpu_context_handle gpu;

-    execution_context(const proc_allocation& resources = proc_allocation{});
+    execution_context(const proc_allocation& resources = proc_allocation{1,-1});
Contributor:

I'd like a symbolic constant here, i.e. adding

constexpr static int gpu_nil_id = -1;

to context.hpp

@thorstenhater (Contributor) left a review:

One weirdness ... Other than that good.

# (12) Create a simulation using the default settings:
# - Use all threads available
# - Use round-robin distribution of cells across groups with one cell per group
# - Use GPU if present
Contributor:

Do we? Boy we are smart ;)

@brenthuisman brenthuisman merged commit 38453f3 into arbor-sim:master May 10, 2023
@brenthuisman brenthuisman deleted the python/concurrency_defaults branch May 10, 2023 07:39