
Allocating multiple blocks to one mpi rank in LB #5026

Open · wants to merge 27 commits into base: python
Conversation

@hidekb hidekb commented Jan 10, 2025

Description of changes:

  • LB CPU now supports allocating multiple blocks to one MPI rank
    • The default number of blocks per MPI rank is 1
    • The number of blocks per MPI rank is controlled by the argument blocks_per_mpi_rank of LBFluidWalberla
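The decomposition described above can be sketched as follows. This is an illustrative stand-alone snippet, not ESPResSo code: the names `grid_dimensions`, `node_grid`, and `blocks_per_mpi_rank` are taken from the PR, but the helper function is hypothetical.

```python
# Illustrative sketch: the global grid is divided first across MPI ranks
# (node_grid), then each rank's domain is split into blocks_per_mpi_rank
# blocks, element-wise per direction.
def cells_per_block(grid_dimensions, node_grid, blocks_per_mpi_rank):
    """Per-block cell counts from an element-wise division of the grid."""
    cells = []
    for dim, ranks, blocks in zip(grid_dimensions, node_grid,
                                  blocks_per_mpi_rank):
        total_blocks = ranks * blocks
        assert dim % total_blocks == 0, "grid must divide evenly into blocks"
        cells.append(dim // total_blocks)
    return cells

# 64^3 grid, 2x2x2 MPI ranks, 2 blocks per rank along z:
print(cells_per_block([64, 64, 64], [2, 2, 2], [1, 1, 2]))  # [32, 32, 16]
```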

@hidekb hidekb marked this pull request as draft January 10, 2025 18:01
maintainer/benchmarks/lb.py (outdated, resolved)
if (blocks_per_mpi_rank != Utils::Vector3i{{1, 1, 1}}) {
throw std::runtime_error(
"GPU architecture PROHIBITED allocating many blocks to 1 CPU.");
}
Contributor

"Using more than one block per MPI rank is not supported for GPU LB" (but why, actually?)

Member

how about "GPU LB only uses 1 block per MPI rank"?

@@ -96,7 +96,7 @@ class BoundaryPackInfo : public PackInfo<GhostLayerField_T> {
WALBERLA_ASSERT_EQUAL(bSize, buf_size);
#endif

- auto const offset = std::get<0>(m_lattice->get_local_grid_range());
+ auto const offset = to_vector3i(receiver->getAABB().min());
Contributor

Wouldn't it be better to have functions for this in the Lattice class, so they can be used by EK as well?
After all, LB and EK probably need to agree about both the MPI and the block decomposition?

Author

I removed the free function to_vector3i and added the member function LatticeWalberla::get_block_corner.

}

auto constexpr lattice_constant = real_t{1};
auto const cells_block = Utils::hadamard_division(grid_dimensions, node_grid);
Contributor

cells_per_block?

Author

I changed cells_block to cells_per_block.

// number of cells per block in each direction
uint_c(cells_block[0]), uint_c(cells_block[1]), uint_c(cells_block[2]),
lattice_constant,
// number of cpus per direction
uint_c(node_grid[0]), uint_c(node_grid[1]), uint_c(node_grid[2]),
// periodicity
- true, true, true);
+ true, true, true,
+ // keep global block information
Contributor

What does this do/mean?

Author

If "keep global block information" is true, each process keeps information about remote blocks that reside on other processes.

return v * u


LB_PARAMS = {'agrid': 1.,
Contributor

Please avoid 1s in unit tests, as wrong exponents don't get caught.

Author

I moved this function.

@RudolfWeeber
Contributor

Thank you. Looks good in general.
One thing worth looking into is the cell interval business. It looks like there are a lot of very similar loops in the getters and setters for slices. Would it be possible to pull that out into a function, which is then called with different lambdas for the individual cases?
@jngrad could you maybe take a look?

@hidekb
Author

hidekb commented Jan 15, 2025

Thank you. Looks good in general. One thing worth looking into is the cell interval business. It looks like there are a lot of very similar loops in the getters and setters for slices. Would it be possible to pull that out into a function, which is then called with different lambdas for the individual cases?

I rewrote the code related to the getters and setters for slices. Similar loops are pulled into a function which calls different lambdas for the individual cases.
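The refactoring pattern described here can be sketched in a few lines. This is an illustrative stand-alone example with hypothetical names, not the actual C++ code from the PR: a single generic loop over a cell interval is parameterized by a per-cell callback, so getters and setters differ only in the lambda they pass.

```python
# Generic loop over a half-open 3D cell interval [lower, upper),
# applying a caller-supplied operation to each cell.
def for_each_cell(lower, upper, op):
    for x in range(lower[0], upper[0]):
        for y in range(lower[1], upper[1]):
            for z in range(lower[2], upper[2]):
                op(x, y, z)

# A "setter" and a "getter" become different lambdas over the same loop:
values = {}
for_each_cell((0, 0, 0), (2, 2, 1),
              lambda x, y, z: values.__setitem__((x, y, z), 0.0))
collected = []
for_each_cell((0, 0, 0), (2, 2, 1),
              lambda x, y, z: collected.append(values[(x, y, z)]))
print(len(collected))  # 4
```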

maintainer/benchmarks/lb.py (outdated, resolved)

"""
Benchmark Lattice-Boltzmann fluid + Lennard-Jones particles.
"""
Member

Wouldn't it be more sustainable if we only had one LB benchmark file to maintain? argparse is quite flexible, surely we can come up with a way to select strong vs. weak scaling with command line options?

Author

I unified lb.py and lb_weakscaling.py by adding the argparse option --weak_scaling to lb.py.
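A minimal sketch of selecting weak vs. strong scaling via argparse, as discussed above. Only the flag name --weak_scaling comes from the comment; the other option names and the scaling rule are illustrative assumptions, not the benchmark's actual code.

```python
import argparse

# In weak scaling, the box grows with the number of MPI ranks so that
# the workload per rank stays constant; in strong scaling it is fixed.
parser = argparse.ArgumentParser(description="LB benchmark (sketch)")
parser.add_argument("--box_l", type=int, default=32)  # illustrative name
parser.add_argument("--weak_scaling", action="store_true",
                    help="scale the box with the number of MPI ranks")
args = parser.parse_args(["--weak_scaling"])

n_ranks = 8  # stand-in for the actual MPI world size
if args.weak_scaling:
    box_l = args.box_l * round(n_ranks ** (1 / 3))  # cube-root scaling
else:
    box_l = args.box_l
print(box_l)  # 64
```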

src/python/espressomd/lb.py (outdated, resolved)
Comment on lines 142 to 143
auto const blocks_per_mpi_rank = get_value_or<Utils::Vector3i>(
params, "blocks_per_mpi_rank", Utils::Vector3i{{1, 1, 1}});
Member

Here and elsewhere, do we really need a default value, considering we already provide a default value in the python class?

Author

I added a default value to the python class LatticeWalberla. And blocks_per_mpi_rank in LBFluid is now set by get_value<Utils::Vector3i>(m_lattice->get_parameter("blocks_per_mpi_rank")).

if (blocks_per_mpi_rank != Utils::Vector3i{{1, 1, 1}}) {
throw std::runtime_error(
"GPU architecture PROHIBITED allocating many blocks to 1 CPU.");
}
Member

how about "GPU LB only uses 1 block per MPI rank"?

src/walberla_bridge/src/utils/types_conversion.hpp (outdated, resolved)
testsuite/python/lb.py (outdated, resolved)
testsuite/python/lb_couette_xy.py (outdated, resolved)
@@ -41,7 +41,7 @@ class LBMassCommon:

"""Check the lattice-Boltzmann mass conservation."""

- system = espressomd.System(box_l=[3.0, 3.0, 3.0])
+ system = espressomd.System(box_l=[6.0, 6.0, 6.0])
Member

Several mass conservation tests are known to take a lot of time, especially in code coverage builds. The runtime can significantly increase when multiple CI jobs run on the same runner, which is then starved of resources. This change may potentially increase the test runtime by a factor of 8. Can you please confirm the runtime did not significantly change in the clang and coverage CI jobs, compared to the python branch?

Author

I changed the box_l from 6 to 4 to reduce the runtime. For testing with blocks_per_mpi_rank (e.g. [1, 1, 2]), box_l = 4 is needed.
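The box-size constraint mentioned here can be made explicit with a small sketch. The helper name and the assumption of 2 MPI ranks along z are illustrative, not taken from the test suite: with agrid = 1, 2 ranks along z, and 2 blocks per rank, the z extent must split into 4 blocks of at least one cell each, hence box_l = 4.

```python
# Smallest box edge (in length units) that gives every block at least
# min_cells_per_block lattice cells along one direction.
def min_box_length(agrid, n_ranks, blocks_per_rank, min_cells_per_block=1):
    return agrid * n_ranks * blocks_per_rank * min_cells_per_block

# agrid = 1, 2 MPI ranks along z, blocks_per_mpi_rank = [1, 1, 2]:
print(min_box_length(1.0, 2, 2))  # 4.0
```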

testsuite/python/lb_shear.py (outdated, resolved)
@hidekb hidekb marked this pull request as ready for review January 23, 2025 18:49