Allocating multiple blocks to one mpi rank in LB #5026
base: python
Conversation
… blocks per mpi rank
if (blocks_per_mpi_rank != Utils::Vector3i{{1, 1, 1}}) {
  throw std::runtime_error(
      "GPU architecture PROHIBITED allocating many blocks to 1 CPU.");
}
"Using more than one block per MPI rank is not supported for GPU LB" (but why, actually?)
how about "GPU LB only uses 1 block per MPI rank"?
@@ -96,7 +96,7 @@ class BoundaryPackInfo : public PackInfo<GhostLayerField_T> {
     WALBERLA_ASSERT_EQUAL(bSize, buf_size);
 #endif

-    auto const offset = std::get<0>(m_lattice->get_local_grid_range());
+    auto const offset = to_vector3i(receiver->getAABB().min());
Wouldn't it be better to have functions for this in the Lattice class, so they can be used by EK as well? After all, LB and EK probably need to agree about both the MPI and the block decomposition.
I removed the free function to_vector3i and added the member function LatticeWalberla::get_block_corner.
}

auto constexpr lattice_constant = real_t{1};
auto const cells_block = Utils::hadamard_division(grid_dimensions, node_grid);
cells_per_block?
I changed cells_block to cells_per_block.
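As a concrete illustration of what this quantity holds, here is a small numeric sketch of the element-wise (Hadamard) division; the example values are assumed, and the actual computation happens in C++ via Utils::hadamard_division:

```python
import numpy as np

grid_dimensions = np.array([12, 12, 12])  # box_l / agrid, assumed values
node_grid = np.array([2, 2, 1])           # MPI ranks per direction, assumed
cells_per_block = grid_dimensions // node_grid  # element-wise division
print(cells_per_block)  # [ 6  6 12]
```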
  // number of cells per block in each direction
  uint_c(cells_block[0]), uint_c(cells_block[1]), uint_c(cells_block[2]),
  lattice_constant,
  // number of cpus per direction
  uint_c(node_grid[0]), uint_c(node_grid[1]), uint_c(node_grid[2]),
  // periodicity
- true, true, true);
+ true, true, true,
+ // keep global block information
What does this do/mean?
If "keep global block information" is true, each process keeps information about remote blocks that reside on other processes.
return v * u


LB_PARAMS = {'agrid': 1.,
Please avoid 1s in unit tests, as wrong exponents don't get caught.
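A minimal illustration of the point, with assumed values:

```python
# With unit parameters a wrong exponent yields the same number, so a test
# that uses agrid = 1 cannot catch an area-vs-volume mix-up.
agrid = 1.0
assert agrid**3 == agrid**2  # bug stays invisible

agrid = 0.5
assert agrid**3 != agrid**2  # a non-unity value exposes the wrong exponent
```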
I moved this function.
Thank you. Looks good in general.
I rewrote the code related to the getters and setters for slices. Similar loops are pulled into a function which calls different lambdas for the individual cases.
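As an illustration of the pattern described here (Python sketch only; the actual change lives in the C++ core, and all names below are hypothetical):

```python
import itertools

def for_each_cell_in_slice(lower, upper, kernel):
    """Apply kernel to every (x, y, z) index in the half-open slice."""
    ranges = (range(lo, up) for lo, up in zip(lower, upper))
    for x, y, z in itertools.product(*ranges):
        kernel(x, y, z)

# the shared loop is reused with different lambdas for the individual cases,
# e.g. a setter and a getter for a scalar field
field = {}
for_each_cell_in_slice((0, 0, 0), (2, 2, 2),
                       lambda x, y, z: field.__setitem__((x, y, z), 1.0))
values = []
for_each_cell_in_slice((0, 0, 0), (2, 2, 2),
                       lambda x, y, z: values.append(field[(x, y, z)]))
```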
"""
Benchmark Lattice-Boltzmann fluid + Lennard-Jones particles.
"""
Wouldn't it be more sustainable if we only had one LB benchmark file to maintain? argparse is quite flexible, surely we can come up with a way to select strong vs. weak scaling with command line options?
I unified lb.py and lb_weakscaling.py by adding the option --weak_scaling to the argparse arguments of lb.py.
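A sketch of what that option could look like; the flag name is taken from the comment above, everything else is assumed:

```python
import argparse

parser = argparse.ArgumentParser(description="LB + LJ benchmark")
parser.add_argument("--weak_scaling", action="store_true",
                    help="scale the box with the number of MPI ranks "
                         "(default: strong scaling, fixed total box)")
args = parser.parse_args()

# hypothetical use of the flag:
# box_l = base_box_l * node_grid if args.weak_scaling else base_box_l
```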
auto const blocks_per_mpi_rank = get_value_or<Utils::Vector3i>(
    params, "blocks_per_mpi_rank", Utils::Vector3i{{1, 1, 1}});
Here and elsewhere, do we really need a default value, considering we already provide a default value in the python class?
I added a default value to the python class LatticeWalberla, and blocks_per_mpi_rank in LBFluid is now obtained via get_value<Utils::Vector3i>(m_lattice->get_parameter("blocks_per_mpi_rank")).
if (blocks_per_mpi_rank != Utils::Vector3i{{1, 1, 1}}) {
  throw std::runtime_error(
      "GPU architecture PROHIBITED allocating many blocks to 1 CPU.");
}
how about "GPU LB only uses 1 block per MPI rank"?
@@ -41,7 +41,7 @@ class LBMassCommon:

     """Check the lattice-Boltzmann mass conservation."""

-    system = espressomd.System(box_l=[3.0, 3.0, 3.0])
+    system = espressomd.System(box_l=[6.0, 6.0, 6.0])
Several mass conservation tests are known to take a lot of time, especially in code coverage builds. The runtime can increase significantly when multiple CI jobs run on the same runner, which is then starved of resources. This change may potentially increase the test runtime by a factor of 8. Can you please confirm the runtime did not significantly change in the clang and coverage CI jobs, compared to the python branch?
I changed the box_l from 6 to 4 to reduce the runtime. For testing with blocks_per_mpi_rank (i.e. [1, 1, 2]), box_l = 4 is needed.
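A sketch of the corresponding setup; box_l = 4 and blocks_per_mpi_rank = [1, 1, 2] are taken from the comment above, the remaining parameter values are assumed, and blocks_per_mpi_rank itself is the keyword introduced by this PR:

```python
import espressomd
import espressomd.lb

system = espressomd.System(box_l=[4.0, 4.0, 4.0])
system.time_step = 0.01
system.cell_system.skin = 0.4
lattice = espressomd.lb.LatticeWalberla(agrid=1.0, n_ghost_layers=1,
                                        blocks_per_mpi_rank=[1, 1, 2])
lbf = espressomd.lb.LBFluidWalberla(lattice=lattice, density=1.0,
                                    kinematic_viscosity=1.0, tau=0.01)
system.lb = lbf  # attaching the fluid; assumes the development-branch interface
```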
Description of changes:
- add blocks_per_mpi_rank for LBFluidWalberla