Removed separation of MPI ranks during boozXform #442

mishapadidar · 2024-08-13T18:41:53Z

Currently, only the MPI leaders run most of the run method of BoozXform. This would be fine, except that no MPI broadcast is used to send the results of the run to the MPI workers. For example, on line 168 the MPI leaders set self.bx.asym = bool(wout.lasym) (a boolean for stellarator symmetry), whereas the MPI workers keep the default value of self.bx.asym, since they never execute this line. This can cause downstream problems in code that depends on these properties, such as the BoozerRadialInterpolant class.
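The failure mode can be illustrated with a minimal single-process sketch. ToyTransform, its asym default of None, and the is_leader flag are hypothetical stand-ins chosen for illustration; they are not the BoozXform API, and the real default value may differ.

```python
class ToyTransform:
    """Hypothetical stand-in for an object whose run() sets attributes."""

    def __init__(self):
        # Placeholder default; the real default may differ.
        self.asym = None

    def run(self, is_leader):
        # Leader-only pattern: workers return before any attribute is set.
        if not is_leader:
            return
        # On the leader, the value would come from the equilibrium data.
        self.asym = False


# Simulate a leader and a worker inside one process.
leader, worker = ToyTransform(), ToyTransform()
leader.run(is_leader=True)
worker.run(is_leader=False)

print("leader asym:", leader.asym)
print("worker asym:", worker.asym)
```

The worker object is left with whatever value it had before run was called, so any later code that reads the attribute on a worker sees stale data.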

This PR proposes to have all MPI ranks run the entire function, rather than just the leaders.

Downsides

The worker ranks technically do more work, since each one repeats the computation that only the leaders did before. Nonetheless, this should not affect run time.

Upsides

Ensures correctness of the result.

Alternatives

Have only the leaders run the function, and then broadcast the results. This would also work, but it introduces some minor complexity: we would have to keep track of which results need to be broadcast and which do not.
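A rough sketch of this broadcast alternative, to show the bookkeeping it requires. The helper sync_from_leader and the test double FakeWorkerComm are hypothetical names introduced here; the only assumption about the communicator is that it offers mpi4py-style Get_rank() and bcast(obj, root=0) methods. The explicit list of attribute names is exactly the part that would have to be maintained by hand.

```python
from types import SimpleNamespace


def sync_from_leader(target, names, comm, root=0):
    """Copy the attributes listed in `names` from the root rank to all ranks."""
    # Only the root has meaningful values; the other ranks send a placeholder.
    payload = None
    if comm.Get_rank() == root:
        payload = {name: getattr(target, name) for name in names}
    # Collective call: every rank in `comm` must reach this line.
    payload = comm.bcast(payload, root=root)
    for name, value in payload.items():
        setattr(target, name, value)


class FakeWorkerComm:
    """Hypothetical test double: behaves like rank 1 receiving a broadcast."""

    def __init__(self, payload_from_root):
        self._payload = payload_from_root

    def Get_rank(self):
        return 1

    def bcast(self, obj, root=0):
        return self._payload


# A worker whose attribute was never set by the leader-only code path.
worker_bx = SimpleNamespace(asym=None)
sync_from_leader(worker_bx, ["asym"], FakeWorkerComm({"asym": False}))
print("worker asym after sync:", worker_bx.asym)
```

In a real run, the communicator of the group (for example MPI.COMM_WORLD from mpi4py) would be passed in place of FakeWorkerComm, and every attribute set by the leader-only code would need an entry in names.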

Minimal working example
The error can be seen by running this code with two MPI ranks, i.e. mpiexec -n 2 python test.py. The error is triggered by using a single MPI partition (one group, so rank 1 is a worker). In my case, the error is not deterministic, i.e. you may have to run this code multiple times before it fails. This is because the default value of self.booz.bx.asym is not deterministic, switching between 0 and 1.

from simsopt.mhd import Vmec
from simsopt.util import MpiPartition
from boozermagneticfield import BoozerRadialInterpolant

from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# initial configuration
vmec_input = "../../../examples/2_Intermediate/inputs/input.nfp4_QH_warm_start"

mpi = MpiPartition(1)
vmec = Vmec(vmec_input, mpi=mpi, keep_all_files=False, verbose=False)
nfp=4

# Construct radial interpolant of magnetic field
bri = BoozerRadialInterpolant(
    equil=vmec, order=3, mpol=4, ntor=4, enforce_vacuum=True
)
print(f'rank {rank}) stellarator symmetry', bri.stellsym)
comm.Barrier()

Expected output when the script fails (rank 1 gets stuck in the BoozerRadialInterpolant):

rank 0) stellarator symmetry True

Expected output when the script succeeds:

rank 0) stellarator symmetry True
rank 1) stellarator symmetry True

removed separation of MPI ranks during boozXform

1def0e4

landreman self-requested a review August 27, 2024 13:15

landreman approved these changes Aug 27, 2024

landreman merged commit 3362805 into hiddenSymmetries:master Aug 27, 2024
33 of 35 checks passed
