
BugFix: memory error in collective effects circular buffers #771

Merged
merged 2 commits into master from mpi_beamloading_bug on May 23, 2024

Conversation

swhite2401
Contributor

This PR fixes a bug in the collective effects implementation. Circular buffers are used to store the slice information and the history of some parameters. These buffers were shifted even when their size was equal to 1.
In the new implementation, when the size of these buffers is equal to 1, they are simply overwritten.
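
For reference, a minimal sketch of the intended behaviour (the names update_history, buffer, nturns and new_value are illustrative only, not the actual atimplib.c code): shifting a history buffer only makes sense when it holds more than one element; otherwise the single slot is simply overwritten.

    #include <string.h>

    /* Illustrative only: update a history buffer of length nturns.
       The shift is skipped when the buffer holds a single element,
       which is the case the old code mishandled. */
    static void update_history(double *buffer, int nturns, double new_value)
    {
        if (nturns > 1) {
            /* drop the oldest entry and shift the rest down by one slot */
            memmove(buffer, buffer + 1, (nturns - 1) * sizeof(double));
        }
        /* store the newest value in the last (or only) slot */
        buffer[nturns - 1] = new_value;
    }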

@swhite2401 swhite2401 added the bug fix and C (For C code / pass methods) labels May 22, 2024
@swhite2401 swhite2401 requested a review from lcarver May 22, 2024 12:46
@lcarver
Contributor

lcarver commented May 23, 2024

Just for future reference, this bug is a bit strange. While debugging some MPI scripts sent in by some new users, we found a specific configuration of parameters that produced segmentation faults.

For example (many lines of code are omitted here), if the following code:


comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

Nbunches = 1332
nparts_per_bunch = 666
Npart = int(Nbunches * nparts_per_bunch / size)  # number of particles per MPI process


add_beamloading(fring, qfactor, rshunt, Nslice=50, blmode=BLMode.PHASOR, cavitymode=CavityMode.ACTIVE, VoltGain=0.0001, PhaseGain=0.0001)

is launched with
mpirun -n 64 python file.py

we get a segmentation fault. Increasing nparts_per_bunch solves the problem, as does changing Nslice from 50 to 51.

I have verified that the bug fix proposed here solves the problem... but I'm not sure I fully understand the details.

@lcarver
Contributor

lcarver commented May 23, 2024

In line 176 of atimplib.c there is the line

    weight[i] *= bunch_currents[ib]/np_bunch[ib];

which should be replaced by

    if (np_bunch[ib] == 0.0) {
        weight[i] = 0.0;
    }
    else {
        weight[i] *= bunch_currents[ib]/np_bunch[ib];
    }
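
As a sketch of the surrounding loop (the loop bound nparts and the bunch_index array are assumptions for illustration, not copied from atimplib.c), the guard avoids the division by zero that occurs when a bunch holds no macro-particles on the current MPI rank:

    /* Illustrative only: per-particle weight normalisation with the guard.
       np_bunch[ib] can be 0.0 when bunch ib has no macro-particles on this
       MPI rank; the unguarded division would then produce inf/NaN weights. */
    for (int i = 0; i < nparts; i++) {
        int ib = bunch_index[i];   /* bunch to which particle i belongs */
        if (np_bunch[ib] == 0.0) {
            weight[i] = 0.0;
        }
        else {
            weight[i] *= bunch_currents[ib] / np_bunch[ib];
        }
    }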

@swhite2401
Contributor Author

> I have verified that the bug fix proposed here solves the problem... but I'm not sure I fully understand the details.

You may take a look at this: https://en.wikipedia.org/wiki/Undefined_behavior

Contributor

@lcarver lcarver left a comment


Example files are running as normal and the bug is fixed.

@swhite2401
Contributor Author

I have added @simoneliuzzo as a reviewer because neither I nor @lcarver can approve this PR.

@swhite2401 swhite2401 merged commit 588edbc into master May 23, 2024
37 checks passed
@swhite2401 swhite2401 deleted the mpi_beamloading_bug branch May 23, 2024 11:16