speed up step_boundaries #1710
ahoenselaar pushed a commit to ahoenselaar/meep that referenced this issue on Aug 6, 2021:
* Replace `connections` with separate maps for incoming and outgoing connections.
* Maintain separate connections for each chunk pair.

These changes unblock some of the improvements discussed on NanoComp#1710 and NanoComp#1670.
stevengj pushed a commit that referenced this issue on Aug 11, 2021:
* Replace `connections` with separate maps for incoming and outgoing connections.
* Maintain separate connections for each chunk pair.

These changes unblock some of the improvements discussed on #1710 and #1670.

Co-authored-by: Andreas Hoenselaar <[email protected]>
It would be good to have some profiling data showing what fraction of the time is typically spent copying to/from these buffers, to know whether it is worth optimizing further.
bencbartlett pushed a commit to bencbartlett/meep that referenced this issue on Sep 9, 2021:
* Replace `connections` with separate maps for incoming and outgoing connections.
* Maintain separate connections for each chunk pair.

These changes unblock some of the improvements discussed on NanoComp#1710 and NanoComp#1670.

Co-authored-by: Andreas Hoenselaar <[email protected]>
mawc2019 pushed a commit to mawc2019/meep that referenced this issue on Nov 3, 2021:
* Replace `connections` with separate maps for incoming and outgoing connections.
* Maintain separate connections for each chunk pair.

These changes unblock some of the improvements discussed on NanoComp#1710 and NanoComp#1670.

Co-authored-by: Andreas Hoenselaar <[email protected]>
`step_boundaries` copies all of the chunk's boundary points into a buffer, sends them via MPI (if the destination is a different process), and then copies the received buffer to the destination boundary. This code was all written many years ago under the assumption that the computational cost was not significant (since it is proportional to the surface area rather than the volume). However, in practice it appears to take a substantial portion of the time in some cases.

Fortunately, the fact that it was never seriously optimized also means that there are probably many ways to speed it up. Here are three ideas:
1. The copy to/from buffer implementation uses an array of pointers to the source/destination locations, which means that there is a lot of pointer chasing in the inner loops. One simple optimization would be to (a) sort the pointers by address and (b) replace arithmetic sequences (sequences of `n` addresses with constant offset `s`) by a (pointer, `s`, `n`) triplet so that the sequence can be read using a simple loop.
2. Similarly, the phase factors are stored and computed individually per boundary point, when in fact many of these phases are the same, so they could be compressed with run-length encoding.
3. When the source and destination process are the same (e.g. copying from a chunk to a neighboring PML chunk on the same process), we can presumably skip the buffer and copy directly to the destination.
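
The first idea (sorted pointers collapsed into strided runs) could look roughly like the sketch below. This is not Meep's code: `StridedRun`, `compress_pointers`, and `gather` are hypothetical names, the field type is assumed to be `double`, and all pointers are assumed to point into one array. Sorting also changes the order of values in the buffer, so the receiving side would need a matching ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical type for this sketch (not a Meep type): n values, the first
// at `start`, each following one `stride` elements after the previous one.
struct StridedRun {
  double *start;
  std::ptrdiff_t stride;
  std::size_t n;
};

// Sort the pointers by address, then greedily merge arithmetic sequences.
// Assumes every pointer refers to an element of the same array, so that
// pointer subtraction is well defined.
std::vector<StridedRun> compress_pointers(std::vector<double *> ptrs) {
  std::sort(ptrs.begin(), ptrs.end(), std::less<double *>());
  std::vector<StridedRun> runs;
  std::size_t i = 0;
  while (i < ptrs.size()) {
    StridedRun run{ptrs[i], 0, 1};
    if (i + 1 < ptrs.size()) {
      run.stride = ptrs[i + 1] - ptrs[i];
      run.n = 2;
      // Extend the run while the offset between neighbours stays constant.
      while (i + run.n < ptrs.size() &&
             ptrs[i + run.n] - ptrs[i + run.n - 1] == run.stride)
        ++run.n;
    }
    runs.push_back(run);
    i += run.n;
  }
  return runs;
}

// Gather all run elements into a contiguous buffer using simple strided
// loops instead of one pointer dereference per element.
// Returns the number of values written.
std::size_t gather(const std::vector<StridedRun> &runs, double *buffer) {
  std::size_t k = 0;
  for (const StridedRun &r : runs) {
    assert(r.n > 0);
    for (std::size_t j = 0; j < r.n; ++j)
      buffer[k++] = r.start[static_cast<std::ptrdiff_t>(j) * r.stride];
  }
  return k;
}
```

The greedy merge is only one possible grouping; whether it pays off depends on how long the runs are in practice, which is something profiling would have to show.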
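
The second idea, run-length encoding of the phase factors, might be sketched as follows. The names `PhaseRun`, `rle_phases`, and `apply_phases` are hypothetical, and the phases are assumed to be stored as `std::complex<double>` in the same order as the boundary points; Meep's actual storage may differ.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical type for this sketch: `count` consecutive boundary points
// that all share the same phase factor.
struct PhaseRun {
  std::complex<double> phase;
  std::size_t count;
};

// Collapse consecutive equal phases into (phase, count) pairs.
std::vector<PhaseRun> rle_phases(const std::vector<std::complex<double>> &phases) {
  std::vector<PhaseRun> runs;
  for (const std::complex<double> &p : phases) {
    if (!runs.empty() && runs.back().phase == p)
      ++runs.back().count;
    else
      runs.push_back(PhaseRun{p, 1});
  }
  return runs;
}

// Multiply n input values by their phases, loading each phase once per run
// rather than once per boundary point.
void apply_phases(const std::vector<PhaseRun> &runs,
                  const std::complex<double> *in,
                  std::complex<double> *out, std::size_t n) {
  std::size_t k = 0;
  for (const PhaseRun &r : runs) {
    assert(k + r.count <= n); // runs must not cover more than n points
    for (std::size_t j = 0; j < r.count; ++j, ++k) out[k] = r.phase * in[k];
  }
  assert(k == n); // runs must cover exactly n points
}
```

If most runs have length one, this representation costs more memory than a plain array, so it only helps when equal phases really are contiguous.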
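
The third idea can be illustrated by comparing a buffered copy with a direct one. Both functions below are hypothetical stand-ins, not Meep functions, and the sketch assumes that no destination location is also read as a source; if that assumption fails, the direct version can give different results, because the buffered version reads every source before writing any destination.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Buffered path: pack, (optionally) communicate, then unpack.
void copy_via_buffer(const std::vector<const double *> &src,
                     const std::vector<double *> &dst,
                     std::vector<double> &buffer) {
  assert(src.size() == dst.size());
  buffer.resize(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) buffer[i] = *src[i];
  // For a neighbour on another process, the MPI send/receive would go here.
  for (std::size_t i = 0; i < dst.size(); ++i) *dst[i] = buffer[i];
}

// Direct path for the same-process case: one pass and no intermediate
// buffer. Assumes the source and destination locations do not overlap.
void copy_direct(const std::vector<const double *> &src,
                 const std::vector<double *> &dst) {
  assert(src.size() == dst.size());
  for (std::size_t i = 0; i < src.size(); ++i) *dst[i] = *src[i];
}
```

The direct path halves the number of memory passes for on-process connections, at the cost of maintaining two code paths.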