speed up step_boundaries #1710

Open
stevengj opened this issue Jul 28, 2021 · 1 comment

stevengj commented Jul 28, 2021

step_boundaries copies all of the chunk's boundary points into a buffer, sends them via MPI (if the destination is a different process), and then copies the received buffer to the destination boundary. This code was all written many years ago under the assumption that the computational cost was not significant (since it is proportional to the surface area rather than the volume). However, in practice it appears to be a substantial portion of the time in some cases.

Fortunately, the fact that it was never seriously optimized also means that there are probably many ways to speed it up. Here are three ideas:

  1. The copy to/from buffer implementation uses an array of pointers to the source/destination locations, which means that there is a lot of pointer chasing in the inner loops. One simple optimization would be to (a) sort the pointers by address and (b) replace arithmetic sequences (sequences of n addresses with constant offsets s) by a (pointer,s,n) triplet so that the sequence can be read using a simple loop.

  2. Similarly, the phase factors are stored and computed individually per boundary point, even though many of these phases are identical, so they could be compressed with run-length encoding.

  3. When the source and destination processes are the same (e.g. copying from a chunk to a neighboring PML chunk on the same process), we can presumably skip the buffer and copy directly to the destination.
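
To make ideas 1 and 2 concrete, here is a minimal sketch, not Meep's actual code. It assumes the boundary locations are available as sorted element offsets into a field array rather than raw pointers, and every name in it (`StrideRun`, `compress_strides`, `gather`, `PhaseRun`, `rle_phases`) is hypothetical. It greedily groups the offsets into (start, stride, count) runs so the copy becomes a plain strided loop, and run-length encodes a list of phase factors.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical run descriptor (not a Meep type): `count` elements beginning
// at offset `start`, spaced `stride` elements apart.
struct StrideRun {
  std::ptrdiff_t start;
  std::ptrdiff_t stride;
  std::size_t count;
};

// Greedily compress a sorted list of element offsets into arithmetic runs.
// Assumption: the caller has already sorted the offsets (idea 1a).
std::vector<StrideRun> compress_strides(const std::vector<std::ptrdiff_t> &offsets) {
  assert(std::is_sorted(offsets.begin(), offsets.end()));
  std::vector<StrideRun> runs;
  std::size_t i = 0;
  while (i < offsets.size()) {
    StrideRun run{offsets[i], 0, 1};
    if (i + 1 < offsets.size()) {
      // The first gap fixes the stride; extend the run while the gap repeats.
      run.stride = offsets[i + 1] - offsets[i];
      run.count = 2;
      while (i + run.count < offsets.size() &&
             offsets[i + run.count] - offsets[i + run.count - 1] == run.stride)
        ++run.count;
    }
    runs.push_back(run);
    i += run.count;
  }
  return runs;
}

// Copy the boundary values described by `runs` into a contiguous buffer.
// The inner loop is a simple strided read with no per-point pointer lookup.
void gather(const double *field, const std::vector<StrideRun> &runs, double *buffer) {
  for (const StrideRun &r : runs) {
    const double *src = field + r.start;
    for (std::size_t k = 0; k < r.count; ++k)
      *buffer++ = src[static_cast<std::ptrdiff_t>(k) * r.stride];
  }
}

// Hypothetical run-length encoding of per-point phase factors (idea 2):
// consecutive identical phases collapse into one (phase, count) pair.
struct PhaseRun {
  std::complex<double> phase;
  std::size_t count;
};

std::vector<PhaseRun> rle_phases(const std::vector<std::complex<double>> &phases) {
  std::vector<PhaseRun> runs;
  for (const std::complex<double> &p : phases) {
    if (!runs.empty() && runs.back().phase == p)
      ++runs.back().count;
    else
      runs.push_back(PhaseRun{p, 1});
  }
  return runs;
}
```

With this sketch, the offsets 0, 1, 2, 3, 10, 20, 30, 31 compress into three runs: (0, 1, 4), (10, 10, 3), and a single-element run at 31. One caveat: sorting changes the order in which values land in the buffer, so the receiving side would need to unpack using the same ordering.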

ahoenselaar pushed a commit to ahoenselaar/meep that referenced this issue Aug 6, 2021
* Replace `connections` with separate maps for incoming and outgoing connections.
* Maintain separate connections for each chunk pair.

These changes unblock some of the improvements discussed on NanoComp#1710 and NanoComp#1670.
stevengj pushed a commit that referenced this issue Aug 11, 2021
* Replace `connections` with separate maps for incoming and outgoing connections.
* Maintain separate connections for each chunk pair.

These changes unblock some of the improvements discussed on #1710 and #1670.

Co-authored-by: Andreas Hoenselaar <[email protected]>

stevengj commented

Would be good to have some profiling data to know what fraction of the time is typically spent on copying to/from these buffers, to know if it is worth optimizing further.
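
One lightweight way to get that number, sketched here under assumptions: wrap the buffer-copy code and the whole time step in scoped timers and compare the accumulated totals. `TimeAccumulator`, `ScopedTimer`, and `fraction` are hypothetical helpers written for this illustration, not existing Meep facilities.

```cpp
#include <chrono>

// Hypothetical accumulator for total time spent in one code region.
struct TimeAccumulator {
  std::chrono::steady_clock::duration total{};
};

// RAII timer: adds the lifetime of the object to the given accumulator.
class ScopedTimer {
public:
  explicit ScopedTimer(TimeAccumulator &acc)
      : acc_(acc), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() { acc_.total += std::chrono::steady_clock::now() - start_; }
  ScopedTimer(const ScopedTimer &) = delete;
  ScopedTimer &operator=(const ScopedTimer &) = delete;

private:
  TimeAccumulator &acc_;
  std::chrono::steady_clock::time_point start_;
};

// Fraction of `whole` that was spent in `part` (0 if nothing was recorded).
double fraction(const TimeAccumulator &part, const TimeAccumulator &whole) {
  using seconds = std::chrono::duration<double>;
  const double w = seconds(whole.total).count();
  return w > 0.0 ? seconds(part.total).count() / w : 0.0;
}
```

In use, one `ScopedTimer` would be declared at the top of the time-step routine and another around just the copy loops; `fraction(copy_time, step_time)` after a run then gives the share of time spent copying.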

bencbartlett pushed a commit to bencbartlett/meep that referenced this issue Sep 9, 2021
mawc2019 pushed a commit to mawc2019/meep that referenced this issue Nov 3, 2021