-
Notifications
You must be signed in to change notification settings - Fork 626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable single-precision floating point for DFT fields arrays #1675
Conversation
The factor of 12 seems hard to believe. One hypothesis is that you are getting lucky and single precision is just fitting into cache — an easy way to check this would be to double the number of frequencies. |
We should rerun the performance comparisons with monitors on the Yee grid. |
I think it is fine to just change the performance-critical arrays here (the ones that are updated on every timestep). |
Additional changes required here: https://github.com/NanoComp/meep/blob/master/python/meep.i#L421 |
Contrary to the earlier suggestion by @ahoenselaar, no changes to This means that this PR can probably be merged as-is without any additional changes.
|
Ah yes! The conversion from |
That's correct. |
Following the suggestion from @ahoenselaar, I reran the benchmarking results using the DFT fields with As additional verification, I reran the original benchmarking test with the DFT flux reported in the initial comment. This time I was only able to demonstrate an expected speedup of ~2X for single precision and not ~10X as initially reported which is reassuring. This is because in my original comment I was comparing the single-precision results from this branch to the master branch compiled with |
…oved performance using clang
LGTM, thanks. |
…p#1675) * enable single-precision floating point for DFT fields arrays * update docs * update benchmarking results in docs * use modified DFT field update for real time-domain fields due to improved performance using clang
…p#1675) * enable single-precision floating point for DFT fields arrays * update docs * update benchmarking results in docs * use modified DFT field update for real time-domain fields due to improved performance using clang
#1544 enabled single-precision floating point for the time-domain fields. However, that PR did not change the DFT fields which are always stored using double precision. The DFT field updates is often the performance bottleneck in the timestepping for the adjoint solver due to the fact that the entire design region is a DFT fields monitor typically with a fine frequency mesh (i.e., a large number of spatial and frequency points which need to be updated at every timestep).
In order to reduce the memory bandwidth even further than what was enabled by #1544, this PR modifies the default type of the DFT fields arrays to switch to single precision when compiling with the
--enable-single
flag. This PR only modifies the DFT field updates indft.cpp
and leaves other functions which use the DFT fields (process_dft_component
,get_dft_arrays
, etc.) unchanged because they are not performance critical. When running the test suite viamake check
, 6/19 of the C++ unit tests are failing due to slight differences in the hard-coded values which is expected.The performance improvement enabled by this PR for a benchmarking test involving an OLED device with multiple DFT monitors is significant (see this gist showing simulation script and results). The time spent on the DFT field updates was
reduced by more than a factor of 12nearly halved when switching from double to single precision practically without any loss in the accuracy of the flux values.