Enable single-precision floating point for DFT fields arrays #1675

oskooi · 2021-07-15T05:34:45Z

#1544 enabled single-precision floating point for the time-domain fields. However, that PR did not change the DFT fields, which are always stored in double precision. The DFT field updates are often the performance bottleneck in the timestepping for the adjoint solver because the entire design region is a DFT fields monitor, typically with a fine frequency mesh (i.e., a large number of spatial and frequency points that need to be updated at every timestep).
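To make the cost argument concrete, here is a minimal pure-Python sketch of a running DFT accumulation. It is not the code in dft.cpp: the function name `update_dft`, the phase sign, the `dt` scaling, and the tiny array sizes are all assumptions chosen for illustration. The point is only that every timestep touches every (frequency, spatial point) pair once.

```python
import cmath

def update_dft(dft, field, freqs, t, dt):
    """Add one timestep to the running DFT (hypothetical helper).

    Assumed convention: F(f) += field(t) * exp(2*pi*i*f*t) * dt.
    """
    for i, freq in enumerate(freqs):
        phase = cmath.exp(2j * cmath.pi * freq * t) * dt
        row = dft[i]
        for p, value in enumerate(field):
            row[p] += phase * value  # one read-modify-write per (freq, point)
    return dft

# Illustrative sizes only: 4 spatial points, 3 frequencies.
freqs = [0.5, 1.0, 1.5]
field = [1.0, 0.0, -1.0, 0.5]
dft = [[0j] * len(field) for _ in freqs]

update_dft(dft, field, freqs, t=0.0, dt=0.1)

# Number of complex accumulator entries touched per timestep.
updates_per_step = len(field) * len(freqs)
```

Because each accumulator entry is read and written once per timestep with very little arithmetic in between, the loop tends to be limited by memory traffic, which is why the width of the stored type matters.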

To reduce the memory bandwidth even further than #1544 did, this PR changes the default type of the DFT fields arrays to single precision when compiling with the --enable-single flag. This PR only modifies the DFT field updates in dft.cpp and leaves other functions that use the DFT fields (process_dft_component, get_dft_arrays, etc.) unchanged because they are not performance critical. When running the test suite via make check, 6 of the 19 C++ unit tests fail due to slight differences from the hard-coded values, which is expected.
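As a rough sketch of the storage side of this change, the snippet below estimates the size of a complex DFT array for both precisions using the item sizes of Python's `array` module. The helper name `dft_storage_bytes` and the monitor dimensions are hypothetical and are not taken from the benchmark.

```python
from array import array

def dft_storage_bytes(num_points, num_freqs, typecode):
    """Bytes needed for a complex DFT array (hypothetical helper).

    typecode "f" is C float (single), "d" is C double.
    Each complex value stores a real and an imaginary part.
    """
    return 2 * num_points * num_freqs * array(typecode).itemsize

# Assumed monitor size, for illustration only.
num_points, num_freqs = 100_000, 50

single_bytes = dft_storage_bytes(num_points, num_freqs, "f")
double_bytes = dft_storage_bytes(num_points, num_freqs, "d")

print(f"single: {single_bytes / 2**20:.1f} MiB")
print(f"double: {double_bytes / 2**20:.1f} MiB")
```

Halving the bytes per entry halves the data that must move through the memory hierarchy on every timestep, which is the motivation stated above.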

The performance improvement enabled by this PR for a benchmarking test involving an OLED device with multiple DFT monitors is significant (see this gist showing simulation script and results). The time spent on the DFT field updates was nearly halved (corrected from the originally reported reduction by more than a factor of 12) when switching from double to single precision, with practically no loss in the accuracy of the flux values.

stevengj · 2021-07-15T19:17:58Z

The factor of 12 seems hard to believe. One hypothesis is that you are getting lucky and single precision is just fitting into cache — an easy way to check this would be to double the number of frequencies.

ahoenselaar · 2021-07-15T19:18:56Z

We should rerun the performance comparisons with monitors on the Yee grid.

stevengj · 2021-07-15T19:20:05Z

I think it is fine to just change the performance-critical arrays here (the ones that are updated on every timestep).

ahoenselaar · 2021-07-15T19:20:08Z

Additional changes required here: https://github.com/NanoComp/meep/blob/master/python/meep.i#L421

oskooi · 2021-07-16T15:48:58Z

Contrary to the earlier suggestion by @ahoenselaar, no changes to get_dft_array, etc. seem to be required in this PR because nothing is broken by these changes. There are five Python tests which call the get_dft_array function: test_adjoint_solver.py, test_array_metadata.py, test_dft_fields.py, test_gaussianbeam.py, test_n2f_periodic.py. Three of these tests (test_dft_fields.py, test_gaussianbeam.py, test_n2f_periodic.py) pass using this branch compiled with --enable-single. The two failing tests (test_adjoint_solver.py, test_array_metadata.py) fail because of slight numerical differences in the fields and not because of anything related to get_dft_array. In fact, all 21 of the 49 failing Python tests (list shown below) fail because of slight numerical differences in the fields, similar to the failing C++ tests.

This means that this PR can probably be merged as-is without any additional changes.

FAIL: tests/test_3rd_harm_1d.py
PASS: tests/test_absorber_1d.py
FAIL: tests/test_adjoint_solver.py
FAIL: tests/test_adjoint_jax.py
PASS: tests/test_antenna_radiation.py
PASS: tests/test_array_metadata.py
PASS: tests/test_bend_flux.py
PASS: tests/test_binary_grating.py
FAIL: tests/test_cavity_arrayslice.py
FAIL: tests/test_cavity_farfield.py
PASS: tests/test_chunk_layout.py
FAIL: tests/test_chunks.py
PASS: tests/test_cyl_ellipsoid.py
PASS: tests/test_dft_energy.py
PASS: tests/test_dft_fields.py
PASS: tests/test_diffracted_planewave.py
FAIL: tests/test_dispersive_eigenmode.py
PASS: tests/test_divide_mpi_processes.py
FAIL: tests/test_eigfreq.py
PASS: tests/test_faraday_rotation.py
FAIL: tests/test_field_functions.py
PASS: tests/test_force.py
PASS: tests/test_fragment_stats.py
PASS: tests/test_gaussianbeam.py
PASS: tests/test_geom.py
FAIL: tests/test_get_point.py
FAIL: tests/test_holey_wvg_bands.py
FAIL: tests/test_holey_wvg_cavity.py
PASS: tests/test_kdom.py
PASS: tests/test_ldos.py
PASS: tests/test_material_grid.py
PASS: tests/test_medium_evaluations.py
FAIL: tests/test_mode_coeffs.py
PASS: tests/test_mode_decomposition.py
FAIL: tests/test_multilevel_atom.py
PASS: tests/test_n2f_periodic.py
PASS: tests/test_oblique_source.py
PASS: tests/test_physical.py
PASS: tests/test_prism.py
FAIL: tests/test_pw_source.py
FAIL: tests/test_refl_angular.py
FAIL: tests/test_ring.py
FAIL: tests/test_ring_cyl.py
FAIL: tests/test_simulation.py
PASS: tests/test_special_kz.py
PASS: tests/test_source.py
FAIL: tests/test_user_defined_material.py
PASS: tests/test_visualization.py
FAIL: tests/test_wvg_src.py
============================================================================
Testsuite summary for meep 1.20.0-beta
============================================================================
# TOTAL: 49
# PASS:  28
# SKIP:  0
# XFAIL: 0
# FAIL:  21
# XPASS: 0
# ERROR: 0
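The kind of failure described above can be reproduced in miniature. The sketch below rounds a double through single precision using the standard `struct` module and compares it against a hard-coded reference with two tolerances. The reference value and the tolerances are illustrative assumptions and are not taken from any Meep test.

```python
import math
import struct

def to_single(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

reference = 0.1                  # hypothetical hard-coded double-precision value
computed = to_single(reference)  # same quantity after a single-precision round trip

rel_err = abs(computed - reference) / abs(reference)

# A tolerance tuned for double precision rejects the single-precision result...
passes_tight = math.isclose(computed, reference, rel_tol=1e-12)
# ...while one sized for single precision (about 7 significant digits) accepts it.
passes_loose = math.isclose(computed, reference, rel_tol=1e-6)
```

Under this reading, a test that fails only at the tight tolerance reflects the storage precision rather than a bug in the code under test.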

ahoenselaar · 2021-07-16T16:27:05Z

Ah yes! The conversion from realnum to double occurs in line 820 in dft.cpp, before any of the routines in the SWIG wrapper get exposure to it.

oskooi · 2021-07-16T16:33:14Z

> Ah yes! The conversion from realnum to double occurs in line 820 in dft.cpp, before any of the routines in the SWIG wrapper get exposure to it.

That's correct. get_dft_array therefore always returns its result as double-precision floating point regardless of the type of the actual DFT fields. The key point is that this is not performance critical as get_dft_array is typically not called at every timestep. It would be good to fix this at some point but this can be addressed in a separate PR.
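A small sketch of what this widening implies, using Python's `array` module as a stand-in for the C++ arrays. The function `get_dft_array_sketch` is hypothetical and only mimics the behavior described here (single-precision storage, double-precision output); it is not the Meep API.

```python
from array import array

def get_dft_array_sketch(stored):
    """Hypothetical stand-in: copy stored values into a double-precision array."""
    return array("d", list(stored))

# Values as they might be held internally when built with single precision.
stored_single = array("f", [0.1, 0.25])

out = get_dft_array_sketch(stored_single)

# The output type is double regardless of the storage type...
out_is_double = out.typecode == "d"
# ...but widening does not restore digits lost when the value was stored.
exactly_representable = out[1] == 0.25   # 0.25 is exact in both precisions
rounded_value_differs = out[0] != 0.1    # 0.1 was already rounded to single
```

The copy costs one pass over the array, which is negligible when the function is called only occasionally rather than at every timestep.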

src/dft.cpp

oskooi · 2021-07-17T06:08:34Z

Following the suggestion from @ahoenselaar, I reran the benchmark using the DFT fields with yee_grid=True (rather than DFT flux), with both gcc and clang, on a single core of the same Intel Kaby Lake 4.2 GHz processor. For this test configuration (see gist), the time spent on the DFT fields updates in single precision was nearly half that in double precision, as expected (single: 0.0140958 ± 0.0003646 s, double: 0.0237407 ± 0.0006847 s). The results were similar with yee_grid=False and independent of the choice of compiler. I have updated the documentation with these results.
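For reference, the ratio implied by the two timings quoted above can be worked out directly. Only the two mean values come from the comment; the variable names are illustrative.

```python
# Mean time per DFT fields update reported above, in seconds.
single_s = 0.0140958
double_s = 0.0237407

ratio = single_s / double_s    # fraction of the double-precision time
speedup = double_s / single_s  # how many times faster single precision is

print(f"single/double time ratio: {ratio:.3f}")
print(f"speedup: {speedup:.2f}x")
```

This gives a ratio of about 0.59, i.e. a speedup of roughly 1.7x, somewhat short of the ideal factor of two from halving the bytes per value.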

As additional verification, I reran the original benchmarking test with the DFT flux reported in the initial comment. This time I was only able to demonstrate the expected speedup of ~2X for single precision and not ~10X as initially reported, which is reassuring. The discrepancy arose because in my original comment I was comparing the single-precision results from this branch to the master branch compiled with --enable-debug, which turns off optimization and therefore produced much slower results by comparison.


stevengj · 2021-07-19T20:29:04Z

LGTM, thanks.


oskooi added 2 commits July 14, 2021 21:38

enable single-precision floating point for DFT fields arrays

a3ade7c

update docs

b6a4494

oskooi added the enhancement label Jul 15, 2021

ahoenselaar reviewed Jul 16, 2021


update benchmarking results in docs

5ac7baf

use modified DFT field update for real time-domain fields due to improved performance using clang

e93cc35

stevengj merged commit b5f6cb7 into NanoComp:master Jul 19, 2021

oskooi deleted the dft_realnum branch July 19, 2021 20:29

oskooi mentioned this pull request Jul 25, 2021

single-precision floating point for the fields arrays exacerbates chaotic behavior for Maxwell-Bloch simulation? #1700

Open

oskooi mentioned this pull request Nov 20, 2021

support for single-precision floating point for fields array functions #1833

Merged
