Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Out-of-bounds indexing in material_grids_addgradient #1861

Closed
ahoenselaar opened this issue Dec 22, 2021 · 5 comments
Closed

Out-of-bounds indexing in material_grids_addgradient #1861

ahoenselaar opened this issue Dec 22, 2021 · 5 comments
Labels

Comments

@ahoenselaar
Copy link
Contributor

ahoenselaar commented Dec 22, 2021

Calculated indices fwd1_idx and fwd2_idx into the DFT field arrays are, at times, negative and cause out-of-bounds data to be read.

Repro:
Instrument material_grids_addgradient as follows:

...
fwd1_idx = get_idx_from_ivec(gv.dim, start_ivec, the_stride, fwd_p);
if (fwd1_idx < 0) { meep::abort("Negative index for first eps node fwd1_idx=%ld", fwd1_idx); }
fwd1 = 0.5 * meep::cdouble(GET_FIELDS(fields_f, ci_forward, f_i, fwd1_idx));
fwd2_idx = get_idx_from_ivec(gv.dim, start_ivec, the_stride, fwd_pf);
if (fwd2_idx < 0) { meep::abort("Negative index for first eps node fwd2_idx=%ld", fwd2_idx); }
fwd2 = 0.5 * meep::cdouble(GET_FIELDS(fields_f, ci_forward, f_i, fwd2_idx));
fwd_avg = fwd1 + fwd2;
meep::vec eps1 = gv[ieps1];
cyl_scale = (gv.dim == meep::Dcyl) ? eps1.r() : 1;
material_grids_addgradient_point(
    v+ng*f_i, vec_to_vector3(eps1), scalegrad*cyl_scale, geps,
    adjoint_c, forward_c, fwd_avg, 0.5*adj, frequencies[f_i], gv, du);

//operate on the second eps node
fwd1_idx = get_idx_from_ivec(gv.dim, start_ivec, the_stride, fwd_pa);
if (fwd1_idx < 0) { meep::abort("Negative index for second eps node fwd1_idx=%ld", fwd1_idx); }
fwd1 = 0.5 * meep::cdouble(GET_FIELDS(fields_f, ci_forward, f_i, fwd1_idx));
fwd2_idx = get_idx_from_ivec(gv.dim, start_ivec, the_stride, fwd_paf);
if (fwd2_idx < 0) { meep::abort("Negative index for second eps node fwd2_idx=%ld", fwd2_idx); }
...

then run the Python test suite and observe failures in adjoint tests.

Without the introduction of safer indexing primitives and better test coverage, bugs of this type are going to occur over and over.

@oskooi oskooi added the bug label Dec 22, 2021
@smartalecH
Copy link
Collaborator

smartalecH commented Dec 22, 2021

What's an example of a "safer indexing primitive" we can use to avoid issues like this in the future?

@ahoenselaar
Copy link
Contributor Author

Having a type for n-dimensional arrays and views into arrays with methods to index into the array would be a starting point. In debug builds, the indexing methods can perform bounds checking. This would reduce the amount of indexing and stride calculation logic that is distributed over the codebase.
Take the fields_a argument of material_grids_addgradient as an example. Currently, a raw 1D array is passed along with another array that contains shape data when the actual data type of fields_a is "three 4D arrays". As a consequence, significant chunks of material_grids_addgradient concern themselves with indexing and stride calculations, when they should be expressing the core of the gradient calculation logic. This complexity makes the code error-prone and extremely difficult to understand even for the interested reader. Note that the buggy sections of material_grids_addgradient sailed through code review by highly experienced Meep contributors.
Macros used to access data like GET_FIELDS should be removed and replaced with testable constructs.

@smartalecH
Copy link
Collaborator

This is great -- thanks, Andreas.

I'll also add that not only would these new data structures prevent bugs, but they would significantly simplify the arduous process of adding new features to the C++ backend. For this particular PR that you reference, I spent 80% of my time trying do nail down the complicated array logic (and obviously it still has issues...).

Also note that #1855 completely revamps how these fields are accessed, so let's not dwell too much on "fixing" this particular issue (rather let's just get the new PR working correctly from the very beginning).

In addition to the above suggestion, I imagine you have a series of techniques you use to identify and diagnose issues like these? It might be nice to someday document these techniques for future use. Im sure these are standard "tricks of the trade" used across software development, but many of us aren't traditional software devs.

@ahoenselaar
Copy link
Contributor Author

When it comes to techniques, one line of defense is the use of address sanitizers. Those were introduced as a GitHub Action in #1610 but unfortunately disabled, likely due to the large number of identified issues.

Here is the output for ASAN at the current state in the repo: https://gist.github.com/ahoenselaar/13d0b764bb38bbab8b579ed414c49592

At the moment, the sanitizer only covers C++ tests and thus cannot identify the OOB indexing in gradient-related methods. IIRC, JAX was causing issues in the ASAN-enabled build but those could probably be resolved.

@smartalecH
Copy link
Collaborator

Closed by #1886?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants