Subpixel smoothing gradients #1780

Merged · 8 commits · Nov 17, 2021

Conversation

smartalecH (Collaborator) commented Oct 6, 2021

Closes #1649

TODO

  • Modify tests to include subpixel smoothing
  • Quantify the effect of smoothing filter on gradient accuracy
  • Quantify the effect of Aᵤ finite-difference step size on gradient accuracy
  • Sweep β and check results
  • Compare gradients to naive method

smartalecH marked this pull request as draft on October 6, 2021.
codecov-commenter commented Oct 6, 2021


Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.53%. Comparing base (d4b1e1a) to head (d9a0826).
Report is 348 commits behind head on master.


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1780      +/-   ##
==========================================
+ Coverage   73.52%   73.53%   +0.01%     
==========================================
  Files          17       17              
  Lines        4876     4879       +3     
==========================================
+ Hits         3585     3588       +3     
  Misses       1291     1291              
Files with missing lines                 Coverage Δ
python/adjoint/optimization_problem.py   56.77% <100.00%> (+0.22%) ⬆️
python/adjoint/utils.py                  95.28% <100.00%> (+0.04%) ⬆️
python/adjoint/wrapper.py                98.50% <100.00%> (+0.02%) ⬆️
python/geom.py                           94.16% <100.00%> (ø)

smartalecH (Collaborator, Author) commented Oct 6, 2021

After completely refactoring the way material grid gradients are computed, there is still an issue when subpixel smoothing is turned on. I plotted the relative error in the gradient, abs((adjoint_grad - finite_diff_grad)/finite_diff_grad), as a function of β. The error grows with increasing β in the case where subpixel smoothing is turned on:

[figure: beta_sweep, relative gradient error vs. β with and without subpixel smoothing]

(Note that the non-smoothing case also sees a rise in relative error beyond a certain point because the actual gradient goes to zero, as expected.)

Any ideas?

smartalecH requested a review from oskooi on October 6, 2021.
smartalecH (Collaborator, Author) commented:

Okay, I think I see the (dominant) issue.

Before subpixel smoothing, we were sampling our forward and adjoint fields directly on the Yee grid (bypassing the interpolation to voxel centers that typically takes place). During the recombination step, we would loop over these points one by one (i.e., each component individually) and calculate the gradient contribution of each component independently of any other component. This approach is nice because we only have to worry about one restriction step, and it is perfectly valid as long as the material has a perfectly diagonal permittivity tensor, since the field components are then fully decoupled in the dA/du inner product (which is just the tensor in our case).

Unfortunately, we cannot make this assumption when subpixel smoothing is enabled, as off-diagonal terms appear. We can't just use a simple weighted inner product on the relevant field components either, because the field components (and tensor components) are all stored on different parts of the Yee cell.

If we instead use field components located at the center of the voxel, we can perform the full weighted inner product. This of course requires a second interpolation step. So, in the backpropagation step, we need to perform two restriction steps: one accounting for the mapping from the u-grid to the permittivity grid, and one accounting for the interpolation of the field components onto the voxel centers (and they must occur in reverse order). We also need to refactor a fair amount of the adjoint code that stores the fields, but that part seems straightforward.

Here's a simple illustration:

[figure: illustration of Yee-grid field components interpolated to the voxel center]

Since backpropagation is a backward pass, we want to do the field restriction first and the u-grid restriction after. The algorithm is actually rather simple:

LOOP_OVER_VOXEL_CENTERS(volume_describing_material_grid)
{
    u_scale = E_a * dA_du * E_f; // at voxel center -- this scales all subsequent gradients (chain rule)
    LOOP_OVER_E_COMPONENTS_IN_VOXEL(v)
    {
        // use the existing restriction code to propagate gradients to each u using position v
    }
}

Notice I declared fictitious loop macros LOOP_OVER_VOXEL_CENTERS and LOOP_OVER_E_COMPONENTS_IN_VOXEL. I'm hoping we already have macros that can do most, if not all, of the intended loop; if so, the implementation will be straightforward.
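
To make the two chained restrictions concrete, here is a minimal 1D NumPy sketch of the structure of the computation (the grids, interpolation operators, and names are hypothetical stand-ins, not Meep's actual code). The key fact is that interpolation is a linear map, so its adjoint, i.e. the "restriction", is just the transpose:

import numpy as np

def interp_matrix(n_src, n_dst):
    """Linear-interpolation matrix mapping an n_src-point grid onto an
    n_dst-point grid spanning the same interval."""
    W = np.zeros((n_dst, n_src))
    for i in range(n_dst):
        x = i * (n_src - 1) / (n_dst - 1)  # fractional source index
        j = min(int(x), n_src - 2)
        t = x - j
        W[i, j] = 1 - t
        W[i, j + 1] = t
    return W

n_u, n_yee, n_vox = 32, 64, 64
P_u = interp_matrix(n_u, n_vox)    # u grid -> voxel centers (permittivity grid)
P_f = interp_matrix(n_yee, n_vox)  # Yee points -> voxel centers

E_fwd = P_f @ np.random.rand(n_yee)  # forward field interpolated to voxel centers
E_adj = P_f @ np.random.rand(n_yee)  # adjoint field interpolated to voxel centers
dA_du = np.random.rand(n_vox)        # stand-in for dA/du at each voxel center

u_scale = E_adj * dA_du * E_fwd  # per-voxel-center contribution (chain rule)
grad_u = P_u.T @ u_scale         # restrict back to the u grid

Here P_f plays the role of the field interpolation (its transpose is the field restriction) and P_u.T is the u-grid restriction; the pseudocode above expresses the same structure with grid loops rather than explicit matrices.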

oskooi (Collaborator) commented Oct 7, 2021

Rather than start with the anisotropic case, why don't we debug and demonstrate second-order accuracy for a simpler test in which the effective ε tensors are diagonal, e.g. a multilayer structure in a 2d cell whose planar layers are oriented along the Cartesian directions? We would set β to an arbitrarily large number to ensure that the interfaces are discontinuous.

smartalecH (Collaborator, Author) commented:

Unfortunately, due to the smoothing filter (which is a requirement for our approach), we will always have interfaces requiring anisotropic entries.

Here's a simple example: my design parameters form a rectangle. That rectangle must be smoothed and then projected (if we don't smooth first, our Taylor-series approximation breaks down). The resulting geometry looks like this:

[figure: the smoothed and projected rectangle]

Now, if I pull the inverse permittivity for the Ex component in the Y direction, I get something like this (as expected):
[figure: inverse-permittivity entry for the Ex component in the Y direction]

where the non-zero entries correspond to the rounded corners of my rectangle.

The relative error of the gradient calculation for this example (β=1000) is 0.618. If we turn subpixel smoothing off in the design region, the relative error drops to 8.49e-5.

Note that most of the gradient information is around the corners, where the off-diagonal terms appear.
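
For concreteness, the smooth-then-project pipeline described above can be sketched in a few lines. This is a minimal illustration only; the 200×200 grid, the Gaussian kernel, and σ are arbitrary choices (Meep's own smoothing filter may differ), while the projection is the standard tanh projection:

import numpy as np
from scipy import ndimage

beta, eta = 1000.0, 0.5

rho = np.zeros((200, 200))
rho[60:140, 80:120] = 1.0  # binary rectangle of design parameters

# Smooth first, so the level set is approximately linear on the pixel scale...
rho_smooth = ndimage.gaussian_filter(rho, sigma=4.0)

# ...then project back toward a binary structure (standard tanh projection).
rho_proj = (np.tanh(beta * eta) + np.tanh(beta * (rho_smooth - eta))) / (
    np.tanh(beta * eta) + np.tanh(beta * (1.0 - eta))
)

Projecting with large β recovers sharp interfaces, but the rounded corners (and hence the off-diagonal tensor entries discussed above) survive.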

smartalecH (Collaborator, Author) commented:

I refactored the recombination step to include both restriction steps. The code is a lot simpler now, since it uses existing loop macros. However, the gradients (both with and without subpixel smoothing) are still off.

smartalecH (Collaborator, Author) commented:

Thanks for looking into this. Note that the forward and adjoint fields are no longer calculated on the Yee grid, but rather interpolated to the center. This way we can perform the full inner product (as described above).

So all components are on the center of the voxel. Of course, we have to restrict back out to compensate for this.

oskooi (Collaborator) commented Oct 12, 2021

> Note that the forward and adjoint fields are no longer calculated on the Yee grid, but rather interpolated to the center.

That means you need to be careful about how you interpolate the ε tensor components because each row is computed at a different Yee grid point. In the way that it is currently set up, the ε tensor is computed at the voxel center for each of Ex, Ey, Ez. I think the correct approach would be to interpolate the row of the ε tensor e.g. [εxx, εxy, εxz] from the same set of Yee grid points for Ex that you are using to interpolate the Ex field itself. Right now you are interpolating the fields to the voxel center but not the ε tensor.

oskooi (Collaborator) commented Oct 12, 2021

> Unfortunately, due to the smoothing filter (which is a requirement for our approach) we will always have interfaces requiring anisotropic entries.

By planar multilayer structure in 2d, I was referring to something like this OLED. After applying filtering and subpixel smoothing, the ε tensor at the interface voxels should still be diagonal.

smartalecH (Collaborator, Author) commented Oct 12, 2021

> I think the correct approach would be to interpolate the row of the ε tensor e.g. [εxx, εxy, εxz] from the same set of Yee grid points for Ex that you are using to interpolate the Ex field itself.

Yes, that is what I'm doing.

> Right now you are interpolating the fields to the voxel center but not the ε tensor.

No, all the permittivity components, in addition to the field components, are being interpolated to the voxel center. That's the whole point of needing two restriction steps. (Before this PR, that wasn't the case, as described above.)

get_chi1_tensor() is a new function that calls eff_chi1inv_row() at the same centered grid point.

oskooi (Collaborator) commented Oct 18, 2021

For reference, posting results here for an investigation into the impact of subpixel smoothing of a MaterialGrid on a continuously varying geometry. The test case is identical to the ring-resonator example from the user manual which involves two overlapping Cylinder objects to represent the ring geometry. The study involves computing the frequency of a low-loss resonant mode while varying the inner radius of the ring. The scripts for the simulation and post-processing the results are here. The MaterialGrid is using β=1000 and η=0.5 for the projection operator of the level-set function.

For a MaterialGrid at a resolution of 1000 and Meep grid resolution of 10 (i.e., downsampling by a factor of 10), the results for subpixel smoothing are comparable to no smoothing: the overall frequency response is mostly continuous, but some discontinuities/jumps are still apparent in the data.

[figure: ring_matgrid_res10_desres1000_freq_vs_radius]

When the Meep grid resolution is doubled to 20, the continuous frequency response is more apparent. However, the same trend is observed for the no-smoothing case. This is perhaps not surprising, since the no-smoothing case still uses a bilinear interpolation of the grid weights onto the Yee grid.

[figure: ring_matgrid_res20_desres1000_freq_vs_radius]

For this particular test case and setup, subpixel smoothing does not seem to be providing much of a benefit compared to no smoothing. Again, this is not unexpected because no-smoothing is using an interpolation scheme which mitigates stairstepping errors in the first place.

(One thing to try would be to increase the MaterialGrid resolution (to e.g. 2000 instead of 1000) in order to reduce downsampling errors and mitigate discontinuities for the case of Meep grid resolution 10. Unfortunately, the memory requirements become impractical: the 2d array used to store the grid weights is more than 20 GB.)

smartalecH (Collaborator, Author) commented:

I think a better test (based on our previous conversation) would be to set up a simple waveguide Fabry-Perot cavity, simulated in 2D but defined by a 1D set of design variables. Then, identify the variable at an edge of the cavity and sweep it from 0 to 1 while measuring the transmission (or reflection). When subpixel smoothing is off, the response should look like a step function centered at 0.5; when subpixel smoothing is on, it should be a continuous function. (Make sure your β is really high.)

stevengj (Collaborator) commented Oct 20, 2021

@oskooi, remember that our new material-grid subpixel-averaging algorithm relies on the material grid being a smooth level set function (on the scale of the pixels) that is subsequently projected (e.g. what you get in topology optimization from filtered density functions). It does not work on data that has already been binarized. (In particular, we assume that we can do a first-order Taylor expansion of the material-grid function—i.e., that it is approximately linear—on the scale of a Meep pixel.)

Because of this, it's not surprising that you are getting similar results with and without smoothing—the latter isn't actually doing much.

If you have a binarized image and you want to convert it to a smoother level set, there are multiple ways to go about it. One possibility might be to compute the signed distance function from the image (viewed as a discontinuous level-set function). In fact, there seem to be a lot of image-processing tools to generate a "signed distance field" from an image, so there might be some Python package we can point to.
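
For example, a minimal sketch using SciPy's Euclidean distance transform (one possible tool among many; the positive-inside sign convention is an arbitrary choice):

import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(binary):
    """Signed distance to the material boundary: positive inside the
    solid (where binary == 1), negative outside."""
    binary = np.asarray(binary, dtype=bool)
    inside = distance_transform_edt(binary)    # distance to nearest background pixel
    outside = distance_transform_edt(~binary)  # distance to nearest solid pixel
    return inside - outside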

smartalecH (Collaborator, Author) commented Nov 12, 2021

For some reason, the Python 3.9 w/MPI build is failing tests, but the Python 3.6 w/MPI build passes.

@ianwilliamson are you aware of any obvious reasons why this may be the case? Any major changes to round-off-sensitive operations that may be degrading the accuracy of the gradients?

While further debugging, I found that the problem lies only with the multi-frequency DFT adjoints (all the others are fine). Specifically, the discrepancy seems to occur while calculating the finite-difference gradient (i.e., not the adjoint gradient).

Here's an example output from my local (python 3.7 w/mpi @ 4 processes):

Directional derivative -- adjoint solver: [-4.94692863e-06 -3.02579578e-04 -2.97740672e-04], FD: [-3.56606049e-06 -3.02829576e-04 -2.99124494e-04]

And here's the output from the CI log for python 3.9 w/mpi:

Directional derivative -- adjoint solver: [-4.94780506e-06 -3.02579778e-04 -2.97741219e-04], FD: [ 2.49633706e-06 -2.91768610e-04 -3.12555079e-04]

Clearly something is off with the finite-difference calculation here...

oskooi (Collaborator) commented Nov 12, 2021

Note that the failure occurs only when compiling using single precision. Maybe you just need to increase the spacing you used to compute the finite difference gradients?

smartalecH (Collaborator, Author) commented:

> Maybe you just need to increase the spacing you used to compute the finite-difference gradients?

The finite differences I'm referring to are defined by the test (i.e. I shouldn't have to modify the test at all). The adjoint gradients don't appear to have any issues.

It's possible that even before this PR, the finite-difference gradients were pretty far off, just not enough to kill the test. This PR does introduce a little more noise into the adjoint sensitivities, which might "knock" them out of the predefined tolerance (even though the bulk of the error was in the finite-difference calculation). I'll look into that possibility.

ianwilliamson (Contributor) commented:

> Clearly something is off with the finite-difference calculation here...

It seems like the largest difference between the finite-difference calculations is in the first element of the list. Even under Python 3.7, the difference between the adjoint calculation and the finite-difference calculation is greater than 30% in that element.

smartalecH (Collaborator, Author) commented:

I think the real issue here is that we are treating our finite-difference calculation as the ground truth. The adjoint gradients are consistent across single precision, double precision, Python 3.6, Python 3.9, etc., but the finite differences swing quite a bit.

This is somewhat expected, as our finite differences are one-sided (first-order accurate) and the step size is arbitrary. What we should do is some sort of Richardson extrapolation. But the tests are passing now.
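
For reference, a sketch of the kind of Richardson extrapolation meant here: evaluating the one-sided difference at steps h and h/2 and combining them cancels the O(h) error term, leaving O(h²). (f, x0, and d are placeholders for the test's objective, design parameters, and perturbation direction.)

def richardson_directional_fd(f, x0, d, h):
    """Directional derivative of f at x0 along d via Richardson-extrapolated
    one-sided differences: D(h) = f'(x0)·d + c*h + O(h^2), so
    2*D(h/2) - D(h) = f'(x0)·d + O(h^2)."""
    f0 = f(x0)
    d1 = (f(x0 + h * d) - f0) / h
    d2 = (f(x0 + 0.5 * h * d) - f0) / (0.5 * h)
    return 2.0 * d2 - d1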

oskooi (Collaborator) commented Nov 16, 2021

Follow-up to an earlier thread investigating the effect of subpixel smoothing on continuously shifting boundaries of a ring resonator represented as a MaterialGrid. Rather than use a binarized image, this time a Gaussian filter (not a signed-distance function, as suggested by @stevengj) was applied to smear the boundaries; even so, the convergence results are not much different than before.

The inset schematic below labeled "filtered" shows the actual smoothed contour of the ring resonator adjacent to an image of the binarized representation. The "exact" result (shown in the plot as the black line) is computed at a resolution of 100 using no smoothing. What is unexpected in these results is that subpixel smoothing still does not produce a frequency response which varies continuously with continuous changes in the ring geometry.

[figure: ring_filter_vs_binarized, with inset schematics of the filtered and binarized ring]

edit: the figure has been updated to correct the axis labels.

smartalecH (Collaborator, Author) commented:

What filter radius are you using? Can you provide more details?

oskooi (Collaborator) commented Nov 16, 2021

> What filter radius are you using? Can you provide more details?

The sigma value of ndimage.gaussian_filter used to generate these results is 1.0 (i.e., comparable to the voxel dimensions). The Meep grid resolution is 20 and the MaterialGrid resolution is 400. The actual code is here.

smartalecH (Collaborator, Author) commented Nov 16, 2021

There are a few nuances I should point out.

First, I don't think we are comparing the right things. Even when do_averaging is off, we still have a level set, because the projection happens after the linear interpolation. What we really want to compare is the new method against the "classical" technique, where projection happens before interpolation. This means beta=0 in the MaterialGrid object, and we use the tanh_projection() function as we do in the tutorials.

Second, I think your filter radius is too small. Remember, we have a first-order Taylor expansion representing our level set; a radius of 1 pixel (which is a bit hard to quantify with a Gaussian filter...) seems too small for that approximation to hold. I recommend you use the in-house conic filter (where the radius is very straightforward) and try a few different lengths (e.g. 2 pixels, 4, 8, etc.); see the sketch below. There should be a point where it reaches second-order convergence (when sweeping the Meep resolution). As we discussed before, the "pixel" unit we care about is w.r.t. the MaterialGrid, not the Meep grid.
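
A sketch of what I have in mind using the in-house helpers (treat the exact signatures as assumptions; the design-region dimensions and resolution below are made up for illustration):

import numpy as np
import meep.adjoint as mpa

design_res = 400           # MaterialGrid resolution (illustrative)
Lx, Ly = 2.0, 2.0          # design-region size in Meep units (illustrative)
radius = 4.0 / design_res  # conic-filter radius of 4 MaterialGrid pixels

nx, ny = int(Lx * design_res), int(Ly * design_res)
rho = np.random.rand(nx, ny)  # stand-in design parameters

rho_hat = mpa.conic_filter(rho, radius, Lx, Ly, design_res)  # smooth...
rho_bar = mpa.tanh_projection(rho_hat, beta=1e6, eta=0.5)    # ...then project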

Third, we really should be fitting the underlying design parameters (for both our new method and the old method) to the final geometry we want at the end of the day (for a given filter radius and β). Depending on the loss function, this can be cast as a convex problem (akin to logistic regression). By doing the fitting, we ensure we really are comparing the right things.

Also, is the label on your x-axis right? Isn't your Meep unit a micron? How are you specifying resolutions between 1 and 2 pixels per Meep unit? I thought your Meep resolution was fixed at 20 and you were instead sweeping the radius?

Finally, let's make sure we are clear about what we are trying to test with each example. Are we looking for second-order convergence? Are we making sure it behaves like a level-set?

smartalecH (Collaborator, Author) commented:

The gradients seem to be accurate now. Note that the current implementation is a brute-force, inefficient approach that uses finite differences at each material grid point to compute Aᵤ. This introduces a bit more "noise" into the gradient calculation than before, but it's still much better than what was reported before #1801. As far as speed is concerned, the calculation is still plenty fast. Future PRs could implement the true, analytic vJp if need be.
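
Concretely, the brute-force approach amounts to a one-sided difference per design variable, i.e. something like

∂A/∂uᵢ ≈ [A(u + h·eᵢ) − A(u)] / h,

where eᵢ is the i-th unit vector and h is the step size discussed below.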

I ran some tests to check various aspects of this PR. All of the following tests use an FP-like structure that looks like this:

[figure: FP-like test structure]

All of the tests can be found in this repo.

First, we can look at the gradient accuracy as β increases when do_averaging is true or false:

[figure: beta_sweep_new_30, relative gradient error vs. β]

The accuracy starts to degrade with β, but it's still within a workable range. Plus, it could be that the finite-difference "ground truth" measurements are themselves inaccurate. But compared to what's shown above, these results seem to indicate a successful implementation.

Next, I swept the finite-difference step size used to estimate Aᵤ. As expected, the error starts to saturate after the step size reaches a certain point:

[figure: step_sweep, gradient error vs. finite-difference step size]

As mentioned here, I dropped the default step size to 1e-3.

Next, I swept the filter radius (using the same underlying design parameters) at β=1e6:

[figure: radius_sweep, gradient error vs. filter radius]

Note that the gradient accuracy doesn't depend on the filter radius. Rather, only the subpixel smoothing feature depends on the filter radius.

Finally, to (once again) illustrate the level-set behavior of our hybrid method, I measured the L2 norm of the gradient when using the classical density-based scheme and our new hybrid scheme:

[figure: beta_mag_sweep, L2 norm of the gradient vs. β for both schemes]

As expected, the classical gradients go to zero as β increases, whereas the new hybrid method's gradients converge to a finite value. This effectively enables discrete shape optimization.

The tests are passing, and this PR should be ready for merging after review.

smartalecH marked this pull request as ready for review on November 16, 2021.
ianwilliamson (Contributor) left a review:

Not many comments from me.
