
Non-materialized DFT field storage for streamlined adjoint calculations #1832

Closed · smartalecH opened this issue Nov 19, 2021 · 5 comments

@smartalecH (Collaborator) commented Nov 19, 2021

As discussed in #1797, we can alleviate several memory constraints while computing gradients during adjoint optimization by not materializing the DFT fields on every process. Instead, we can distribute the computation of the gradient itself. Here's what needs to happen/change for this to work (a rough sketch of the recombination step follows the list):

  1. The DFT constructor needs to accept a flag that allows the DFT chunks to persist after a simulation object is garbage collected (i.e. after the forward run finishes). We need to be aware of which fields in the DFT object can no longer be accessed (because they will point to deallocated objects). There may also be some Python garbage collection of the dft_fields object itself (thanks to SWIG), so we'll need to double-check that.
  2. We want to store DFT chunks that have different field components but the same grid_volume on the same process (this might already be the default). This will eliminate a lot of communication during the recombination step (and simplify the code).
  3. The OptimizationProblem class will no longer need to materialize the DFT fields. Rather, it will just store a pointer to the DFT object itself.
  4. We need to modify the get_gradient routine to accept dft_fields pointers, rather than a pointer to the fields themselves.
  5. We need to modify the main gradient code to "loop in chunks". Each chunk will update a local copy of the gradients.
  6. Once all the local copies are done, we perform a simple sum_to_all.
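For concreteness, here is a rough sketch of what steps 3-6 could look like on the C++ side. This is a sketch only: accumulate_chunk_gradient is a placeholder for the per-chunk work, and it assumes that dft_fields exposes its chunk list via chunks/next_in_dft and that sum_to_all has an array overload, as in the current headers.

```cpp
#include <meep.hpp>
#include <vector>
using namespace meep;

// Placeholder for the per-chunk work that get_gradient currently does globally.
void accumulate_chunk_gradient(dft_chunk *fwd, dft_chunk *adj, double *local_grad);

// Steps 3-6: each process accumulates the gradient contribution of the DFT
// chunks it owns, then the per-process partial results are summed.
void get_gradient_distributed(dft_fields &fwd, dft_fields &adj,
                              double *grad, int num_params) {
  std::vector<double> local_grad(num_params, 0.0);

  // step 5: "loop in chunks" -- only the locally-owned chunks are visited,
  // so the full DFT arrays are never materialized on any single process
  for (dft_chunk *f = fwd.chunks, *a = adj.chunks; f && a;
       f = f->next_in_dft, a = a->next_in_dft)
    accumulate_chunk_gradient(f, a, local_grad.data());

  // step 6: combine the partial gradients across all processes
  sum_to_all(local_grad.data(), grad, num_params);
}
```

Walking the forward and adjoint chunk lists in lockstep is what step 2 buys us: if chunks covering the same grid_volume always live on the same process in the same order, no DFT data ever has to leave the process that owns it.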

Some gotchas

  • In some cases there will need to be boundary communication along the edges of chunks. The logic for this will be a little hairy.
  • Symmetries are going to pose a small problem. It might be best, in this case, to compute only the simplest part of the gradient under the hood and then return to the user a full copy with the proper symmetries applied (just like we do with everything else). For example, if we have a simulation with mirror symmetry, only the gradients for one half are computed during the loop_in_chunks routine, but we should probably support the use case of returning the full, symmetric array to the user (as the code currently operates).

(cc @oskooi)

@stevengj (Collaborator)

With regard to step 2, that is already true for the centered grid, I think. If the DFT chunks are on the Yee grid, then different field components aren't on the same grid points, so it's not clear what you mean by the "same" grid_volume. I guess you can't process the components separately because of anisotropic materials?

I think it might be possible to avoid boundary communication, but we need to sit down and go through exactly what is needed here.

With regard to symmetries, I think it would be cleanest to operate similarly to loop_in_chunks, looping over each chunk potentially multiple times depending on the symmetry operations that map it into the design region.
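A rough sketch of that idea (accumulate_transformed is a placeholder; the symmetry multiplicity()/transform() calls and the chunk's is/ie index range are assumed to match the current headers):

```cpp
#include <meep.hpp>
using namespace meep;

// Placeholder for adding one symmetry-transformed copy of a chunk's contribution.
void accumulate_transformed(dft_chunk *f, int sn, const ivec &is_t, const ivec &ie_t,
                            double *local_grad);

// Visit each locally-owned chunk once per symmetry operation that maps it into
// the design region, instead of communicating mirrored field data.
void accumulate_with_symmetry(const symmetry &S, dft_chunk *chunks, double *local_grad) {
  for (dft_chunk *f = chunks; f; f = f->next_in_dft) {
    for (int sn = 0; sn < S.multiplicity(); ++sn) {
      ivec is_t = S.transform(f->is, sn); // transformed index range of this chunk
      ivec ie_t = S.transform(f->ie, sn);
      accumulate_transformed(f, sn, is_t, ie_t, local_grad);
    }
  }
}
```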

@stevengj (Collaborator) commented Dec 1, 2021

In particular, to avoid boundary communication for anisotropic materials, probably the solution is to have a flag that allows the DFT chunks to be "padded" with redundant points around the edges (including not-owned points) as needed for the off-diagonal tensor components.
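A sketch of what the padding could amount to (hypothetical: pad_chunk_range is not an existing Meep function, and the use of one_ivec and the factor of 2 for Meep's doubled integer coordinates are assumptions):

```cpp
#include <meep.hpp>
using namespace meep;

// Grow a chunk's index range by one pixel on every side so that the not-owned
// points needed for the off-diagonal (anisotropic) tensor terms are stored
// redundantly on this process, removing the need for boundary communication.
void pad_chunk_range(ivec &is, ivec &ie, ndim dim) {
  is = is - one_ivec(dim) * 2; // Meep's integer coordinates advance by 2 per pixel
  ie = ie + one_ivec(dim) * 2;
}
```

The constructor flag would then simply select whether the chunk stores the padded range or only the points it owns.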

@smartalecH (Collaborator, Author)

@oskooi, I would split this into 2 PRs:

  1. Add the required functionality to the DFT objects (a flag for persistence to prevent garbage collection, and the functionality to store the "padded" pixels, which also requires a flag).
  2. Implement the new recombination step (steps 3-6).

As discussed, properly padding the DFT region (complicating the first PR) will eliminate the need to do any boundary communication (significantly simplifying the second PR).

@smartalecH (Collaborator, Author)

  1. The DFT constructor needs to accept a flag that allows the DFT chunks to persist after a simulation object is garbage collected (i.e. after the forward run finishes). We need to be aware of which fields in the DFT object can no longer be accessed (because they will point to deallocated objects). There may also be some Python garbage collection of the dft_fields object itself (thanks to SWIG), so we'll need to double-check that.

This step is a bit tricky. Naively, it would be great if we could just "not delete" these chunks and hang on to the pointers for post-processing during the recombination step. However, each dft_chunk depends on its underlying fields_chunk, which we have to delete...

An alternative solution might be to define a lightweight data structure that acts like the dft_chunk, but only stores the dft fields themselves. The downside to this approach is that we have to copy from the original dft_chunk to this new data structure.
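A minimal sketch of such a structure (all names hypothetical; the dft_chunk members copied here, i.e. c, is/ie, N, omega, and dft, are assumptions based on a reading of dft.cpp):

```cpp
#include <meep.hpp>
#include <complex>
#include <vector>
using namespace meep;

// Lightweight container holding only what the recombination step needs, so the
// original dft_chunk (and the fields_chunk it points to) can be deleted.
struct persisted_dft_chunk {
  component c;   // field component stored in this chunk
  ivec is, ie;   // index range covered by the chunk
  size_t N;      // number of grid points
  size_t Nomega; // number of DFT frequencies
  std::vector<std::complex<realnum> > data; // copied DFT values, size N * Nomega
};

// Copy out of a live dft_chunk before the simulation is torn down.
persisted_dft_chunk persist(const dft_chunk *src) {
  persisted_dft_chunk out;
  out.c = src->c;
  out.is = src->is;
  out.ie = src->ie;
  out.N = src->N;
  out.Nomega = src->omega.size();
  out.data.assign(src->dft, src->dft + out.N * out.Nomega);
  return out;
}
```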

@stevengj do you have any other ideas?

@stevengj (Collaborator) commented Dec 10, 2021

However, each dft_chunk depends on its underlying fields_chunk, which we have to delete...

Can't we just set the fields_chunk *fc to NULL when the dft_chunk is disconnected?

And we can store the chunk_idx (as a new field in dft_chunk) to create a persistent reference to the fields_chunk (i.e. one which can be used with a new fields object that is re-created with the same chunk layout.)
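A sketch of that disconnect step (hypothetical: disconnect_from_fields and chunk_idx would both be new additions to dft_chunk; fc is the existing fields_chunk pointer):

```cpp
// Hypothetical new method on dft_chunk (would need a matching declaration and
// a new chunk_idx member in meep.hpp); called when the fields object is torn
// down, instead of deleting the chunk.
void dft_chunk::disconnect_from_fields(int idx) {
  chunk_idx = idx; // persistent reference into the chunk layout, usable with a
                   // re-created fields object that has the same layout
  fc = NULL;       // the fields_chunk is about to be deleted; drop the pointer
}
```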
