Intel libraries #1750
Comments
For a significant speedup in the timestepping rate when using the adjoint solver for inverse design and topology optimization, you can try the three new features added in Meep 1.20: (1) single-precision floating point for the fields arrays, (2) decimation of the DFT field updates, and (3) memory locality for the step-curl updates via loop tiling. Since the Meep Conda package for the 1.20 release is built using double-precision floating point, to use (1) you will need to compile Meep from source using the

Separately, it would still be useful to investigate the performance impact of the various MPI libraries for single and multi-node MPI clusters.
I spent significant time doing this and didn't see any noticeable gain in performance.
Maybe try checkpointing your optimizations so you can run where you left off using multiple job submissions.
This isn't quite ready for the adjoint simulation.
Hi,
How far did you go with the number of nodes and processes? From the data in aws/aws-parallelcluster#1436, the speedup appears to grow with the number of processes involved. Keep in mind that I will try running on up to ~500 nodes for my needs. In your opinion, would some trial and error with the C compiler flags be worthwhile?
Oh, thank you very much! I didn't know Meep also allowed for checkpointing easily! That's great! Can I also do that in the context of adjoint-method optimizations without issues?
May I ask why? In any case, is some kind of automatic decimation already active by default? (I don't remember; I may have just read it somewhere here.) Should I disable it in that case for adjoint-method optimizations?
#1628 recently added support for multithreading via OpenMP for the fields update. You can therefore (in addition to
The optimal decimation factor which takes into account the band-limited nature of your (pulsed) source and monitor bandwidth is chosen for you automatically by default (#1732) except if you are using the adjoint solver (#1751). In that case and until #1751 is resolved, you will need to manually set the
The loop tiling feature is part of the Meep 1.20 release. It is disabled by default. To use it, try setting
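To illustrate why decimating the DFT field updates can be safe for band-limited fields, here is a minimal, standard-library-only sketch. It is not Meep's implementation: `max_decimation`, `pulse`, and `dft` are hypothetical helpers, and the Nyquist-based rule shown is only an assumed stand-in for whatever criterion Meep actually applies. The idea is that a pulse whose spectrum lies below some `f_max` can be accumulated every `k`-th timestep with almost no change in the result.

```python
import cmath
import math


def max_decimation(f_max, dt):
    """Largest integer k for which sampling every k*dt still satisfies
    the Nyquist criterion for f_max (illustrative rule, an assumption)."""
    return max(1, int(1.0 / (2.0 * f_max * dt)))


def pulse(t, f0=1.0, sigma=1.0, t0=5.0):
    """Gaussian-envelope pulse, effectively band-limited around f0."""
    envelope = math.exp(-((t - t0) ** 2) / (2.0 * sigma**2))
    return envelope * math.cos(2.0 * math.pi * f0 * t)


def dft(freq, dt, n_steps, k=1):
    """Accumulate sum_n f(t_n) * exp(2*pi*i*freq*t_n) * (k*dt),
    visiting only every k-th timestep."""
    acc = 0j
    for n in range(0, n_steps, k):
        t = n * dt
        acc += pulse(t) * cmath.exp(2j * math.pi * freq * t) * (k * dt)
    return acc


dt, n_steps = 0.01, 1000
# f_max is an assumed upper edge covering both source and monitor bands.
k = max_decimation(f_max=2.0, dt=dt)
full = dft(1.0, dt, n_steps)          # update on every timestep
decimated = dft(1.0, dt, n_steps, k)  # update on every k-th timestep
rel_err = abs(full - decimated) / abs(full)
print(f"decimation factor {k}, relative error {rel_err:.2e}")
```

In this toy setting the decimated accumulation does a small fraction of the work of the full one, which is the kind of saving the feature is aiming for. Choosing `k` too large for the actual bandwidth would introduce aliasing errors.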
Hi, I am looking into this. Is checkpointing somehow built in, or should I try setting it up myself? If so, do you have suggestions on how to proceed in the case of an adjoint optimization? Maybe I should post this at https://github.com/stevengj/nlopt/issues? Thanks
No, that's something you'll have to set up yourself.
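As a rough idea of what setting this up yourself could look like, here is a minimal sketch using only the Python standard library. Everything in it is hypothetical: `objective_and_gradient` stands in for one forward plus adjoint run, plain gradient descent stands in for the nlopt optimizer, and the file name is arbitrary. The design vector and iteration count are written to disk after every step, so the next job submission can pick up where the previous one stopped.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a job killed mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path, initial_x):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"iteration": 0, "x": list(initial_x)}


def objective_and_gradient(x):
    """Hypothetical stand-in for one forward + adjoint simulation."""
    value = sum((xi - 0.5) ** 2 for xi in x)
    grad = [2.0 * (xi - 0.5) for xi in x]
    return value, grad


def run_job(path, initial_x, total_iterations, iterations_per_job, step=0.1):
    """Advance by at most iterations_per_job steps, checkpointing each one."""
    state = load_checkpoint(path, initial_x)
    stop = min(total_iterations, state["iteration"] + iterations_per_job)
    while state["iteration"] < stop:
        _, grad = objective_and_gradient(state["x"])
        state["x"] = [xi - step * gi for xi, gi in zip(state["x"], grad)]
        state["iteration"] += 1
        save_checkpoint(path, state)
    return state


# Three "job submissions" sharing one checkpoint file.
with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "design_checkpoint.json")
    first = run_job(ckpt, [0.0, 1.0], total_iterations=20, iterations_per_job=8)
    second = run_job(ckpt, [0.0, 1.0], total_iterations=20, iterations_per_job=8)
    third = run_job(ckpt, [0.0, 1.0], total_iterations=20, iterations_per_job=8)
    print(first["iteration"], second["iteration"], third["iteration"])
```

One caveat with this approach: if the optimizer keeps internal state between iterations, restarting from the saved design variables alone may not reproduce exactly the same trajectory as an uninterrupted run.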
I see; I'll give it a try. Do you foresee any possible issues in implementing it in the context of the adjoint optimization with nlopt?
Issue seems resolved by this comment.
Hello there,
So far I have run some timing tests on my Meep topology optimization code, using just 4 nodes with the MPICH implementation of MPI included in the Conda package. My predicted time to complete enough iterations is well beyond the maximum walltime allowed per job on my machine. I have read elsewhere that the Intel MPI implementation could offer a good speedup over many nodes. Has anyone ever tried to compile pymeep from source with it? If so, what is the newest version that you have gotten Meep working with?