Refactoring of interpolation::method::SphericalVector and implementation of adjoint methods. #168
Just thought you might be interested that I tried the vector interpolation scheme with
Nice Addition @odlomax ! Could we try to split the PR in two, one for the arrayForEach and one for the adjoint? We can merge the arrayForEach one first, and rebase this one after.
Absolutely. Watch this space...
Other PR #171 is merged. Please rebase :)
Removed unnecessary headers.
🥳 @odlomax
Great success! 🤩
No more pushes from me for now. I'll tidy up the macros when it's reviewed. 🙂
#warning Disabling OpenMP to prevent internal compiler error for intel-classic version < 2021.6 (intel-oneapi/2022.2)
#undef atlas_omp_parallel_for
#define atlas_omp_parallel_for for
#endif
How come you could drop this?
I think the problem arose when supplying a template parameter functor to loop body. I've since separated out the use-cases into their own loops, as @pmaciel suggested. No longer triggers the error. 🙂
src/atlas/interpolation/method/sphericalvector/ComplexMatrixMultiply.h
I now played a bit with nvhpc/22.11 myself in the routine:
atlas_omp_parallel_for( ... ) {
...
if constexpr (Rank == 3) {
for( idx_t jlev=0; jlev<targetSlice.shape(0); ++jlev) {
const auto targetVector =
complexWeight * Complex(sourceSlice(jlev,0), sourceSlice(jlev,1));
targetSlice(jlev,0) += targetVector.real();
targetSlice(jlev,1) += targetVector.imag();
targetSlice(jlev,2) += realWeight * sourceSlice(jlev,2);
}
}
if constexpr (Rank == 2) {
...
}
What doesn't work, however, is creating a lambda that captures realWeight and complexWeight to do the inner part, as in:
atlas_omp_parallel_for( ... ) {
...
auto matmul = [realWeight,complexWeight](auto&& sourceElem, auto&& targetElem) {
const auto targetVector =
complexWeight * Complex(sourceElem(0), sourceElem(1));
targetElem(0) += targetVector.real();
targetElem(1) += targetVector.imag();
targetElem(2) += realWeight * sourceElem(2);
};
if constexpr (Rank == 3) {
for( idx_t jlev=0; jlev<targetSlice.shape(0); ++jlev) {
auto sourceElem = sourceSlice.slice(jlev,array::Range::all());
auto targetElem = targetSlice.slice(jlev,array::Range::all());
matmul(sourceElem,targetElem);
}
}
if constexpr (Rank == 2) {
matmul(sourceSlice,targetSlice);
}
If instead of "capturing" realWeight and complexWeight they are passed to the lambda as arguments, it does work:
atlas_omp_parallel_for( ... ) {
...
auto matmul = [](auto&& sourceElem, auto&& targetElem, auto realWeight, auto complexWeight) {
const auto targetVector =
complexWeight * Complex(sourceElem(0), sourceElem(1));
targetElem(0) += targetVector.real();
targetElem(1) += targetVector.imag();
targetElem(2) += realWeight * sourceElem(2);
};
if constexpr (Rank == 3) {
for( idx_t jlev=0; jlev<targetSlice.shape(0); ++jlev) {
auto sourceElem = sourceSlice.slice(jlev,array::Range::all());
auto targetElem = targetSlice.slice(jlev,array::Range::all());
matmul(sourceElem, targetElem, realWeight, complexWeight);
}
}
if constexpr (Rank == 2) {
matmul(sourceSlice, targetSlice, realWeight, complexWeight);
}
So it appears the issue is with the lambda capture in the OpenMP region.
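The capture-free variant can be tried in isolation with a minimal sketch like the one below. It uses plain `std::array` and `std::vector` in place of the Atlas array views, and the names `Vec3` and `accumulate` are made up for illustration; only the lambda body follows the snippets above. Whether it reproduces or avoids the nvhpc problem outside Atlas is not something this sketch establishes.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;
using Vec3    = std::array<double, 3>;  // hypothetical stand-in for one (u, v, w) element

// Hypothetical helper: applies one complex/real weight pair to every level.
// The weights are passed to the lambda as arguments instead of being captured.
inline void accumulate(const std::vector<Vec3>& source, std::vector<Vec3>& target,
                       double realWeight, Complex complexWeight) {
    const long nLevels = static_cast<long>(source.size());
#if defined(_OPENMP)
#pragma omp parallel for
#endif
    for (long jlev = 0; jlev < nLevels; ++jlev) {
        // Empty capture list: everything the body needs arrives as a parameter.
        auto matmul = [](const Vec3& sourceElem, Vec3& targetElem, double rw, Complex cw) {
            const Complex targetVector = cw * Complex(sourceElem[0], sourceElem[1]);
            targetElem[0] += targetVector.real();
            targetElem[1] += targetVector.imag();
            targetElem[2] += rw * sourceElem[2];
        };
        const auto i = static_cast<std::size_t>(jlev);
        matmul(source[i], target[i], realWeight, complexWeight);
    }
}
```

Each iteration writes only to its own `target` element, so the loop body has no shared mutable state regardless of whether OpenMP is enabled.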
I need to give up on templates. They're nothing but trouble! 😅
Sometimes it's really not worth the effort (probably many many many hours) to cater for added generality or save some lines of code. With
I'm glad concepts in C++20 have killed SFINAE for good. I think my problem is that lambda expressions have become my golden hammer, and now every problem looks like a nail!
Thanks @odlomax, really it's not your fault but the nvidia compiler's fault. Thanks for working around it! ❤️
Honestly, it's good life experience! We're going to have nvhpc on our new system, so it's useful to know its peculiarities!
Looks good to me now. Do you mind if I "Squash-and-Merge"?
Wonderful! Squash away!
* release/0.37.0: (23 commits)
  Update Changelog
  Version 0.37.0
  Projection base implementation derivatives performance/encapsulation … (#185)
  atlas_io is an adaptor library when eckit_codec is available (#181)
  Fix build for configuration setting ATLAS_BITS_LOCAL=64 (#184)
  Revert "Avoid linker warnings on macOS about 'ld: warning: could not create compact unwind for ...'"
  Cosmetic: readability braces
  Initialize std::array values to zero because valgrind complains, even though c++ standard mandates it should be default-initialized to zero
  Fix bug in TraceT caused by typo where the title was wrong
  Avoid linker warnings on macOS about 'ld: warning: could not create compact unwind for ...'
  Use new LocalConfiguration baseclass functions in util::Config and util::Metadata instead of eckit::Value backdoor
  Removed leftover code missed in PR #175
  Update `SphericalVector` to work with StructuredColumns as source functionspace. (#175)
  Bugfix for regional grids with ny > nx
  Refactoring of interpolation::method::SphericalVector and implementation of adjoint methods. (#168)
  Added test with empty integer sequence.
  Added arrayForEachDim method.
  Add docs build workflow
  Github Actions: Fix macOS MPI slots
  Fix for elements that might have unassigned partition via parallel Delaunay meshgenerator
  ...
Following on from PR #163, this PR primarily adds and tests the adjoint methods for the interpolation::method::SphericalVector class.

Several refactors were also applied to reduce complexity and code duplication:
- SphericalVector.cc and given its own class ComplexMatrixMultiply
- eckit::linalg::SparseMatrix with a Value template. This also confines all the nasty macros to one place.
- Types.h was added, containing all the main type definitions required for the SphericalVector class.
- ArrayForEachDim was added to array::helpers::ArrayForEach. This allows the iteration dimensions to be set using an std::integer_sequence.

Further testing was added
- ArrayForEachDim in ArrayForEach tests.
- StructuredColumns FixupHaloForVectors methods interferes with SphericalVector. You must make sure a halo exchange isn't called on the source field for it to work.

Notes:
Apologies for the strange commit history, I was trying to debug the MacOS build without a Mac.
Closes #166
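For readers who want to see what testing an adjoint involves, here is a minimal sketch of a dot-product test for a sparse operator with complex weights. It is an illustration under stated assumptions, not the Atlas implementation: `Triplet`, `applyForward`, `applyAdjoint` and `dot` are hypothetical names, and horizontal vectors (u, v) are represented as `std::complex<double>` values u + iv. Under the real inner product on (u, v), multiplying by a complex weight w is a 2x2 rotation-scaling matrix whose transpose is multiplication by conj(w), which is what the adjoint below uses.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical (row, column, weight) entry; not the Atlas matrix storage format.
struct Triplet {
    std::size_t row;
    std::size_t col;
    Complex weight;
};

// Forward operator: target(row) += weight * source(col).
inline std::vector<Complex> applyForward(const std::vector<Triplet>& weights,
                                         const std::vector<Complex>& source,
                                         std::size_t nTarget) {
    std::vector<Complex> target(nTarget, Complex(0.0, 0.0));
    for (const auto& t : weights) {
        target[t.row] += t.weight * source[t.col];
    }
    return target;
}

// Adjoint operator: source(col) += conj(weight) * target(row).
inline std::vector<Complex> applyAdjoint(const std::vector<Triplet>& weights,
                                         const std::vector<Complex>& target,
                                         std::size_t nSource) {
    std::vector<Complex> source(nSource, Complex(0.0, 0.0));
    for (const auto& t : weights) {
        source[t.col] += std::conj(t.weight) * target[t.row];
    }
    return source;
}

// Real inner product over the (u, v) components of both fields.
inline double dot(const std::vector<Complex>& a, const std::vector<Complex>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i].real() * b[i].real() + a[i].imag() * b[i].imag();
    }
    return sum;
}
```

The test itself checks that `dot(applyForward(W, x, m), y)` equals `dot(x, applyAdjoint(W, y, n))` to rounding error for arbitrary fields `x` (size n) and `y` (size m).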
Figures
O48 Structured Columns to Cubed Sphere interpolation error when using SphericalVector to treat vector fields (same setup as PR #163).
O48 Structured Columns to Cubed Sphere interpolation error when using FixupHaloForVectors to treat vector fields.
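As background to the ArrayForEachDim item in the description, the sketch below shows one way an `std::integer_sequence` can select which dimensions of a multi-dimensional index are iterated. It is a hypothetical helper written for illustration (`forEachDim` and `forEachDimImpl` are made-up names operating on a plain `std::array` shape) and should not be read as the actual Atlas interface.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Base case: no dimensions left in the sequence, so invoke the functor once.
template <std::size_t Rank, typename Functor>
void forEachDimImpl(const std::array<int, Rank>& /*shape*/, std::array<int, Rank>& idx,
                    std::integer_sequence<int>, Functor&& f) {
    f(idx);
}

// Recursive case: loop over dimension Dim, then recurse on the remaining ones.
template <std::size_t Rank, typename Functor, int Dim, int... Dims>
void forEachDimImpl(const std::array<int, Rank>& shape, std::array<int, Rank>& idx,
                    std::integer_sequence<int, Dim, Dims...>, Functor&& f) {
    static_assert(Dim >= 0 && Dim < static_cast<int>(Rank), "dimension out of range");
    for (idx[Dim] = 0; idx[Dim] < shape[Dim]; ++idx[Dim]) {
        forEachDimImpl(shape, idx, std::integer_sequence<int, Dims...>{}, f);
    }
    idx[Dim] = 0;  // reset so outer loops see a clean index
}

// Hypothetical entry point: visits every index combination over the listed
// dimensions; dimensions not listed stay fixed at zero.
template <int... Dims, std::size_t Rank, typename Functor>
void forEachDim(std::integer_sequence<int, Dims...> dims,
                const std::array<int, Rank>& shape, Functor&& f) {
    std::array<int, Rank> idx{};
    forEachDimImpl(shape, idx, dims, f);
}
```

With a shape of `{2, 3, 4}` and the sequence `std::integer_sequence<int, 0, 2>`, the functor is called for each of the 2 x 4 combinations of the first and last index. With an empty sequence it is called exactly once.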