
Refactoring of interpolation::method::SphericalVector and implementation of adjoint methods. #168

Merged

Conversation

@odlomax (Contributor) commented Jan 8, 2024

Following on from PR #163, this PR primarily adds and tests the adjoint methods for the interpolation::method::SphericalVector class.

Several refactors were also applied to reduce complexity and code duplication:

  • The matrix multiplication code was removed from SphericalVector.cc and given its own class, ComplexMatrixMultiply.
  • A SparseMatrix class was added that wraps the Eigen sparse matrix in something that looks more like eckit::linalg::SparseMatrix with a Value template. This also confines all the nasty macros to one place.
  • Types.h was added, containing all the main type definitions required for the SphericalVector class.
  • ArrayForEachDim was added to array::helpers::ArrayForEach. This allows the iteration dimensions to be set using a std::integer_sequence (see the sketch after this list).
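
For readers unfamiliar with the pattern, below is a minimal standalone sketch of the std::integer_sequence idea (generic C++17; ForEachDims and forEachDim are hypothetical stand-ins, not the atlas API): the sequence's values are unpacked into a template parameter pack that fixes the iteration dimensions at compile time.

#include <cstddef>
#include <iostream>
#include <utility>

// Hypothetical stand-in for a dims-as-template-arguments helper, in the
// spirit of array::helpers::ArrayForEach<Dims...>.
template <std::size_t... Dims>
struct ForEachDims {
    template <typename Function>
    static void apply(Function&& function) {
        // Demo only: invoke the functor once per selected dimension.
        (function(Dims), ...);
    }
};

// The integer_sequence overload: unpack the sequence into the pack
// expected by the dims-as-template-arguments helper.
template <typename Function, std::size_t... Dims>
void forEachDim(std::integer_sequence<std::size_t, Dims...>,
                Function&& function) {
    ForEachDims<Dims...>::apply(std::forward<Function>(function));
}

int main() {
    // Select dimensions 0 and 1 with a sequence rather than explicit
    // template arguments at the call site.
    forEachDim(std::index_sequence<0, 1>{}, [](std::size_t dim) {
        std::cout << "iterating over dimension " << dim << "\n";
    });
}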

Further testing was added:

  • Test added for ArrayForEachDim in the ArrayForEach tests.
  • Adjoint methods added to all interpolation tests (see the note on the standard adjoint check after this list).
  • StructuredColumns to CubedSphere interpolation test added. Figures showing performance are given below. Note: the StructuredColumns FixupHaloForVectors method interferes with SphericalVector; you must make sure a halo exchange isn't called on the source field for it to work.
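
For context, the conventional correctness check for the adjoint of a linear operator A (a general fact about adjoints, not a quote from this PR) is the dot-product identity, which must hold to near machine precision for arbitrary fields x and y:

    \langle A x,\, y \rangle \;=\; \langle x,\, A^T y \rangle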

Notes:
Apologies for the strange commit history; I was trying to debug the macOS build without a Mac.
Closes #166

Figures

O48 Structured Columns to Cubed Sphere interpolation error when using SphericalVector to treat vector fields (same setup as PR #163).

[Figure: sc_to_cs_spherical]

O48 Structured Columns to Cubed Sphere interpolation error when using FixupHaloForVectors to treat vector fields.
[Figure: sc_to_cs_spherical_raw]

@odlomax (Contributor Author) commented Jan 12, 2024

@wdeconinck @pmaciel

Just thought you might be interested: I tried the vector interpolation scheme with StructuredColumns (see above). I also did some tests with double and half the source functionspace resolution. Reassuringly, there's quadratic convergence on the analytic solution. 😀
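
(For reference, a standard numerical-analysis fact rather than a claim from this thread: quadratic convergence means the error scales with grid spacing h as

    E(h) \approx C h^2, \quad\text{so}\quad E(h/2)/E(h) \approx 1/4,

i.e. doubling the source resolution should cut the error by roughly a factor of four, which is the behaviour described above.)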

@wdeconinck (Member) commented

Nice addition @odlomax!

Could we try to split the PR in two: one for the arrayForEach change and one for the adjoint?

We can merge the arrayForEach one first and rebase this one after.

@odlomax (Contributor Author) commented Jan 25, 2024

> Nice addition @odlomax!
>
> Could we try to split the PR in two: one for the arrayForEach change and one for the adjoint?
>
> We can merge the arrayForEach one first and rebase this one after.

Absolutely. Watch this space...

@odlomax (Contributor Author) commented Jan 25, 2024

>> Nice addition @odlomax!
>> Could we try to split the PR in two: one for the arrayForEach change and one for the adjoint?
>> We can merge the arrayForEach one first and rebase this one after.
>
> Absolutely. Watch this space...

Done. Here's the PR.

@wdeconinck (Member) commented

The other PR (#171) is merged. Please rebase :)

@wdeconinck (Member) commented

🥳 @odlomax

@odlomax (Contributor Author) commented Feb 14, 2024

> 🥳 @odlomax

Great success! 🤩

@odlomax (Contributor Author) commented Feb 15, 2024

No more pushes from me for now. I'll tidy up the macros when it's reviewed. 🙂

(Review thread on the following diff hunk, which removes this intel-classic workaround:)

#warning Disabling OpenMP to prevent internal compiler error for intel-classic version < 2021.6 (intel-oneapi/2022.2)
#undef atlas_omp_parallel_for
#define atlas_omp_parallel_for for
#endif
@wdeconinck (Member) commented on the diff:

How come you could drop this?

@odlomax (Contributor Author) replied:

I think the problem arose when supplying a template-parameter functor to the loop body. I've since separated out the use cases into their own loops, as @pmaciel suggested. It no longer triggers the error. 🙂

@wdeconinck (Member) commented

I've now played a bit with nvhpc/22.11 myself, in the routine ComplexMatrixMultiply::applyThreeVector.
I tried simplifying until it worked, without using the ArrayForEachDim construct. The following works:

atlas_omp_parallel_for( ... ) {
    ...
    if constexpr (Rank == 3) {
        for( idx_t jlev=0; jlev<targetSlice.shape(0); ++jlev) {
            const auto targetVector =
                complexWeight * Complex(sourceSlice(jlev,0), sourceSlice(jlev,1));

            targetSlice(jlev,0) += targetVector.real();
            targetSlice(jlev,1) += targetVector.imag();
            targetSlice(jlev,2) += realWeight * sourceSlice(jlev,2);
        }
    }
    if constexpr (Rank == 2) {
        ...
    }
}

What doesn't work, however, is creating a lambda that captures realWeight and complexWeight to do the inner part:

atlas_omp_parallel_for( ... ) {
    ...
    auto matmul = [realWeight,complexWeight](auto&& sourceElem, auto&& targetElem) {
              const auto targetVector =
                  complexWeight * Complex(sourceElem(0), sourceElem(1));
              targetElem(0) += targetVector.real();
              targetElem(1) += targetVector.imag();
              targetElem(2) += realWeight * sourceElem(2);
    };

    if constexpr (Rank == 3) {
        for( idx_t jlev=0; jlev<targetSlice.shape(0); ++jlev) {
              auto sourceElem = sourceSlice.slice(jlev,array::Range::all());
              auto targetElem = targetSlice.slice(jlev,array::Range::all());
              matmul(sourceElem,targetElem);
        }
    }
    if constexpr (Rank == 2) {
        matmul(sourceSlice, targetSlice);
    }
}

If realWeight and complexWeight are passed to the lambda as arguments instead of being captured, it WORKS again:

atlas_omp_parallel_for( ... ) {
    ...
    auto matmul = [](auto&& sourceElem, auto&& targetElem, auto realWeight, auto complexWeight) {
              const auto targetVector =
                  complexWeight * Complex(sourceElem(0), sourceElem(1));
              targetElem(0) += targetVector.real();
              targetElem(1) += targetVector.imag();
              targetElem(2) += realWeight * sourceElem(2);
    };

    if constexpr (Rank == 3) {
        for( idx_t jlev=0; jlev<targetSlice.shape(0); ++jlev) {
              auto sourceElem = sourceSlice.slice(jlev,array::Range::all());
              auto targetElem = targetSlice.slice(jlev,array::Range::all());
              matmul(sourceElem, targetElem, realWeight, complexWeight);
        }
    }
    if constexpr (Rank == 2) {
        matmul(sourceSlice, targetSlice, realWeight, complexWeight);
    }
}

So it appears the issue is with the lambda capture in the OpenMP region.
I verified that the problem is still there with nvhpc/23.7.
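
Distilled to its essence, the pattern looks like the hypothetical minimal reproducer below (an illustration of the description above, not verified against nvhpc here):

#include <complex>

void repro(const std::complex<double>* weights, double* out, int n) {
    // Pattern reported to fail on affected nvhpc versions: a value-capturing
    // lambda defined inside the OpenMP-parallel loop body.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        const auto weight = weights[i];
        auto accumulate = [weight](double& target) { target += weight.real(); };
        accumulate(out[i]);
    }

    // Reported workaround: capture-free lambda, with the weight passed as an
    // argument instead.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        auto accumulate = [](double& target, std::complex<double> weight) {
            target += weight.real();
        };
        accumulate(out[i], weights[i]);
    }
}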

@odlomax (Contributor Author) commented Feb 23, 2024

I need to give up on templates. They're nothing but trouble! 😅

@wdeconinck (Member) commented

> I need to give up on templates. They're nothing but trouble! 😅

Sometimes it's really not worth the effort (probably many, many, many hours) to cater for added generality or to save a few lines of code. With if constexpr, a lot of templates can be made a lot cleaner (and SFINAE can disappear).
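
As a generic illustration of that point (standard C++17, not code from this repository), here is the same rank dispatch written once with SFINAE and once with if constexpr:

#include <type_traits>

// SFINAE: two overloads, selected by enable_if on the Rank parameter.
template <int Rank, typename View, std::enable_if_t<Rank == 2, bool> = true>
void apply(View& view) { /* rank-2 path */ }

template <int Rank, typename View, std::enable_if_t<Rank == 3, bool> = true>
void apply(View& view) { /* rank-3 path */ }

// if constexpr: one function, both paths visible side by side.
template <int Rank, typename View>
void applyConstexpr(View& view) {
    if constexpr (Rank == 2) { /* rank-2 path */ }
    if constexpr (Rank == 3) { /* rank-3 path */ }
}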

@odlomax (Contributor Author) commented Feb 24, 2024

>> I need to give up on templates. They're nothing but trouble! 😅
>
> Sometimes it's really not worth the effort (probably many, many, many hours) to cater for added generality or to save a few lines of code. With if constexpr, a lot of templates can be made a lot cleaner (and SFINAE can disappear).

I'm glad concepts in C++20 have killed SFINAE for good. I think my problem is that lambda expressions have become my golden hammer, and now every problem looks like a nail!
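
For completeness, a generic C++20 sketch of the concepts alternative (again illustrative, not this repository's code): the constraint becomes a named concept instead of enable_if machinery.

// A named constraint: the view must be indexable with two subscripts.
template <typename View>
concept Rank2View = requires(View view) {
    view(0, 0);
};

// The constraint now reads like prose in the signature.
template <Rank2View View>
void apply(View& view) { /* rank-2 path */ }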

@wdeconinck (Member) commented

Thanks @odlomax! Really, it's not your fault but the NVIDIA compiler's. Thanks for working around it! ❤️

@odlomax (Contributor Author) commented Feb 24, 2024

> Thanks @odlomax! Really, it's not your fault but the NVIDIA compiler's. Thanks for working around it! ❤️

Honestly, it's good life experience! We're going to have nvhpc on our new system, so it's useful to know its peculiarities!

@wdeconinck (Member) commented

Looks good to me now. Do you mind if I "Squash-and-Merge"?

@odlomax (Contributor Author) commented Feb 27, 2024

> Looks good to me now. Do you mind if I "Squash-and-Merge"?

Wonderful! Squash away!

@wdeconinck merged commit 61b2933 into ecmwf:develop on Feb 27, 2024
96 checks passed
@odlomax deleted the feature/spherical_vector_adjoint branch on February 27, 2024
wdeconinck added a commit that referenced this pull request Apr 9, 2024
* release/0.37.0: (23 commits)
  Update Changelog
  Version 0.37.0
  Projection base implementation derivatives performance/encapsulation … (#185)
  atlas_io is an adaptor library when eckit_codec is available (#181)
  Fix build for configuration setting ATLAS_BITS_LOCAL=64 (#184)
  Revert "Avoid linker warnings on macOS about 'ld: warning: could not create compact unwind for ...'"
  Cosmetic: readability braces
  Initialize std::array values to zero because valgrind complains, even though c++ standard mandates it should be default-initialized to zero
  Fix bug in TraceT caused by typo where the title was wrong
  Avoid linker warnings on macOS about 'ld: warning: could not create compact unwind for ...'
  Use new LocalConfiguration baseclass functions in util::Config and util::Metadata instead of eckit::Value backdoor
  Removed leftover code missed in PR #175
  Update `SphericalVector` to work with StructuredColumns as source functionspace. (#175)
  Bugfix for regional grids with ny > nx
  Refactoring of interpolation::method::SphericalVector and implementation of adjoint methods. (#168)
  Added test with empty integer sequence.
  Added arrayForEachDim method.
  Add docs build workflow
  Github Actions: Fix macOS MPI slots
  Fix for elements that might have unassigned partition via parallel Delaunay meshgenerator
  ...
Successfully merging this pull request may close this issue: Adding robust vector-field interpolation functionality (current PR and further work).