
Refactoring of interpolation::method::SphericalVector and implementation of adjoint methods. #168

Merged

Conversation

@odlomax (Contributor) commented Jan 8, 2024

Following on from PR #163, this PR primarily adds and tests the adjoint methods for the interpolation::method::SphericalVector class.

Several refactors were also applied to reduce complexity and code duplication:

  • The matrix multiplication code was removed from SphericalVector.cc and given its own class, ComplexMatrixMultiply.
  • A SparseMatrix class was added that wraps the Eigen sparse matrix in something that looks more like eckit::linalg::SparseMatrix with a Value template. This also confines all the nasty macros to one place.
  • Types.h was added, containing all the main type definitions required for the SphericalVector class.
  • ArrayForEachDim was added to array::helpers::ArrayForEach. This allows the iteration dimensions to be set using a std::integer_sequence (see the sketch after this list).
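
For readers unfamiliar with the pattern, below is a minimal standalone sketch of the std::integer_sequence idea (generic C++17; ForEachDims and forEachDim are hypothetical stand-ins, not the atlas API): the sequence's values are unpacked into a template parameter pack that fixes the iteration dimensions at compile time.

#include <cstddef>
#include <iostream>
#include <utility>

// Hypothetical stand-in for a dims-as-template-arguments helper, in the
// spirit of array::helpers::ArrayForEach<Dims...>.
template <std::size_t... Dims>
struct ForEachDims {
    template <typename Function>
    static void apply(Function&& function) {
        // Demo only: invoke the functor once per selected dimension.
        (function(Dims), ...);
    }
};

// The integer_sequence overload: unpack the sequence into the pack
// expected by the dims-as-template-arguments helper.
template <typename Function, std::size_t... Dims>
void forEachDim(std::integer_sequence<std::size_t, Dims...>,
                Function&& function) {
    ForEachDims<Dims...>::apply(std::forward<Function>(function));
}

int main() {
    // Select dimensions 0 and 1 with a sequence rather than explicit
    // template arguments at the call site.
    forEachDim(std::index_sequence<0, 1>{}, [](std::size_t dim) {
        std::cout << "iterating over dimension " << dim << "\n";
    });
}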

Further testing was added:

  • Test added for ArrayForEachDim in the ArrayForEach tests.
  • Adjoint methods added to all interpolation tests (see the note on the standard adjoint check after this list).
  • StructuredColumns to CubedSphere interpolation test added. Figures showing performance are given below. Note: the StructuredColumns FixupHaloForVectors method interferes with SphericalVector; you must make sure a halo exchange isn't called on the source field for it to work.
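
For context, the conventional correctness check for the adjoint of a linear operator A (a general fact about adjoints, not a quote from this PR) is the dot-product identity, which must hold to near machine precision for arbitrary fields x and y:

    \langle A x,\, y \rangle \;=\; \langle x,\, A^T y \rangle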

Notes:
Apologies for the strange commit history; I was trying to debug the macOS build without a Mac.
Closes #166

Figures

O48 Structured Columns to Cubed Sphere interpolation error when using SphericalVector to treat vector fields (same setup as PR #163).

[Figure: sc_to_cs_spherical]

O48 Structured Columns to Cubed Sphere interpolation error when using FixupHaloForVectors to treat vector fields.
[Figure: sc_to_cs_spherical_raw]

@odlomax (Contributor Author) commented Jan 12, 2024

@wdeconinck @pmaciel

Just thought you might be interested: I tried the vector interpolation scheme with StructuredColumns (see above). I also did some tests with double and half the source functionspace resolution. Reassuringly, there's quadratic convergence on the analytic solution. 😀
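
(For reference, a standard numerical-analysis fact rather than a claim from this thread: quadratic convergence means the error scales with grid spacing h as

    E(h) \approx C h^2, \quad\text{so}\quad E(h/2)/E(h) \approx 1/4,

i.e. doubling the source resolution should cut the error by roughly a factor of four, which is the behaviour described above.)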

@wdeconinck (Member) commented

Nice addition @odlomax!

Could we try to split the PR in two: one for the arrayForEach change and one for the adjoint?

We can merge the arrayForEach one first and rebase this one after.

@odlomax (Contributor Author) commented Jan 25, 2024

> Nice addition @odlomax!
>
> Could we try to split the PR in two: one for the arrayForEach change and one for the adjoint?
>
> We can merge the arrayForEach one first and rebase this one after.

Absolutely. Watch this space...

@odlomax (Contributor Author) commented Jan 25, 2024

>> Nice addition @odlomax!
>> Could we try to split the PR in two: one for the arrayForEach change and one for the adjoint?
>> We can merge the arrayForEach one first and rebase this one after.
>
> Absolutely. Watch this space...

Done. Here's the PR.

@wdeconinck (Member) commented

The other PR (#171) is merged. Please rebase :)

@wdeconinck (Member) commented

🥳 @odlomax

@odlomax (Contributor Author) commented Feb 14, 2024

> 🥳 @odlomax

Great success! 🤩

@odlomax (Contributor Author) commented Feb 15, 2024

No more pushes from me for now. I'll tidy up the macros when it's reviewed. 🙂

(Review thread on the following diff hunk, which removes this intel-classic workaround:)

#warning Disabling OpenMP to prevent internal compiler error for intel-classic version < 2021.6 (intel-oneapi/2022.2)
#undef atlas_omp_parallel_for
#define atlas_omp_parallel_for for
#endif
@wdeconinck (Member) commented on the diff:

How come you could drop this?

@odlomax (Contributor Author) replied:

I think the problem arose when supplying a template-parameter functor to the loop body. I've since separated out the use cases into their own loops, as @pmaciel suggested. It no longer triggers the error. 🙂

@wdeconinck (Member) commented

I've now played a bit with nvhpc/22.11 myself, in the routine ComplexMatrixMultiply::applyThreeVector.
I tried simplifying until it worked, without using the ArrayForEachDim construct. The following works:

atlas_omp_parallel_for( ... ) {
    ...
    if constexpr (Rank == 3) {
        for( idx_t jlev=0; jlev<targetSlice.shape(0); ++jlev) {
            const auto targetVector =
                complexWeight * Complex(sourceSlice(jlev,0), sourceSlice(jlev,1));

            targetSlice(jlev,0) += targetVector.real();
            targetSlice(jlev,1) += targetVector.imag();
            targetSlice(jlev,2) += realWeight * sourceSlice(jlev,2);
        }
    }
    if constexpr (Rank == 2) {
        ...
    }
}

What doesn't work, however, is creating a lambda that captures realWeight and complexWeight to do the inner part:

atlas_omp_parallel_for( ... ) {
    ...
    auto matmul = [realWeight,complexWeight](auto&& sourceElem, auto&& targetElem) {
              const auto targetVector =
                  complexWeight * Complex(sourceElem(0), sourceElem(1));
              targetElem(0) += targetVector.real();
              targetElem(1) += targetVector.imag();
              targetElem(2) += realWeight * sourceElem(2);
    };

    if constexpr (Rank == 3) {
        for( idx_t jlev=0; jlev<targetSlice.shape(0); ++jlev) {
              auto sourceElem = sourceSlice.slice(jlev,array::Range::all());
              auto targetElem = targetSlice.slice(jlev,array::Range::all());
              matmul(sourceElem,targetElem);
        }
    }
    if constexpr (Rank == 2) {
        matmul(sourceSlice, targetSlice);
    }
}

If realWeight and complexWeight are passed to the lambda as arguments instead of being captured, it WORKS again:

atlas_omp_parallel_for( ... ) {
    ...
    auto matmul = [](auto&& sourceElem, auto&& targetElem, auto realWeight, auto complexWeight) {
              const auto targetVector =
                  complexWeight * Complex(sourceElem(0), sourceElem(1));
              targetElem(0) += targetVector.real();
              targetElem(1) += targetVector.imag();
              targetElem(2) += realWeight * sourceElem(2);
    };

    if constexpr (Rank == 3) {
        for( idx_t jlev=0; jlev<targetSlice.shape(0); ++jlev) {
              auto sourceElem = sourceSlice.slice(jlev,array::Range::all());
              auto targetElem = targetSlice.slice(jlev,array::Range::all());
              matmul(sourceElem, targetElem, realWeight, complexWeight);
        }
    }
    if constexpr (Rank == 2) {
        matmul(sourceSlice, targetSlice, realWeight, complexWeight);
    }
}

So it appears the issue is with the lambda capture in the OpenMP region.
I verified that the problem is still there with nvhpc/23.7.
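
Distilled to its essence, the pattern looks like the hypothetical minimal reproducer below (an illustration of the description above, not verified against nvhpc here):

#include <complex>

void repro(const std::complex<double>* weights, double* out, int n) {
    // Pattern reported to fail on affected nvhpc versions: a value-capturing
    // lambda defined inside the OpenMP-parallel loop body.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        const auto weight = weights[i];
        auto accumulate = [weight](double& target) { target += weight.real(); };
        accumulate(out[i]);
    }

    // Reported workaround: capture-free lambda, with the weight passed as an
    // argument instead.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        auto accumulate = [](double& target, std::complex<double> weight) {
            target += weight.real();
        };
        accumulate(out[i], weights[i]);
    }
}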

@odlomax (Contributor Author) commented Feb 23, 2024

I need to give up on templates. They're nothing but trouble! 😅

@wdeconinck (Member) commented

> I need to give up on templates. They're nothing but trouble! 😅

Sometimes it's really not worth the effort (probably many, many, many hours) to cater for added generality or to save a few lines of code. With if constexpr, a lot of templates can be made a lot cleaner (and SFINAE can disappear).
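
As a generic illustration of that point (standard C++17, not code from this repository), here is the same rank dispatch written once with SFINAE and once with if constexpr:

#include <type_traits>

// SFINAE: two overloads, selected by enable_if on the Rank parameter.
template <int Rank, typename View, std::enable_if_t<Rank == 2, bool> = true>
void apply(View& view) { /* rank-2 path */ }

template <int Rank, typename View, std::enable_if_t<Rank == 3, bool> = true>
void apply(View& view) { /* rank-3 path */ }

// if constexpr: one function, both paths visible side by side.
template <int Rank, typename View>
void applyConstexpr(View& view) {
    if constexpr (Rank == 2) { /* rank-2 path */ }
    if constexpr (Rank == 3) { /* rank-3 path */ }
}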

@odlomax (Contributor Author) commented Feb 24, 2024

>> I need to give up on templates. They're nothing but trouble! 😅
>
> Sometimes it's really not worth the effort (probably many, many, many hours) to cater for added generality or to save a few lines of code. With if constexpr, a lot of templates can be made a lot cleaner (and SFINAE can disappear).

I'm glad concepts in C++20 have killed SFINAE for good. I think my problem is that lambda expressions have become my golden hammer, and now every problem looks like a nail!
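
For completeness, a generic C++20 sketch of the concepts alternative (again illustrative, not this repository's code): the constraint becomes a named concept instead of enable_if machinery.

// A named constraint: the view must be indexable with two subscripts.
template <typename View>
concept Rank2View = requires(View view) {
    view(0, 0);
};

// The constraint now reads like prose in the signature.
template <Rank2View View>
void apply(View& view) { /* rank-2 path */ }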

@wdeconinck (Member) commented

Thanks @odlomax! Really, it's not your fault but the NVIDIA compiler's. Thanks for working around it! ❤️

@odlomax (Contributor Author) commented Feb 24, 2024

> Thanks @odlomax! Really, it's not your fault but the NVIDIA compiler's. Thanks for working around it! ❤️

Honestly, it's good life experience! We're going to have nvhpc on our new system, so it's useful to know its peculiarities!

@wdeconinck (Member) commented

Looks good to me now. Do you mind if I "Squash-and-Merge"?

@odlomax (Contributor Author) commented Feb 27, 2024

> Looks good to me now. Do you mind if I "Squash-and-Merge"?

Wonderful! Squash away!

@wdeconinck merged commit 61b2933 into ecmwf:develop on Feb 27, 2024
96 checks passed
@odlomax deleted the feature/spherical_vector_adjoint branch on February 27, 2024
wdeconinck added a commit that referenced this pull request Apr 9, 2024
* release/0.37.0: (23 commits)
  Update Changelog
  Version 0.37.0
  Projection base implementation derivatives performance/encapsulation … (#185)
  atlas_io is an adaptor library when eckit_codec is available (#181)
  Fix build for configuration setting ATLAS_BITS_LOCAL=64 (#184)
  Revert "Avoid linker warnings on macOS about 'ld: warning: could not create compact unwind for ...'"
  Cosmetic: readability braces
  Initialize std::array values to zero because valgrind complains, even though c++ standard mandates it should be default-initialized to zero
  Fix bug in TraceT caused by typo where the title was wrong
  Avoid linker warnings on macOS about 'ld: warning: could not create compact unwind for ...'
  Use new LocalConfiguration baseclass functions in util::Config and util::Metadata instead of eckit::Value backdoor
  Removed leftover code missed in PR #175
  Update `SphericalVector` to work with StructuredColumns as source functionspace. (#175)
  Bugfix for regional grids with ny > nx
  Refactoring of interpolation::method::SphericalVector and implementation of adjoint methods. (#168)
  Added test with empty integer sequence.
  Added arrayForEachDim method.
  Add docs build workflow
  Github Actions: Fix macOS MPI slots
  Fix for elements that might have unassigned partition via parallel Delaunay meshgenerator
  ...
Successfully merging this pull request may close this issue: Adding robust vector-field interpolation functionality (current PR and further work).