[DFT] Correct overload resolution for OOP COMPLEX vs IP REAL_REAL #503

hjabird · 2024-05-31T08:47:30Z

Description

OOP COMPLEX and IP REAL_REAL overload resolution is problematic
Correct with SFINAE

Checklist

All Submissions

Do all unit tests pass locally? Attach a log.
dfttest.txt
Have you formatted the code using clang-format?

New interfaces

It is unclear to me exactly what the spec intended. The spec does not define types for the data passed into DFT functions, only that there is sufficient space. However, this creates problems when resolving overloads.

Bug fixes

[N/A] Have you added relevant regression tests?
Have you included information on how to reproduce the issue (either in a
GitHub issue or in this PR)?

hjabird · 2024-05-31T08:50:14Z

From the issue thread @raphael-egan

Thanks, @hjabird! That SFINAE approach looks like a significant improvement in my opinion. I have one concern but I am a little rusty on the metaprogramming front so you may need to correct me below. Apologies for commenting here but I couldn't find a way to comment directly where relevant.
Consider a single-precision complex descriptor desc successfully committed, configured with COMPLEX_COMPLEX and NOT_INPLACE values set for configuration parameters COMPLEX_STORAGE and PLACEMENT, respectively. Let's assume that the user communicates I/O data as float *in, *out pointers to device-accessible USM allocations (note: not std::complex<float>*).

My understanding of the specs and of the implementation would be that, in such a case, a call like compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); can't be expected to result in an out-of-place transform (likely intended by the user), and your branch complies with that, unless I'm mistaken. Do we agree on that?

Is the same to be expected for compute_{for,back}ward(desc, in, out); (no explicit specification of template parameters)? If not, why?

Finally, could you please confirm that compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float, float>(desc, in, out); would be the out-of-place transform likely intended by the user, with the changes from that branch despite the implementation of valid_oop_iotypes?

Rbiessy

That looks like a good solution to me if we want to allow the user to not explicitly set the template arguments. It may lead to cryptic error messages if the API is misused though. That's fine with me.

I think we should remove the explicit template arguments in the examples and tests if we go forward with this solution.

include/oneapi/mkl/dft/detail/types_impl.hpp

hjabird · 2024-05-31T10:48:39Z

Consider a single-precision complex descriptor desc successfully committed, configured with COMPLEX_COMPLEX and NOT_INPLACE values set for configuration parameters COMPLEX_STORAGE and PLACEMENT, respectively. Let's assume that the user communicates I/O data as float *in, *out pointers to device-accessible USM allocations (note: not std::complex<float>*).

Points:

My understanding of the specs and of the implementation would be that, in such a case, a call like compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); can't be expected to result in an out-of-place transform (likely intended by the user), and your branch complies with that, unless I'm mistaken. Do we agree on that?

compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); where in and out are float* would result in an in-place DFT where in would be taken as the argument inout_re and out as inout_im. So we agree that it wouldn't result in an out-of-place transform, and the branch complies with that.

Is the same to be expected for compute_{for,back}ward(desc, in, out); (no explicit specification of template parameters)? If not, why?

Yes. The logic I've applied here is that there are two possible cases:

You have a REAL DFT, in which case you can't have REAL-REAL data. Consequently, only the out-of-place signature is enabled and it accepts float as both input and output.
You have a complex DFT. Both signatures are enabled, but COMPLEX_COMPLEX data must be passed as a std::complex. Consequently, the two overloads can be differentiated by the types passed into them.
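
The two cases above can be sketched with return-type SFINAE. This is a minimal, self-contained mock-up and not the code in types_impl.hpp: descriptor, scalar_t, is_complex_domain, oop_types_ok and the chosen tag are hypothetical stand-ins introduced for illustration, and the mock functions only report which overload was selected.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// Hypothetical stand-ins for the library types (not the oneMKL definitions).
enum class precision { SINGLE, DOUBLE };
enum class domain { REAL, COMPLEX };
template <precision P, domain D> struct descriptor {};

// Scalar type implied by the descriptor precision.
template <typename Desc> struct scalar_of;
template <domain D> struct scalar_of<descriptor<precision::SINGLE, D>> { using type = float; };
template <domain D> struct scalar_of<descriptor<precision::DOUBLE, D>> { using type = double; };
template <typename Desc> using scalar_t = typename scalar_of<Desc>::type;

template <typename Desc> struct is_complex_domain : std::false_type {};
template <precision P>
struct is_complex_domain<descriptor<P, domain::COMPLEX>> : std::true_type {};

// Case 2 constraint: a complex descriptor only accepts std::complex data out-of-place.
// Case 1: a real descriptor places no restriction here.
template <typename Desc, typename In, typename Out>
struct oop_types_ok
    : std::integral_constant<bool,
          !is_complex_domain<Desc>::value ||
              (std::is_same<In, std::complex<scalar_t<Desc>>>::value &&
               std::is_same<Out, std::complex<scalar_t<Desc>>>::value)> {};

enum class chosen { in_place_real_real, out_of_place };

// In-place REAL_REAL: enabled only for complex descriptors and scalar pointers.
template <typename Desc, typename T>
std::enable_if_t<is_complex_domain<Desc>::value &&
                     std::is_same<T, scalar_t<Desc>>::value, chosen>
compute_forward(Desc&, T* /*inout_re*/, T* /*inout_im*/) {
    return chosen::in_place_real_real;
}

// Out-of-place: always enabled for real descriptors,
// restricted to std::complex pointers for complex descriptors.
template <typename Desc, typename In, typename Out>
std::enable_if_t<oop_types_ok<Desc, In, Out>::value, chosen>
compute_forward(Desc&, In* /*in*/, Out* /*out*/) {
    return chosen::out_of_place;
}
```

With these constraints at most one overload survives for each combination of descriptor and pointer types, so the call never relies on partial ordering to pick a winner.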

Finally, could you please confirm that compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float, float>(desc, in, out); would be the out-of-place transform likely intended by the user, with the changes from that branch despite the implementation of valid_oop_iotypes?

No, you wouldn't get the out-of-place transform. You'd get an error instead at the moment. Thinking about it, we could enable this, since the REAL-REAL version that the types suggest would be selected first, as is the case at the moment. However, I don't think it's unreasonable that std::complex data should be passed as a pointer to that type - the user could also reinterpret the pointer.
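
As an illustration of reinterpreting the pointer, here is a small sketch. The helper names as_complex and second_value_round_trip are made up for this example. It assumes the storage really holds std::complex<float> objects; the standard guarantees that an array of std::complex<T> can be accessed as an array of T, and casting back to the original type is then well defined. Whether a buffer that was created as plain float can be viewed this way depends on how it was allocated, so treat that direction with care.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Hypothetical helper: view interleaved (re, im) floats as std::complex<float>.
// Assumption: the pointed-to storage actually holds std::complex<float> objects.
inline std::complex<float>* as_complex(float* interleaved) {
    return reinterpret_cast<std::complex<float>*>(interleaved);
}

// Fills two complex values through a float* view, then reads one back as complex.
inline std::complex<float> second_value_round_trip() {
    std::vector<std::complex<float>> storage(2);
    // An array of std::complex<float> may be accessed as 2 * n floats.
    float* raw = reinterpret_cast<float*>(storage.data());
    raw[0] = 1.0f; raw[1] = 2.0f;  // element 0: 1 + 2i
    raw[2] = 3.0f; raw[3] = 4.0f;  // element 1: 3 + 4i
    return as_complex(raw)[1];
}
```

A user holding float* data could pass as_complex(in) and as_complex(out) to make the intended complex-complex overload explicit in the argument types.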

include/oneapi/mkl/dft/detail/types_impl.hpp

raphael-egan · 2024-05-31T16:15:53Z

Consider a single-precision complex descriptor desc successfully committed, configured with COMPLEX_COMPLEX and NOT_INPLACE values set for configuration parameters COMPLEX_STORAGE and PLACEMENT, respectively. Let's assume that the user communicates I/O data as float *in, *out pointers to device-accessible USM allocations (note: not std::complex<float>*).

Points:

My understanding of the specs and of the implementation would be that, in such a case, a call like compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); can't be expected to result in an out-of-place transform (likely intended by the user), and your branch complies with that, unless I'm mistaken. Do we agree on that?

compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); where in and out are float* would result in an in-place DFT where in would be taken as the argument inout_re and out as inout_im. So we agree that it wouldn't result in an out-of-place transform, and the branch complies with that.

OK. Although rather unfortunate, the specs are clearly making that specialization very specific, so it is consistent. Thanks for confirming.

Is the same to be expected for compute_{for,back}ward(desc, in, out); (no explicit specification of template parameters)? If not, why?

Yes. The logic I've applied here is that there are two possible cases:

You have a REAL DFT, in which case you can't have REAL-REAL data. Consequently, only the out-of-place signature is enabled and it accepts float as both input and output.

You have a complex DFT. Both signatures are enabled, but COMPLEX_COMPLEX data must be passed as a std::complex. Consequently, the two overloads can be differentiated by the types passed into them.

I'm not sure I follow this very last point: why must the data be passed as an std::complex-typed container? As you point out in this PR's description: "The spec does not define types for the data passed into DFT functions, only that there is sufficient space". Unless I'm mistaken/missing something in the specs, I do not see what therein prevents users from passing say float-typed containers for an out-of-place c2c SP DFT with COMPLEX_COMPLEX storage, conceptually. Please correct me and/or point me to the part of the specs that enforces/requires std::complex-typed containers in such cases, if I'm wrong.

Finally, could you please confirm that compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float, float>(desc, in, out); would be the out-of-place transform likely intended by the user, with the changes from that branch despite the implementation of valid_oop_iotypes?

No, you wouldn't get the out-of-place transform. You'd get an error instead at the moment. Thinking about it, we could enable this, since the REAL-REAL version that the types suggest would be selected first, as is the case at the moment. However, I don't think it's unreasonable that std::complex data should be passed as a pointer to that type - the user could also reinterpret the pointer.

Unless I'm missing something (see above), I think that is/could be a problem. With explicit value-specialization of all 3 parameters and the number of arguments at play, there is no ambiguity regarding the user's intention in this case and the "standard" out-of-place c2c should be used. Am I wrong?

hjabird · 2024-05-31T16:50:20Z

I'm not sure I follow this very last point: why must the data be passed as an std::complex-typed container? As you point out in this PR's description: "The spec does not define types for the data passed into DFT functions, only that there is sufficient space". Unless I'm mistaken/missing something in the specs, I do not see what therein prevents users from passing say float-typed containers for an out-of-place c2c SP DFT with COMPLEX_COMPLEX storage, conceptually. Please correct me and/or point me to the part of the specs that enforces/requires std::complex-typed containers in such cases, if I'm wrong.

There is no part of the spec that enforces / requires this as far as I can see. Data is implicitly assumed to be of a certain type (spec page, Implicitly-assumed elementary data type).

The problem is that this also follows for the inplace real-real signature: according to the spec, inout_re and inout_im can be of any pointer type and implicitly cast to scalar pointers. This brings us back to the original problem of the incorrect overload being selected.
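
To make the original problem concrete, here is a simplified mock of two unconstrained signatures. The names in namespace mock are hypothetical and the real functions take further arguments, so this is only a sketch of the overload-resolution behaviour. When both pointers have the same type, both templates are viable and partial ordering prefers the more specialized (T*, T*) signature, which is the in-place one.

```cpp
#include <cassert>
#include <complex>

namespace mock {
struct complex_descriptor {};  // hypothetical stand-in for a committed complex descriptor
enum class chosen { in_place_real_real, out_of_place };

// In-place split-complex signature with an unconstrained pointer type.
template <typename Desc, typename T>
chosen compute_forward(Desc&, T* /*inout_re*/, T* /*inout_im*/) {
    return chosen::in_place_real_real;
}

// Out-of-place signature with unconstrained pointer types.
template <typename Desc, typename In, typename Out>
chosen compute_forward(Desc&, In* /*in*/, Out* /*out*/) {
    return chosen::out_of_place;
}
}  // namespace mock
```

Calling mock::compute_forward(desc, in, out) with two std::complex<float>* arguments therefore selects the in-place overload in this mock, even though an out-of-place complex-complex transform was intended.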

Following the letter of the spec gives undesirable results (leading to #499). Instead, I think we ought to follow what I would expect as a user (real data is pointers to reals, complex data is pointers to complex), which was also hopefully what was intended by the specification.

Unless I'm missing something (see above), I think that is/could be a problem. With explicit value-specialization of all 3 parameters and the number of arguments at play, there is no ambiguity regarding the user's intention in this case and the "standard" out-of-place c2c should be used. Am I wrong?

You're right, if we want to follow the spec exactly. I think we can re-enable this and still have overload resolution fixed (by default, float parameters will result in the in-place real overload being used). However, having restrictions on the pointer types given as arguments to some overloads but not others could lead to confusion. My preferred approach would be that the user can explicitly select the overload by casting their input/output pointers to the type that they represent for the purposes of the DFT.

raphael-egan · 2024-06-05T16:50:00Z

I'm not sure I follow this very last point: why must the data be passed as an std::complex-typed container? As you point out in this PR's description: "The spec does not define types for the data passed into DFT functions, only that there is sufficient space". Unless I'm mistaken/missing something in the specs, I do not see what therein prevents users from passing say float-typed containers for an out-of-place c2c SP DFT with COMPLEX_COMPLEX storage, conceptually. Please correct me and/or point me to the part of the specs that enforces/requires std::complex-typed containers in such cases, if I'm wrong.

There is no part of the spec that enforces / requires this as far as I can see. Data is implicitly assumed to be of a certain type (spec page, Implicitly-assumed elementary data type).

Personally, I understand this part of the spec (and references thereto) as "that is how descriptors are to read and write their I/O data" not as "that is(are) the only data type(s) that descriptors may possibly accept", basically making a distinction between "implicitly-assumed" vs "expected" data types. For instance, in the first sentence of the dedicated section: "a descriptor object may re-interpret the base data type of that data container into an implicitly-assumed elementary data type [...]"

The problem is that this also follows for the inplace real-real signature: according to the spec, inout_re and inout_im can be of any pointer type and implicitly cast to scalar pointers. This brings us back to the original problem of the incorrect overload being selected.
Following the letter of the spec gives undesirable results (leading to #499). Instead, I think we ought to follow what I would expect as a user (real data is pointers to reals, complex data is pointers to complex), which was also hopefully what was intended by the specification.

I agree and let me clarify what I meant to point out. Considering a single-precision complex descriptor desc, and float* types for a, b, in my opinion and in my understanding of the spec, a usage like
compute_forward(desc, a, b);
or like
compute_forward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, a, b);
may legitimately (and rather unambiguously in the latter case) be interpreted as a call to the in-place operation with split-complex (aka "real-real") data storage. However, specifying all possible template arguments as in
compute_forward<descriptor<precision::SINGLE, domain::COMPLEX>, float, float>(desc, a, b);
unambiguously calls for an out-of-place operation with standard interleaved complex (aka "complex-complex") data storage in my opinion and it would seem strange to forbid it (it's not unheard of in my experience).

It does make sense to me to rule out support for some data types in case of possibly unspecified template arguments (i.e. ruling out the in-place "real-real" operation if std::complex<float>* arguments are used), but I think it's overkill to rule out some support in case all possible template arguments are fully specified.

Unless I'm missing something (see above), I think that is/could be a problem. With explicit value-specialization of all 3 parameters and the number of arguments at play, there is no ambiguity regarding the user's intention in this case and the "standard" out-of-place c2c should be used. Am I wrong?

You're right, if we want to follow the spec exactly. I think we can re-enable this and still have overload resolution fixed (by default, float parameters will result in the in-place real overload being used). However, having restrictions on the pointer types given as arguments to some overloads but not others could lead to confusion. My preferred approach would be that the user can explicitly select the overload by casting their input/output pointers to the type that they represent for the purposes of the DFT.

That's another option which I am not personally opposed to, although that might be fairly inconvenient/ugly for users of SYCL buffers in my opinion.
Either way, I think there is some ambiguity in the current form of the spec when it comes to possible implementations and I think we ought to clarify that by unifying and agreeing on our understanding of it. For instance, we could clarify that no other type than float* (resp. double*) is to be supported by the single-precision (resp. double-precision) in-place real-real c2c compute functions, with similar restrictions for versions with buffers I/O. Depending on the conclusion from this thread, we could add either a clarification that "all 3 template arguments must be specified by users for out-of-place complex-complex c2c compute functions if float*/double* are used" (my suggestion) or "no other type than std::complex<float>* (resp. std::complex<double>*) is to be supported by the single-precision (resp. double-precision) out-of-place complex-complex c2c compute functions, with similar restrictions for versions with buffers I/O" (your suggestion). Do we agree on such a need to clarify the spec based on the final decision made regarding this?

raphael-egan · 2024-06-05T18:47:31Z

For instance, we could clarify that no other type than float* (resp. double*) is to be supported by the single-precision (resp. double-precision) in-place real-real c2c compute functions, with similar restrictions for versions with buffers I/O. Depending on the conclusion from this thread, we could add either a clarification that "all 3 template arguments must be specified by users for out-of-place complex-complex c2c compute functions if float*/double* are used" (my suggestion) or "no other type than std::complex<float>* (resp. std::complex<double>*) is to be supported by the single-precision (resp. double-precision) out-of-place complex-complex c2c compute functions, with similar restrictions for versions with buffers I/O" (your suggestion). Do we agree on such a need to clarify the spec based on the final decision made regarding this?

Actually, thinking more about it, either spec update suggestion would probably be problematic for closed-source oneMKL because it must be(come) spec-compliant and it does support float*/double* for out-of-place c2c operations with "complex-complex" storage (unambiguously right now, due to unimplemented support for "real-real" storage). If we think that clarifying the spec is required regarding this possible ambiguity, we must think more carefully about how to phrase it: explicitly ruling such possible usage out or requesting full specification of all template parameters in the spec would likely result in frustrating some closed-source oneMKL users, in the end. If this is considered to be an implementation detail that does not motivate a specification update by itself though, I'll let you decide what's best to do regarding the possible out-of-place "complex-complex" c2c cases.

I'm satisfied with the changes made to restrict usage of in-place "real-real" c2c operations, which were the core motivation for this PR.

hjabird · 2024-06-06T11:14:42Z

I've changed things such that:

dft::descriptor<SINGLE, COMPLEX> desc(N);
float *a, *b;
...
dft::compute_forward(desc, a, b); // Calls inplace REAL_REAL version.
dft::compute_forward<decltype(desc), float, float>(desc, a, b); // Calls OOP COMPLEX COMPLEX version.
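
One way the behaviour in the snippet above can arise is sketched below. It is a hedged mock, not the merged implementation: complex_descriptor_f32 and the chosen tag are hypothetical. The in-place overload is constrained to the scalar type and has only two template parameters, so supplying three explicit template arguments removes it from consideration, while a deduced call with float* still prefers it through partial ordering.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

struct complex_descriptor_f32 {};  // hypothetical single-precision complex descriptor
enum class chosen { in_place_real_real, out_of_place };

// In-place REAL_REAL: only float* is accepted for this mock descriptor.
template <typename Desc, typename T>
std::enable_if_t<std::is_same<T, float>::value, chosen>
compute_forward(Desc&, T* /*inout_re*/, T* /*inout_im*/) {
    return chosen::in_place_real_real;
}

// Out-of-place: accepts std::complex<float>* and, when all three template
// arguments are given explicitly, float* as well.
template <typename Desc, typename In, typename Out>
chosen compute_forward(Desc&, In* /*in*/, Out* /*out*/) {
    return chosen::out_of_place;
}
```

In this mock, specifying only two template arguments, as in compute_forward<decltype(desc), float>(desc, a, b), still resolves to the in-place overload, which matches the behaviour agreed earlier in the thread.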

Your feedback is very much appreciated, Raphael! I'll be contributing an update to the spec at some point, but not immediately.

raphael-egan

Thank you, the changes look appropriate to me. Apologies for the quibbling.

hjabird · 2024-06-10T08:51:22Z

Thank you, the changes look appropriate to me. Apologies for the quibbling.

I very much appreciate the feedback when making possibly dangerous changes like this!


hjabird · 2024-07-02T14:01:53Z

Tests pass after rebasing:
rebased_ctest.txt

Spec PR is merged uxlfoundation/oneAPI-spec#544

…lfoundation#503) * OOP COMPLEX and IP REAL_REAL overload resolution is problematic * Inplace real-real overload would be selected when out-of-place complex-complex DFT was intended. * With spec update, this PR uses SFINAE to give the expected behaviour for the user.

Rbiessy reviewed May 31, 2024

include/oneapi/mkl/dft/detail/types_impl.hpp

hjabird commented May 31, 2024

include/oneapi/mkl/dft/detail/types_impl.hpp

hjabird mentioned this pull request May 31, 2024

[DFT] Add static_assert to check types passed into compute_foward and compute_backward #502

Closed

2 tasks

Rbiessy reviewed May 31, 2024

include/oneapi/mkl/dft/detail/types_impl.hpp (outdated)

hjabird marked this pull request as ready for review May 31, 2024 13:58

raphael-egan approved these changes Jun 6, 2024

Rbiessy approved these changes Jun 10, 2024

hjabird mentioned this pull request Jun 10, 2024

[DFT] Constrain data_type in compute_*ward real-real inplace uxlfoundation/oneAPI-spec#544

Merged

hjabird added 3 commits July 2, 2024 14:31

[DFT] Correct overload resolution for OOP COMPLEX vs IP REAL_REAL

8184818

* OOP COMPLEX and IP REAL_REAL overload resolution is problematic * Correct with SFINAE wq

Use descriptor_scalar_t elsewhere

5ad57d3

Reenable OOP complex-complex version for float input

516c7fa

hjabird force-pushed the dft_overload_resolution_with_enableif branch from 914d742 to 516c7fa Compare July 2, 2024 14:01

hjabird merged commit a2490e7 into uxlfoundation:develop Jul 2, 2024
6 checks passed

hjabird mentioned this pull request Aug 7, 2024

v0.5 release notes #547

Closed
