Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DFT] Correct overload resolution for OOP COMPLEX vs IP REAL_REAL #503

Merged

Conversation

hjabird
Copy link
Contributor

@hjabird hjabird commented May 31, 2024

Description

  • OOP COMPLEX and IP REAL_REAL overload resolution is problematic
  • Correct with SFINAE

Fixes # (GitHub issue)

Checklist

All Submissions

  • Do all unit tests pass locally? Attach a log.
  • dfttest.txt
  • Have you formatted the code using clang-format?

New interfaces

It is unclear to me exactly what the spec intended and consequently. The spec does not define types for the data passed into DFT functions, only that there is sufficient space. This creates problems with resolving overloads however.

Bug fixes

  • [N/A] Have you added relevant regression tests?
  • Have you included information on how to reproduce the issue (either in a
    GitHub issue or in this PR)?

@hjabird
Copy link
Contributor Author

hjabird commented May 31, 2024

From the issue thread @raphael-egan

Thanks, @hjabird! That SFINAE approach looks like a significant improvement in my opinion. I have one concern but I am a little rusty on the metaprogramming front so you may need to correct me below. Apologies for commenting here but I couldn't find a way to comment directly where relevant.
Consider a single-precision complex descriptor desc successfully committed, configured with COMPLEX_COMPLEX and NOT_INPLACE values set for configuration parameters COMPLEX_STORAGE and PLACEMENT, respectively. Let's assume that the user communicates I/O data as float *in, *out pointers to device-accessible USM allocations (note: not std::complex<float>*).

  1. My understanding of the specs and of the implementation would be that, in such a case, a call like compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); can't be expected to result in an out-of-place, transform (likely intended by the user), and your branch complies with that, unless I'm mistaken. Do we agree on that?
  2. Is the same to be expected for compute_{for,back}ward(desc, in, out); (no explicit specification of template parameters)? If not, why?
  3. Finally, could you please confirm that compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float, float>(desc, in, out); would be the out-of-place, transform likely intended by the user, with the changes from that branch despite the implementation of valid_oop_iotypes?

Copy link
Contributor

@Rbiessy Rbiessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like a good solution to me if we want to allow the user to not explicitly set the template arguments. It may lead to cryptic error messages if the API is mis-used though. That's fine with me.

I think we should remove the explicit template arguments in the examples and tests if we go forward with this solution.

include/oneapi/mkl/dft/detail/types_impl.hpp Show resolved Hide resolved
@hjabird
Copy link
Contributor Author

hjabird commented May 31, 2024

Consider a single-precision complex descriptor desc successfully committed, configured with COMPLEX_COMPLEX and NOT_INPLACE values set for configuration parameters COMPLEX_STORAGE and PLACEMENT, respectively. Let's assume that the user communicates I/O data as float *in, out pointers to device-accessible USM allocations (note: not std::complex).

Points:

My understanding of the specs and of the implementation would be that, in such a case, a call like compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); can't be expected to result in an out-of-place, transform (likely intended by the user), and your branch complies with that, unless I'm mistaken. Do we agree on that?

compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); where in and out are float* would result in an inplace dft where in would be taken as the argument inout_re and out as inout_im. So we agree that it wouldn't result in an output of place transform, and the branch complies with that.

Is the same to be expected for compute_{for,back}ward(desc, in, out); (no explicit specification of template parameters)? If not, why?

Yes. The logic I've applied here is that their are two possible cases:

  • You have a REAL DFT, in which cases you can't have REAL-REAL data. Consequently, only the out-of-place signature is enabled and it accepts float as both input and output.
  • You have a complex DFT. Both DFTs are enabled, but COMPLEX_COMPLEX data must be passed as a std::complex. Consequently, the two overloads can be differentiated by the types passed into them.

Finally, could you please confirm that compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float, float>(desc, in, out); would be the out-of-place, transform likely intended by the user, with the changes from that branch despite the implementation of valid_oop_iotypes?

No, you wouldn't get the out-of-place transform. You'd get an error instead at the moment. Thinking about it, we could enable this, since the REAL-REAL version that the types suggest would be selected first, as is the case at the moment. However, I don't think its unreasonable that std::complex data should be passed as a pointer to that type - the user could also reinterpret the pointer.

@hjabird hjabird marked this pull request as ready for review May 31, 2024 13:58
@raphael-egan
Copy link

Consider a single-precision complex descriptor desc successfully committed, configured with COMPLEX_COMPLEX and NOT_INPLACE values set for configuration parameters COMPLEX_STORAGE and PLACEMENT, respectively. Let's assume that the user communicates I/O data as float *in, out pointers to device-accessible USM allocations (note: not std::complex).

Points:

My understanding of the specs and of the implementation would be that, in such a case, a call like compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); can't be expected to result in an out-of-place, transform (likely intended by the user), and your branch complies with that, unless I'm mistaken. Do we agree on that?

compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, in, out); where in and out are float* would result in an inplace dft where in would be taken as the argument inout_re and out as inout_im. So we agree that it wouldn't result in an output of place transform, and the branch complies with that.

OK. Although rather unfortunate, the specs are clearly making that specialization very specific, so it is consistent. Thanks for confirming.

Is the same to be expected for compute_{for,back}ward(desc, in, out); (no explicit specification of template parameters)? If not, why?

Yes. The logic I've applied here is that their are two possible cases:

  • You have a REAL DFT, in which cases you can't have REAL-REAL data. Consequently, only the out-of-place signature is enabled and it accepts float as both input and output.
  • You have a complex DFT. Both DFTs are enabled, but COMPLEX_COMPLEX data must be passed as a std::complex. Consequently, the two overloads can be differentiated by the types passed into them.

I'm not sure I follow this very last point: why must the data be passed as an std::complex-typed container? As you point out in this PR's description: "The spec does not define types for the data passed into DFT functions, only that there is sufficient space". Unless I'm mistaken/missing something in the specs, I do not see what therein prevents users from passing say float-typed containers for an out-of-place c2c SP DFT with COMPLEX_COMPLEX storage, conceptually. Please correct me and/or point me to the part of the specs that enforces/requires std::complex-typed containers in such cases, if I'm wrong.

Finally, could you please confirm that compute_{for,back}ward<descriptor<precision::SINGLE, domain::COMPLEX>, float, float>(desc, in, out); would be the out-of-place, transform likely intended by the user, with the changes from that branch despite the implementation of valid_oop_iotypes?

No, you wouldn't get the out-of-place transform. You'd get an error instead at the moment. Thinking about it, we could enable this, since the REAL-REAL version that the types suggest would be selected first, as is the case at the moment. However, I don't think its unreasonable that std::complex data should be passed as a pointer to that type - the user could also reinterpret the pointer.

Unless I'm missing something (see above), I think that is/could be a problem. With explicit value-specialization of all 3 parameters and the number of arguments at play, there is no ambiguity regarding the user's intention in this case and the "standard" out-of-place c2c should be used. Am I wrong?

@hjabird
Copy link
Contributor Author

hjabird commented May 31, 2024

I'm not sure I follow this very last point: why must the data be passed as an std::complex-typed container? As you point out in this PR's description: "The spec does not define types for the data passed into DFT functions, only that there is sufficient space". Unless I'm mistaken/missing something in the specs, I do not see what therein prevents users from passing say float-typed containers for an out-of-place c2c SP DFT with COMPLEX_COMPLEX storage, conceptually. Please correct me and/or point me to the part of the specs that enforces/requires std::complex-typed containers in such cases, if I'm wrong.

There is no part of the spec that enforces / requires this as far as I can see. Data is implicitly assumed to be of a certain type (spec page, Implicitly-assumed elementary data type).

The problem is that this also follows for the inplace real-real signature: according to the spec, inout_re and inout_im can be of any pointer type and implicitly cast to scalar pointers. This brings us back to the original problem of the incorrect overload being selected.

Following the letter of the spec gives undersirable results (leading to the #499). Instead, I think we ought to follow I would expect as a user (real data is pointers to reals, complex data is pointers to complex), which was also hopefully what was intended by the specification.

Unless I'm missing something (see above), I think that is/could be a problem. With explicit value-specialization of all 3 parameters and the number of arguments at play, there is no ambiguity regarding the user's intention in this case and the "standard" out-of-place c2c should be used. Am I wrong?

You're right, if we want to follow the spec exactly. I think we can reenable being able to do this and still have overload resolution fixed (by default, float parameters will result in the inplace real overload being used). However, having restrictions on the pointer types given as arguments to some overloads but not others could lead to confusion. My preferred approach would be that the user can explicity select the overload by casting their inputs/output pointers to the type that the represent for the purposes of the DFT.

@raphael-egan
Copy link

I'm not sure I follow this very last point: why must the data be passed as an std::complex-typed container? As you point out in this PR's description: "The spec does not define types for the data passed into DFT functions, only that there is sufficient space". Unless I'm mistaken/missing something in the specs, I do not see what therein prevents users from passing say float-typed containers for an out-of-place c2c SP DFT with COMPLEX_COMPLEX storage, conceptually. Please correct me and/or point me to the part of the specs that enforces/requires std::complex-typed containers in such cases, if I'm wrong.

There is no part of the spec that enforces / requires this as far as I can see. Data is implicitly assumed to be of a certain type (spec page, Implicitly-assumed elementary data type).

Personally, I understand this part of the spec (and references thereto) as "that is how descriptors are to read and write their I/O data" not as "that is(are) the only data type(s) that descriptors may possibly accept", basically making a distinction between "implicitly-assumed" vs "expected" data types. For instance, in the first sentence of the dedicated section: "a descriptor object may re-interpret the base data type of that data container into an implicitly-assumed elementary data type [...]"

The problem is that this also follows for the inplace real-real signature: according to the spec, inout_re and inout_im can be of any pointer type and implicitly cast to scalar pointers. This brings us back to the original problem of the incorrect overload being selected.
Following the letter of the spec gives undersirable results (leading to the #499). Instead, I think we ought to follow I would expect as a user (real data is pointers to reals, complex data is pointers to complex), which was also hopefully what was intended by the specification.

I agree and let me clarify what I meant to point out. Considering a single-precision complex descriptor desc, and float* types for a, b, in my opinion and in my understanding of the spec, a usage like
compute_forward(desc, a, b);
or like
compute_forward<descriptor<precision::SINGLE, domain::COMPLEX>, float>(desc, a, b);
may legitimately (and rather unambiguously in the latter case) be interpreted as a call to the in-place operation with split-complex (aka "real-real") data storage. However, specifying all possible template arguments as in
compute_forward<descriptor<precision::SINGLE, domain::COMPLEX>, float, float>(desc, a, b);
unambiguously calls for an out-of-place operation with standard interleaved complex (aka "complex-complex") data storage in my opinion and it would seem strange to forbid it (it's not unheard of in my experience).

It does make sense to me to rule out support for some data types in case of possibly unspecified template arguments (i.e. ruling out the in-place "real-real" operation if std::complex<float>* arguments are used), but I think it's an overkill to rule out some support in case all possible template arguments are fully-specified.

Unless I'm missing something (see above), I think that is/could be a problem. With explicit value-specialization of all 3 parameters and the number of arguments at play, there is no ambiguity regarding the user's intention in this case and the "standard" out-of-place c2c should be used. Am I wrong?

You're right, if we want to follow the spec exactly. I think we can reenable being able to do this and still have overload resolution fixed (by default, float parameters will result in the inplace real overload being used). However, having restrictions on the pointer types given as arguments to some overloads but not others could lead to confusion. My preferred approach would be that the user can explicity select the overload by casting their inputs/output pointers to the type that the represent for the purposes of the DFT.

That's another option which I am not personally opposed to, although that might be fairly inconvenient/ugly for users of SYCL buffers in my opinion.
Either way, I think there is some ambiguity in the current form of the spec when it comes to possible implementations and I think we ought to clarify that by unifying and agreeing our understanding of it. For instance, we could clarify that no other type than float* (resp. double*) is to be supported the single-precision (resp. double-precision) in-place real-real c2c compute functions, with similar restrictions for versions with buffers I/O. Depending on the conclusion from this thread, we could add either a clarification that "all 3 template arguments must be specified by users for out-of-place complex-complex c2c compute functions if float*/double* are used" (my suggestion) or "no other type than std::complex<float>* (resp. std::complex<double>*) is to be supported the single-precision (resp. double-precision) out-of-place complex-complex c2c compute functions, with similar restrictions for versions with buffers I/O" (your suggestion). Do we agree on such a need to clarify the spec based on the final decision made regarding this?

@raphael-egan
Copy link

For instance, we could clarify that no other type than float* (resp. double*) is to be supported the single-precision (resp. double-precision) in-place real-real c2c compute functions, with similar restrictions for versions with buffers I/O. Depending on the conclusion from this thread, we could add either a clarification that "all 3 template arguments must be specified by users for out-of-place complex-complex c2c compute functions if float*/double* are used" (my suggestion) or "no other type than std::complex<float>* (resp. std::complex<double>*) is to be supported the single-precision (resp. double-precision) out-of-place complex-complex c2c compute functions, with similar restrictions for versions with buffers I/O" (your suggestion). Do we agree on such a need to clarify the spec based on the final decision made regarding this?

Actually, thinking more about it, either spec update suggestion would probably be problematic for closed-source oneMKL because it must be(come) spec-compliant and it does support float*/double* for out-of-place c2c operations with "complex-complex" storage (unambiguously right now, due to unimplemented support for "real-real" storage). If we think that clarifying the spec is required regarding this possible ambiguity, we must think more carefully about how to phrase it: explicitly ruling such possible usage out or requesting full specification of all template parameters in the spec would likely result in frustrating some closed-source oneMKL users, in the end. If this is considered to be an implementation detail that does not motivate a specification update by itself though, I'll let you decide what's best to do regarding the possible out-of-place "complex-complex" c2c cases.

I'm satisfied with the changes made to restrict usage of in-place "real-real" c2c operations, which were the core motivation for this PR.

@hjabird
Copy link
Contributor Author

hjabird commented Jun 6, 2024

I've changed things such that:

dft::descriptor<SINGLE, COMPLEX> desc(N);
float* a, b;
...
dft::compute_forward(desc, a, b); // Calls inplace REAL_REAL version.
dft::compute_forward<decltype(desc), float, float>(desc, a, b); // Calls OOP COMPLEX COMPLEX version.

Your feedback is very much appreciated Raphael! I'll be contributing an update to the spec at some point, but not immediately.

Copy link

@raphael-egan raphael-egan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, the changes look appropriate to me. Apologies for the quibbling.

@hjabird
Copy link
Contributor Author

hjabird commented Jun 10, 2024

Thank you, the changes look appropriate to me. Apologies for the quibbling.

I very much appreciate the feedback when making possibly dangerous changes like this!

@hjabird hjabird force-pushed the dft_overload_resolution_with_enableif branch from 914d742 to 516c7fa Compare July 2, 2024 14:01
@hjabird
Copy link
Contributor Author

hjabird commented Jul 2, 2024

Tests pass after rebasing:
rebased_ctest.txt

Spec PR is merged uxlfoundation/oneAPI-spec#544

@hjabird hjabird merged commit a2490e7 into uxlfoundation:develop Jul 2, 2024
6 checks passed
normallytangent pushed a commit to normallytangent/oneMKL that referenced this pull request Aug 6, 2024
…lfoundation#503)

* OOP COMPLEX and IP REAL_REAL overload resolution is problematic
  * Inplace real-real overload would be selected when out-of-place complex-complex DFT was intended.
  * With spec update, this PR uses SFINAE to give the expected behaviour for the user.
@hjabird hjabird mentioned this pull request Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants