Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Fix for detection of free function calls. #3003

Merged
merged 5 commits into from
Jan 13, 2021

Conversation

rdeodhar
Copy link
Contributor

@rdeodhar rdeodhar commented Jan 8, 2021

This change simplifies the code to detect calls to ""free functions", i.e., functions such as this_item().
The existing code does not work in all cases to detect the Kernel function, while the corrected code uses an already known Kernel function.

Signed-off-by: rdeodhar [email protected]

Copy link
Contributor

@premanandrao premanandrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a test that shows the cases that didn't work before?

@rdeodhar
Copy link
Contributor Author

rdeodhar commented Jan 8, 2021

There is an existing test llvm_test_suite_sycl/basic_free_function_queries_free_function_queries.

@elizabethandrews
Copy link
Contributor

elizabethandrews commented Jan 11, 2021

The existing code does not work in all cases to detect the Kernel function,

Can you please add a test to illustrate this. Every change is supposed to be accompanied by a test unless it is NFC

cperkinsintel
cperkinsintel previously approved these changes Jan 11, 2021
Copy link
Contributor

@cperkinsintel cperkinsintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@rdeodhar
Copy link
Contributor Author

A test has been added for the case:
parallel_for<class X>(range<1>{n}, [=](auto I) {}

@rdeodhar rdeodhar requested a review from premanandrao January 12, 2021 16:12
cgh.parallel_for<class FOX>(range<1>(1), [=](id<1> I) { this_id<1>(); });

// This kernel calls sycl::this_item
cgh.parallel_for<class BEE>(range<1>(1), [=](auto I) { this_item<1>(); });
Copy link
Contributor

@elizabethandrews elizabethandrews Jan 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. So getOperatorParens(KernelObj) doesn't return operator method when auto is used? Do you understand why? It is not obvious to me why. @premanandrao could you also please take a look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdeodhar, is this the case that was failing before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this particular construct in an existing runtime test was failing. I have replicated the situation in the existing clang lit test. Owing to some templating of members, the operator() of the KernelObj was not being found. However, the code looking for the operator() is unnecessary because the kernel function is already available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... That means KernelObj is defined differently when parameter type is auto? Do you know why this happens? I'm approving the patch since the answer to this question is probably outside the scope of this patch. However, if you know the answer I am curious :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, there is likely value to instead fix the poorly named getOperatorParens instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erichkeane would you know why its getting wrapped?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erichkeane well that's a fun case -- I think you can use decls_begin() and decls_end() to do this:

static const CXXMethodDecl *getOperatorParens(const CXXRecordDecl *Rec) {
  auto MethIter = llvm::find_if(Rec->methods(), [](const CXXMethodDecl *MD) {
    return MD->getOverloadedOperator() == OO_Call;
  });
  if (MethIter != Rec->methods_end())
   return *MethIter;

  using function_template_iterator = specific_decl_iterator<FunctionTemplateDecl>;
  using function_template_range = llvm::iterator_range<specific_decl_iterator<FunctionTemplateDecl>>;
  auto FTIter = llvm::find_if(function_template_range(function_template_iterator(Rec->decls_begin()),
                                                      function_template_iterator(Rec->decls_end())),
                                           [](const FunctionTemplateDecl *FTD) {
                                             if (const auto *MD = dyn_cast<CXXMethodDecl>(FTD->getTemplatedDecl()))
                                               return MD->getOverloadedOperator() == OO_Call;
                                             return false;
                                           });
  if (FTIter != function_template_iterator(Rec->decls_end()))
   return *FTIter;
  return nullptr;
}

(This is totally untested and was written in a web browser, so YMMV.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erichkeane would you know why its getting wrapped?

Because we ALWAYS wrap templates. We need something in the AST to contain BOTH the uninstantiated and instantiated template.

We should probably update the getOperatorParens, but, IMO, it should probably return an array of valid (non-dependent) operators. I'm guessing this whole idea of using getOperatorParens is flawed though. Trying to figure out the call operator of the type rather than checking the callgraph is likely the wrong way about it. @kbobrovs : Please take a look at how you're using getOperatorParens and make sure it would work in a case where there is multiple operator()s, including instantiated template versions.

IN the case of what @AaronBallman just posted (thanks btw!), I don't know if that works for 2 reasons: First, I think the return *FTIter; line is wrong, since that returns a FunctionTemplateDecl*, instead of a CXXMethodDecl*. Second, it has the same problem of only returning the FIRST operator () :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, that should be return cast<CXXMethodDecl>(*FTIter) instead, and it definitely only finds the first. As for how to handle multiple operator() definitions -- I agree that the function should either return a container of all the operators or take the container as a parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbobrovs : Please take a look at how you're using getOperatorParens and make sure it would work in a case where there is multiple operator()s, including instantiated template versions.

AFAIR, this simplistic approach was based on assumption that the API code is under our control, and we know that there are no other operator '()' calls from the parallel_for_kernel.

cgh.parallel_for<class FOX>(range<1>(1), [=](id<1> I) { this_id<1>(); });

// This kernel calls sycl::this_item
cgh.parallel_for<class BEE>(range<1>(1), [=](auto I) { this_item<1>(); });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... That means KernelObj is defined differently when parameter type is auto? Do you know why this happens? I'm approving the patch since the answer to this question is probably outside the scope of this patch. However, if you know the answer I am curious :)

@rdeodhar
Copy link
Contributor Author

The SYCL handler.hpp makes a special-case of an integral kernel parameter type, like so:

    using TransformedArgType =
        typename std::conditional<std::is_integral<LambdaArgType>::value &&
                                      Dims == 1,
                                  item<Dims>, LambdaArgType>::type;

Now I don't know why that leads to the behavior we see, but it was not worthwhile to pursue it for this bug fix, because the code that doesn't work as expected is not needed at all.

@elizabethandrews
Copy link
Contributor

The SYCL handler.hpp makes a special-case of an integral kernel parameter type, like so:

    using TransformedArgType =
        typename std::conditional<std::is_integral<LambdaArgType>::value &&
                                      Dims == 1,
                                  item<Dims>, LambdaArgType>::type;

Now I don't know why that leads to the behavior we see, but it was not worthwhile to pursue it for this bug fix, because the code that doesn't work as expected is not needed at all.

Ok. Thanks for explaining

@romanovvlad romanovvlad merged commit 6cfc3ad into intel:sycl Jan 13, 2021
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Jan 19, 2021
* sycl: (378 commits)
  [sycl-post-link][NFC] Extracted the code into a subroutine (intel#3042)
  [SYCL][NFC] Remove commented out code (intel#3029)
  [CODEOWNERS] Fix ownership of DPC++ tools tests (intel#3047)
  [SYCL][NFC] Make tests insensitive to dso_local (intel#3037)
  [SYCL] Fix acquiring a mutex in _pi_context::finalize (intel#3001)
  [SYCL] Fix various compilation warnings in plugins (intel#2979)
  [SYCL][ESIMD] Add simd class conversion ctor and operator (intel#3028)
  [sycl-post-link][NFC] Use range-based for loop. (intel#3033)
  [SYCL][NFC] Fix warning in self-build (intel#3023)
  [NFC] Fix sycl-post-link tests to avoid potential race (intel#3031)
  [SYCL][CUDA] Add missing barrier to collectives (intel#2990)
  [SYCL] Make Intel attributes consistent with clang attributes. (intel#3022)
  [SYCL] Bump SYCL minor version (intel#3026)
  [SYCL][Doc] Added requirement on reference to test PR in commit message (intel#3010)
  [SYCL] Put constant initializer list data in non-generic addr space. (intel#3005)
  [SYCL][L0] Fix memory leak in PiDeviceCache and ZeCommandList (intel#2974)
  [SYCL] Fix detection of free function calls (intel#3003)
  [SYCL][NFC] Clean up the builder_dir argument description (intel#3021)
  [SYCL][ESIMD] Fix LowerESIMD crash on a scalar fptoui LLVM instruction (intel#2699)
  [NFC] Remove redundant call to getMainExecutable() (intel#3018)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants