Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add View of Views debugging tool #267

Draft
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

dalg24
Copy link
Member

@dalg24 dalg24 commented Aug 6, 2024

No description provided.

Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see some short-term use for a specific View-of-Views debugging tool but I think that extending the scope and detecting any kind of fences in parallel constructs or nested (non-Team) parallel constructs would be a good idea.

tests/vov-bug-finder/test_view_of_views_bug_finder.cpp Outdated Show resolved Hide resolved
Copy link

@diehlpk diehlpk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Please fix the typo.

@masterleinad
Copy link
Contributor

CI is failing with

/__w/kokkos-tools/kokkos-tools/tests/vov-bug-finder/test_view_of_views_bug_finder.cpp(67): error: The enclosing parent function ("TestBody") for an extended __host__ __device__ lambda cannot have private or protected access within its class
     __attribute__((host)) __attribute__((device))(int) { new (&vov(0, 0)) V(a); new (&vov(0, 1)) V(a); new (&vov(1, 0)) V(b); }); })
                                                  ^

/__w/kokkos-tools/kokkos-tools/tests/vov-bug-finder/test_view_of_views_bug_finder.cpp(82): error: The enclosing parent function ("TestBody") for an extended __host__ __device__ lambda cannot have private or protected access within its class
     __attribute__((host)) __attribute__((device))(int) { V b("b", 5); new (&vov(1, 0)) V(b); }); })
                                                  ^

@dalg24
Copy link
Member Author

dalg24 commented Aug 6, 2024

I see some short-term use for a specific View-of-Views debugging tool but I think that extending the scope and detecting any kind of fences in parallel constructs or nested (non-Team) parallel constructs would be a good idea.

Like a "sanitizer" tool with a view of views check?

@masterleinad
Copy link
Contributor

Like a "sanitizer" tool with a view of views check?

I was thinking about not just covering kernels that have Kokkos::View::initialization in their label but always diagnose if the stack isn't empty in begin_parallel_*. I don't think the error message would be less helpful for the special case of Views of VIews. Instead of

constructing view "my_view" within a parallel region "my_region"

we would print

calling parallel_for "Kokkos::View::initialization [my_view]" within a parallel region "my_region"

@dalg24
Copy link
Member Author

dalg24 commented Aug 6, 2024

That would produce lower quality error messages

@masterleinad
Copy link
Contributor

That would produce lower quality error messages

You could still special case for those particular kernels but my point is that we can diagnose more problems fairly easily.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind have a tool which does not too many things at the same time, so I do not thing we need to expand on the functionality necessarily, however I do believe we need to fix the issue with the tool getting hung up on internal allocations such as Team Scratch stuff.

E.g.:

#include<Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::parallel_for(Kokkos::TeamPolicy<>(1,Kokkos::AUTO).set_scratch_size(0, Kokkos::PerTeam(1000)),
      KOKKOS_LAMBDA(const typename Kokkos::TeamPolicy<>::member_type& team) {
    });
  }
  Kokkos::finalize();
}

will abort in the tool with stuff like this:

allocating "Kokkos::Serial::scratch_mem" within parallel region "Z4mainE3$_0"
Abort trap: 6

or:

deallocating "Kokkos::thread_scratch" within parallel region "Z4mainE3$_0"
Abort trap: 6

Another question is support for the monolithic tools library. This tool does not do that right now, but we I think it probably should.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work with Threads backend and I didn't check the other ones. I would just filter all "Kokkos::"

debugging/vov-bug-finder/kp_view_of_views_bug_finder.cpp Outdated Show resolved Hide resolved
}));
}

// TODO initialize in main and split unit tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to do this, but it seems not so easy. I'll open an issue for it, so we can move the discussion there.

"AllocatesInParallel]For",
Kokkos::RangePolicy<Kokkos::DefaultHostExecutionSpace>(0, 1),
KOKKOS_LAMBDA(int) {
V b("b", 5);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a Cuda build, the compiler is complaining about this line that:

warning #20011-D: calling a __host__  function ... is not allowed.

The reason seems to be that we end up in a constructor that isn't marked with Kokkos markup. One option would be to desactivate this part of the test in builds with a device backend. I'm not sure whether there's a better solution to this problem.

Comment on lines +107 to +118
ASSERT_DEATH(({
using V = Kokkos::View<int *>;
Kokkos::View<V **, Kokkos::HostSpace> vov("vo]v", 2, 3);
// ^ included a closing square bracket in the label to try
// to trip the substring extraction
V a("a", 4);
V b("b", 5);
vov(0, 0) = a;
vov(0, 1) = a;
vov(1, 0) = b;
}),
"view of views \"vo]v\" not properly cleared");
Copy link
Contributor

@maartenarnst maartenarnst Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that in my OpenMP build, I see this death test hang when I run with more than 1 omp thread. (Christian had noted that team tests sometimes hang, but I'm not sure if it's related.)

@pgrete
Copy link

pgrete commented Nov 7, 2024

Thanks for providing this tool @dalg24
It has been quite helpful in tracking down some misuse of View of View on our side.

But I'm still stuck and some additional input would be great.
Using the tool I get a

deallocating "[unlabeled]" within parallel region "18__nv_hdl_wrapper_tILb0ELb0ELb0E11__nv_dl_tagIPFN9parthenon10TaskStatusERSt10shared_ptrINS1_13MeshBlockDataIdEEEEXadL_ZN24sparse_advection_package15CalculateFluxesES7_EELj2EEFvN6Kokkos4Impl14CudaTeamMemberEEJEE"

The function in question is (I already removed almost everything in there trying to isolate the issue but the code below is exactly as executed):

TaskStatus CalculateFluxes(std::shared_ptr<MeshBlockData<Real>> &rc) {
  const int scratch_level = 1;
  const int nvar = 3;
  size_t scratch_size_in_bytes = parthenon::ScratchPad2D<Real>::shmem_size(nvar, 10);

  using Kokkos::parallel_for;
  using Kokkos::TeamPolicy;

  typedef TeamPolicy<parthenon::DevExecSpace>::member_type member_type;
  TeamPolicy<parthenon::DevExecSpace> policy(10, Kokkos::AUTO());
  parallel_for(
      policy, KOKKOS_LAMBDA(member_type team_member) { team_member.team_barrier(); });

  auto policy2 =
      policy.set_scratch_size(scratch_level, Kokkos::PerTeam(scratch_size_in_bytes));
  parallel_for(
      policy2, KOKKOS_LAMBDA(member_type team_member) { team_member.team_barrier(); });
  return TaskStatus::complete;
}

Note that when I change the policy2 to auto policy2 = policy then I don't get the deallaction error.
What's going on here?
Where does the error come from? I somehow assume that the [unlabelded] View is from somewhere else in the code but I don't understand why it's triggered/deallocated there (and only if I set the scratch size).
I also tried to create a MWE using the code above in a separate binary but I'm not able to reproduce (which is also what leads me to believe that the [unlabelded] View comes from somewhere else.

Any hint on how to further debug or where to look would be great.

@dalg24
Copy link
Member Author

dalg24 commented Nov 7, 2024

On my cell now but I am almost certain that this is a false positive. I expect that this comes from the scratch memory reallocation and that it needs to be filtered out.
What backend were you using?

@pgrete
Copy link

pgrete commented Nov 7, 2024

On my cell now but I am almost certain that this is a false positive. I expect that this comes from the scratch memory reallocation and that it needs to be filtered out.

That'd be good news. Tracking/fixing the labeled view was far easier.

What backend were you using?

The error shows up when using the CUDA backend (haven't tried HIP yet).

@masterleinad
Copy link
Contributor

@pgrete I can't reproduce with

#include <Kokkos_Core.hpp>

int main() {
        Kokkos::ScopeGuard scope_guard;
  const int scratch_level = 1;
  const int nvar = 3;
  size_t scratch_size_in_bytes = 10;

  using Kokkos::parallel_for;
  using Kokkos::TeamPolicy;

  typedef TeamPolicy<>::member_type member_type;
  TeamPolicy<> policy(10, Kokkos::AUTO());
  parallel_for(
      policy, KOKKOS_LAMBDA(member_type team_member) { team_member.team_barrier(); });

  auto policy2 =
      policy.set_scratch_size(scratch_level, Kokkos::PerTeam(scratch_size_in_bytes));
  parallel_for(
      policy2, KOKKOS_LAMBDA(member_type team_member) { team_member.team_barrier(); });
}

Can you confirm that that also works fine for you? If so, would you be able to provide a stand-alone reproducer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants