Add View of Views debugging tool #267
base: develop
I see some short-term use for a specific View-of-Views debugging tool but I think that extending the scope and detecting any kind of fences in parallel constructs or nested (non-Team) parallel constructs would be a good idea.
LGTM! Please fix the typo.
CI is failing with
Like a "sanitizer" tool with a view of views check?
I was thinking about not just covering kernels that have
we would print
That would produce lower quality error messages
You could still special case for those particular kernels but my point is that we can diagnose more problems fairly easily.
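To make the broader "sanitizer" idea concrete, here is a minimal sketch of how a tool could flag fences and nested (non-Team) parallel regions. It assumes the usual Kokkos Tools callback names and signatures (`kokkosp_begin_parallel_for`, `kokkosp_end_parallel_for`, `kokkosp_begin_fence`); the `sketch` namespace, the depth counter, and the message wording are hypothetical and not taken from this PR. It records diagnostics instead of aborting, and it would likely need a filter for fences that Kokkos issues internally.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical state for the sketch; not the tool's actual implementation.
namespace sketch {
int parallel_depth = 0;                // > 0 while a parallel region is open
std::vector<std::string> diagnostics;  // collected instead of aborting
}  // namespace sketch

extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const uint32_t /*devID*/,
                                           uint64_t* kID) {
  // A region that starts while another one is open is a nested dispatch.
  if (sketch::parallel_depth > 0) {
    sketch::diagnostics.push_back(std::string("nested parallel region \"") +
                                  name + "\"");
  }
  *kID = static_cast<uint64_t>(sketch::parallel_depth);
  ++sketch::parallel_depth;
}

extern "C" void kokkosp_end_parallel_for(const uint64_t /*kID*/) {
  --sketch::parallel_depth;
}

extern "C" void kokkosp_begin_fence(const char* name, const uint32_t /*devID*/,
                                    uint64_t* handle) {
  // A fence issued while a region is open is what we want to report.
  if (sketch::parallel_depth > 0) {
    sketch::diagnostics.push_back(std::string("fence \"") + name +
                                  "\" within parallel region");
  }
  *handle = 0;
}
```

Team-level nested constructs such as `TeamThreadRange` loops do not go through these callbacks, so they would not be reported by this approach.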
I don't mind having a tool that does not do too many things at the same time, so I do not think we necessarily need to expand the functionality. However, I do believe we need to fix the issue with the tool getting hung up on internal allocations such as team scratch memory.
E.g.:
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::parallel_for(
        Kokkos::TeamPolicy<>(1, Kokkos::AUTO).set_scratch_size(0, Kokkos::PerTeam(1000)),
        KOKKOS_LAMBDA(const typename Kokkos::TeamPolicy<>::member_type& team) {
        });
  }
  Kokkos::finalize();
}
will abort in the tool with stuff like this:
allocating "Kokkos::Serial::scratch_mem" within parallel region "Z4mainE3$_0"
Abort trap: 6
or:
deallocating "Kokkos::thread_scratch" within parallel region "Z4mainE3$_0"
Abort trap: 6
Another question is support for the monolithic tools library. This tool does not support that right now, but I think it probably should.
This will not work with the Threads backend, and I didn't check the other ones. I would just filter all labels that start with "Kokkos::".
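A minimal sketch of that filter, assuming every internal allocation label begins with `Kokkos::` (as `Kokkos::Serial::scratch_mem` and `Kokkos::thread_scratch` do in the output above). The helper name `is_kokkos_internal_label` is hypothetical; the tool's allocate and deallocate callbacks would return early when it reports true.

```cpp
#include <string>

// Hypothetical helper: true when the label looks like one Kokkos uses for
// its own allocations, i.e. it starts with the "Kokkos::" prefix.
bool is_kokkos_internal_label(const std::string& label) {
  const std::string prefix = "Kokkos::";
  // rfind with position 0 only matches at the very start of the string.
  return label.rfind(prefix, 0) == 0;
}
```

A prefix check avoids maintaining a per-backend list of scratch labels, at the cost of also skipping user views that someone chose to label with a `Kokkos::` prefix.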
}));
}

// TODO initialize in main and split unit tests
I've tried to do this, but it seems not so easy. I'll open an issue for it, so we can move the discussion there.
"AllocatesInParallel]For",
Kokkos::RangePolicy<Kokkos::DefaultHostExecutionSpace>(0, 1),
KOKKOS_LAMBDA(int) {
  V b("b", 5);
In a Cuda build, the compiler complains about this line:
warning #20011-D: calling a __host__ function ... is not allowed.
The reason seems to be that we end up in a constructor that isn't marked with Kokkos markup. One option would be to deactivate this part of the test in builds with a device backend. I'm not sure whether there's a better solution to this problem.
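One way the "deactivate in device builds" option could look is a preprocessor guard. This is only a sketch: it assumes the `KOKKOS_ENABLE_CUDA`, `KOKKOS_ENABLE_HIP`, and `KOKKOS_ENABLE_SYCL` configuration macros are visible after including the Kokkos headers, and the `SKETCH_HAS_DEVICE_BACKEND` macro and `has_device_backend` function are hypothetical names introduced here.

```cpp
// Hypothetical guard macro: 1 when any device backend is enabled, else 0.
#if defined(KOKKOS_ENABLE_CUDA) || defined(KOKKOS_ENABLE_HIP) || \
    defined(KOKKOS_ENABLE_SYCL)
#define SKETCH_HAS_DEVICE_BACKEND 1
#else
#define SKETCH_HAS_DEVICE_BACKEND 0
#endif

// Same information as a constant expression, usable in ordinary code.
constexpr bool has_device_backend() { return SKETCH_HAS_DEVICE_BACKEND != 0; }

// In the test file, the part that constructs a View inside the lambda
// would then be wrapped like this:
//
//   #if !SKETCH_HAS_DEVICE_BACKEND
//     ASSERT_DEATH(/* ... allocate in parallel_for ... */, "...");
//   #endif
```

The guard has to be a preprocessor condition rather than a runtime check, because the warning is emitted at compile time.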
ASSERT_DEATH(({
               using V = Kokkos::View<int *>;
               Kokkos::View<V **, Kokkos::HostSpace> vov("vo]v", 2, 3);
               // ^ included a closing square bracket in the label to try
               // to trip the substring extraction
               V a("a", 4);
               V b("b", 5);
               vov(0, 0) = a;
               vov(0, 1) = a;
               vov(1, 0) = b;
             }),
             "view of views \"vo]v\" not properly cleared");
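For context on why a `]` in the label is a useful stress case, here is a sketch of a label extraction that tolerates it. It assumes the kernel name has the form `Kokkos::View::initialization [<label>]`; that format and the `extract_label` helper are assumptions for illustration, not code taken from this PR. Searching for the last `]` instead of the first keeps labels such as `vo]v` intact.

```cpp
#include <string>

// Hypothetical helper: returns the label embedded in a kernel name of the
// assumed form "Kokkos::View::initialization [<label>]", or an empty
// string when the name does not match.
std::string extract_label(const std::string& name) {
  const std::string prefix = "Kokkos::View::initialization [";
  if (name.compare(0, prefix.size(), prefix) != 0) return "";
  // Use the LAST closing bracket so a ']' inside the label is preserved.
  const std::string::size_type last = name.rfind(']');
  if (last == std::string::npos || last < prefix.size()) return "";
  return name.substr(prefix.size(), last - prefix.size());
}
```

Using `find(']')` here instead would truncate the label from the test above to `vo`.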
Just noting that in my OpenMP build, I see this death test hang when I run with more than 1 omp thread. (Christian had noted that team tests sometimes hang, but I'm not sure if it's related.)
Thanks for providing this tool, @dalg24. I'm still stuck, though, and some additional input would be great.
The function in question is (I already removed almost everything in there trying to isolate the issue but the code below is exactly as executed):
TaskStatus CalculateFluxes(std::shared_ptr<MeshBlockData<Real>> &rc) {
  const int scratch_level = 1;
  const int nvar = 3;
  size_t scratch_size_in_bytes = parthenon::ScratchPad2D<Real>::shmem_size(nvar, 10);
  using Kokkos::parallel_for;
  using Kokkos::TeamPolicy;
  typedef TeamPolicy<parthenon::DevExecSpace>::member_type member_type;
  TeamPolicy<parthenon::DevExecSpace> policy(10, Kokkos::AUTO());
  parallel_for(
      policy, KOKKOS_LAMBDA(member_type team_member) { team_member.team_barrier(); });
  auto policy2 =
      policy.set_scratch_size(scratch_level, Kokkos::PerTeam(scratch_size_in_bytes));
  parallel_for(
      policy2, KOKKOS_LAMBDA(member_type team_member) { team_member.team_barrier(); });
  return TaskStatus::complete;
}
Note that when I change the
Any hint on how to further debug or where to look would be great.
On my cell now but I am almost certain that this is a false positive. I expect that this comes from the scratch memory reallocation and that it needs to be filtered out.
That'd be good news. Tracking/fixing the labeled view was far easier.
The error shows up when using the CUDA backend (haven't tried HIP yet).
@pgrete I can't reproduce with
#include <Kokkos_Core.hpp>

int main() {
  Kokkos::ScopeGuard scope_guard;
  const int scratch_level = 1;
  const int nvar = 3;
  size_t scratch_size_in_bytes = 10;
  using Kokkos::parallel_for;
  using Kokkos::TeamPolicy;
  typedef TeamPolicy<>::member_type member_type;
  TeamPolicy<> policy(10, Kokkos::AUTO());
  parallel_for(
      policy, KOKKOS_LAMBDA(member_type team_member) { team_member.team_barrier(); });
  auto policy2 =
      policy.set_scratch_size(scratch_level, Kokkos::PerTeam(scratch_size_in_bytes));
  parallel_for(
      policy2, KOKKOS_LAMBDA(member_type team_member) { team_member.team_barrier(); });
}
Can you confirm that that also works fine for you? If so, would you be able to provide a stand-alone reproducer?