[SYCL][CUDA] Add missing barrier to collectives #2990

Merged
6 commits merged into intel:sycl from cuda-collective-barriers on Jan 14, 2021

Conversation

Pennycook
Contributor

SYCL sub-group and group functions should act as synchronization points.
Group collectives need a barrier at the end to ensure that back-to-back
collectives do not lead to a race condition.

Note that the barrier at the beginning of each collective occurs after
each work-item writes its partial results to the scratch space. This is
assumed safe because only the collective functions can access the space,
and collective functions must be encountered in uniform control flow; any
work-item encountering a collective function can assume it is safe to use
the scratch space, because all work-items in the same work-group must have
either executed no collective functions or the barrier at the end of a previous
collective function.

Signed-off-by: John Pennycook <[email protected]>
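
For illustration, the pattern being fixed looks roughly like the sketch below. This is not the actual DPC++/libclc implementation; the function name and fence calls are illustrative, and a power-of-two work-group size is assumed.

```cpp
#include <sycl/sycl.hpp>

// Sketch of a work-group reduction over shared scratch space; the point is
// the placement of the two barriers, not the reduction itself.
template <typename T, typename BinaryOp>
T group_reduce_sketch(sycl::nd_item<1> it, T x, BinaryOp op, T *scratch) {
  size_t lid = it.get_local_id(0);

  scratch[lid] = x; // each work-item writes its partial result first...
  it.barrier(sycl::access::fence_space::local_space); // ...then the leading barrier

  // Simple tree reduction over the scratch space (assumes power-of-two size).
  for (size_t offset = it.get_local_range(0) / 2; offset > 0; offset /= 2) {
    if (lid < offset)
      scratch[lid] = op(scratch[lid], scratch[lid + offset]);
    it.barrier(sycl::access::fence_space::local_space);
  }
  T result = scratch[0];

  // The trailing barrier this PR adds: without it, a back-to-back collective
  // could start overwriting scratch while other work-items still read it.
  it.barrier(sycl::access::fence_space::local_space);
  return result;
}
```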

@Pennycook Pennycook added bug Something isn't working cuda CUDA back-end labels Jan 4, 2021
@Pennycook Pennycook requested a review from bader as a code owner January 4, 2021 21:22
Contributor

@anton-v-gorshkov anton-v-gorshkov left a comment

Looks fine to me.

bader
bader previously approved these changes Jan 11, 2021
Contributor

@bader bader left a comment

Is there a regression test covering this code?

@Pennycook
Contributor Author

> Is there a regression test covering this code?

There isn't, and I've struggled to write one. Because it's a race condition, the problem wouldn't show up on every run of the test. I'm also having trouble reproducing the failure at all with a simple test-case (with two back-to-back reductions) on the NVIDIA GPU that I use for testing, suggesting it might be quite hard to trigger in practice.

@bader
Contributor

bader commented Jan 11, 2021

> Is there a regression test covering this code?
>
> There isn't, and I've struggled to write one. Because it's a race condition, the problem wouldn't show up on every run of the test. I'm also having trouble reproducing the failure at all with a simple test-case (with two back-to-back reductions) on the NVIDIA GPU that I use for testing, suggesting it might be quite hard to trigger in practice.

I realize the reproducer might not be 100% reliable, but a simple test case sounds better than nothing. I see value in adding such a test case: since we run regression tests multiple times, there is a chance it will catch issues in the collectives. In addition, it will catch issues that are not related to data races.
What do you think about it?

Calls reduce, exclusive scan and inclusive scan multiple times back-to-back.
Note that since we are testing for a race condition, it is possible for this
test to pass even with an incorrect implementation.

Signed-off-by: John Pennycook <[email protected]>
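
For reference, a minimal back-to-back-collectives test along these lines could look like the following. This is a sketch using SYCL 2020 spellings, not the exact test added in 14438de; the sizes and checks are illustrative.

```cpp
#include <sycl/sycl.hpp>

#include <cassert>
#include <numeric>
#include <vector>

int main() {
  constexpr size_t N = 128; // single work-group
  sycl::queue q;
  std::vector<int> in(N), out(N, 0);
  std::iota(in.begin(), in.end(), 0);
  {
    sycl::buffer<int> in_buf(in.data(), sycl::range<1>{N});
    sycl::buffer<int> out_buf(out.data(), sycl::range<1>{N});
    q.submit([&](sycl::handler &h) {
      sycl::accessor din(in_buf, h, sycl::read_only);
      sycl::accessor dout(out_buf, h, sycl::write_only);
      h.parallel_for(sycl::nd_range<1>{sycl::range<1>{N}, sycl::range<1>{N}},
                     [=](sycl::nd_item<1> it) {
                       auto g = it.get_group();
                       size_t i = it.get_global_id(0);
                       int x = din[i];
                       // Back-to-back collectives: a missing trailing barrier
                       // lets a later call clobber the scratch space while an
                       // earlier result is still being read.
                       int sum1 = sycl::reduce_over_group(g, x, sycl::plus<>());
                       int sum2 = sycl::reduce_over_group(g, x, sycl::plus<>());
                       int scan =
                           sycl::inclusive_scan_over_group(g, x, sycl::plus<>());
                       dout[i] = (sum1 == sum2) ? scan : -1;
                     });
    });
  }
  // With correct barriers, each element is the inclusive prefix sum 0 + ... + i.
  for (size_t i = 0; i < N; ++i)
    assert(out[i] == static_cast<int>(i * (i + 1) / 2));
  return 0;
}
```

As noted above, because the bug is a race, such a test can still pass with an incorrect implementation; repeated runs only increase the chance of catching it.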
@Pennycook Pennycook dismissed stale reviews from bader and anton-v-gorshkov via 14438de January 11, 2021 19:02
@Pennycook Pennycook requested a review from a team as a code owner January 11, 2021 19:02
@Pennycook
Contributor Author

> What do you think about it?

Makes sense. I've added a simple regression test in 14438de.

bader
bader previously approved these changes Jan 11, 2021
Contributor

@bader bader left a comment

LGTM except one minor comment.
Thanks!

// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
// RUN: %HOST_RUN_PLACEHOLDER %t.out
// RUN: %CPU_RUN_PLACEHOLDER %t.out
// RUN: %GPU_RUN_PLACEHOLDER %t.out
Contributor

This should be located inside the sycl/test/on-device directory, according to the guidelines in the Get Started Guide.

Contributor Author

Oops. I'll go and read the new guidelines.

Contributor

@romanovvlad romanovvlad Jan 12, 2021

It seems the test should even be moved to https://github.com/intel/llvm-test-suite.

bader
bader previously approved these changes Jan 11, 2021
alexbatashev
alexbatashev previously approved these changes Jan 12, 2021
@bader bader dismissed stale reviews from alexbatashev and themself via 73a8788 January 13, 2021 09:13
Device query may return a value too large for a specific kernel;
kernel query is required in order to respect local memory usage.

Signed-off-by: John Pennycook <[email protected]>
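
In SYCL 2020 terms, the difference between the two queries is roughly the following. This is only a sketch: ScratchKernel is a placeholder kernel name, and the in-tree test may use older API spellings for the same queries.

```cpp
#include <sycl/sycl.hpp>

#include <algorithm>
#include <iostream>

class ScratchKernel; // placeholder kernel name, for illustration only

int main() {
  sycl::queue q;
  sycl::device dev = q.get_device();

  // Device-wide limit: ignores per-kernel resources such as local memory and
  // registers, so it can exceed what a particular kernel actually supports.
  size_t dev_max = dev.get_info<sycl::info::device::max_work_group_size>();

  // Kernel-specific limit: accounts for this kernel's own resource usage.
  auto kid = sycl::get_kernel_id<ScratchKernel>();
  auto bundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(
      q.get_context(), {kid});
  sycl::kernel k = bundle.get_kernel(kid);
  size_t krn_max =
      k.get_info<sycl::info::kernel_device_specific::work_group_size>(dev);

  // Launch with the smaller of the two limits.
  size_t wg = std::min(dev_max, krn_max);
  std::cout << "work-group size: " << wg << "\n";

  q.parallel_for<ScratchKernel>(
      sycl::nd_range<1>{sycl::range<1>{wg}, sycl::range<1>{wg}},
      [=](sycl::nd_item<1>) { /* group collectives would go here */ });
  q.wait();
  return 0;
}
```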
@romanovvlad romanovvlad merged commit 2b6f2cd into intel:sycl Jan 14, 2021
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Jan 19, 2021
* sycl: (378 commits)
  [sycl-post-link][NFC] Extracted the code into a subroutine (intel#3042)
  [SYCL][NFC] Remove commented out code (intel#3029)
  [CODEOWNERS] Fix ownership of DPC++ tools tests (intel#3047)
  [SYCL][NFC] Make tests insensitive to dso_local (intel#3037)
  [SYCL] Fix acquiring a mutex in _pi_context::finalize (intel#3001)
  [SYCL] Fix various compilation warnings in plugins (intel#2979)
  [SYCL][ESIMD] Add simd class conversion ctor and operator (intel#3028)
  [sycl-post-link][NFC] Use range-based for loop. (intel#3033)
  [SYCL][NFC] Fix warning in self-build (intel#3023)
  [NFC] Fix sycl-post-link tests to avoid potential race (intel#3031)
  [SYCL][CUDA] Add missing barrier to collectives (intel#2990)
  [SYCL] Make Intel attributes consistent with clang attributes. (intel#3022)
  [SYCL] Bump SYCL minor version (intel#3026)
  [SYCL][Doc] Added requirement on reference to test PR in commit message (intel#3010)
  [SYCL] Put constant initializer list data in non-generic addr space. (intel#3005)
  [SYCL][L0] Fix memory leak in PiDeviceCache and ZeCommandList (intel#2974)
  [SYCL] Fix detection of free function calls (intel#3003)
  [SYCL][NFC] Clean up the builder_dir argument description (intel#3021)
  [SYCL][ESIMD] Fix LowerESIMD crash on a scalar fptoui LLVM instruction (intel#2699)
  [NFC] Remove redundant call to getMainExecutable() (intel#3018)
  ...
@Pennycook Pennycook deleted the cuda-collective-barriers branch January 28, 2021 18:23