Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Add parity waiting #111

Merged
merged 5 commits into from
Jul 29, 2021
Merged

Add parity waiting #111

merged 5 commits into from
Jul 29, 2021

Conversation

ogiroux
Copy link
Contributor

@ogiroux ogiroux commented Jan 30, 2021

Overview

Add APIs to cuda::barrier that allow for waiting on the parity of the barrier. This is an extension to std::barrier that relies on the internal binary phase of the barrier.

Motivation

  • PTX for SM_80 provides parity waiting for barriers rather than using an arrival token that is held in every thread.
    • This is a significant optimization if users can instead derive parity from loop induction variables or other state.
  • https://nvbugs/3248969

Requirements

  • Add the member function barrier::try_wait_parity
  • Add the member function barrier::wait_parity
  • Add test coverage for barrier::wait_parity

Design

  • Add both front-end wait_parity APIs similarly to wait and try_wait.
    • wait_parity is implemented by polling try_wait_parity with back-off.
    • try_wait_parity queries the parity of the barrier with a parity provided by the user.

Test Plan

  • All currently existing tests should continue to function as expected.
  • Add a test for wait_parity which also provides coverage for try_wait_parity.

Performance Tests

  • There is currently no performance criteria for this feature.

Documentation

Complex Internal Systems

  • The internal representation of std::barrier is able to queried to obtain the parity of the barrier.

Stakeholders

  • Olivier Giroux - Implementer
  • Wesley Maxey - Maintainer

@ogiroux ogiroux requested a review from griwes January 30, 2021 01:13
@brycelelbach brycelelbach added this to the 2.0.0 milestone Feb 11, 2021
@ogiroux
Copy link
Contributor Author

ogiroux commented Mar 4, 2021

I'm going to add a bool try_wait(token) facility to this PR before it is merged in 2.0.0. Basically this will just go from having it as an internal name, to having it as an external name (remove the __).

Discussion from internal developer day pointed out that you can get a good benefit from hiding the latency of the load and/or control the polling backoff yourself. It's already in here, might as well make it usable.

@brycelelbach brycelelbach added enhancement New feature or request. P2: nice to have Desired for release, but not necessary. labels Mar 29, 2021
@brycelelbach brycelelbach modified the milestones: 2.0.0, 2.1.0 Mar 29, 2021
@brycelelbach brycelelbach changed the title Added parity waiting Add parity waiting May 18, 2021
@brycelelbach brycelelbach modified the milestones: 2.1.0, 1.6.0 May 18, 2021
@wmaxey
Copy link
Member

wmaxey commented Jul 23, 2021

@ogiroux Does this require any additional tests you would like to see? I was going to permute a few things, but this would be a good opportunity for any edge cases you can think of.

@wmaxey
Copy link
Member

wmaxey commented Jul 28, 2021

@wmaxey wmaxey added the testing: internal ci passed Passed internal NVIDIA CI (DVS). label Jul 28, 2021
Copy link
Collaborator

@griwes griwes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the phrasing in the documentation.

| [`cuda::device::barrier_native_handle`] | Get the native handle to a `cuda::barrier`. `(function template)` |
| [`cuda::barrier::init`] | Initialize a `cuda::barrier`. `(friend function)` |
| [`cuda::device::barrier_native_handle`] | Get the native handle to a `cuda::barrier`. `(function template)` |
| [`cuda::barrier::wait_parity/try_wait_parity`] | Wait on a `specific` phase of the barrier |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this phrasing. @ogiroux how would you feel about "Wait on a phase whose parity matches the argument." or some such?

Comment on lines 15 to 16
`barrier::wait_parity` stalls execution while the barrier is not at the specified phase.
`barrier::try_wait_parity` queries the the state of the barrier against the specified phase.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the previous comment. It's really weird that the description here doesn't mention parity at all.

@wmaxey
Copy link
Member

wmaxey commented Jul 29, 2021

In the interest of time, I'm going to split the documentation changes into another PR and merge the changes.

@wmaxey wmaxey merged commit 48d213a into main Jul 29, 2021
@wmaxey wmaxey deleted the barrier-parity branch July 29, 2021 18:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request. P2: nice to have Desired for release, but not necessary. testing: internal ci passed Passed internal NVIDIA CI (DVS).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants