Experimental Explicit Stream Annotation #17982

Contributor

@chaserileyroberts chaserileyroberts commented Oct 7, 2024

This is the first PR that is intended to support explicit stream annotations for GPU runtimes.

Why do we want/need this?

A few optimizations are possible with existing hardware, but they are difficult to generate from XLA. By allowing explicit annotation of which stream a subcomputation should run on, we let users define their own stream assignment strategies.

  1. Communication-Communication overlap

Certain parallelization configurations would allow all-gather and all-reduce operations to overlap on independent networking hardware (e.g., one operation running on NVLink and the other exclusively using IB). Right now there is no way to do this explicitly in JAX.

  2. Compute-Compute overlap

Certain kernels do not utilize all of the SMs for the full duration of their computation, leaving some idling. Other independent compute kernels could utilize these SMs for better e2e performance. We do this already for some of our collective matmul implementations, but there is currently no way to do this explicitly in JAX.
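To make the motivation concrete, here is a toy model of why placing independent work on separate streams can shorten end-to-end time. It is only an idealized sketch, not XLA code: the `makespan` helper, the kernel names, and the durations are all made up for illustration, and it assumes the kernels are independent and that the hardware really can run both streams concurrently.

```python
def makespan(kernels, num_streams):
    """Idealized total time for a list of (name, duration, stream_id) kernels.

    Kernels assigned to the same stream run back to back; kernels on
    different streams are assumed to overlap perfectly (a simplification).
    """
    finish = [0.0] * num_streams
    for _name, duration, stream_id in kernels:
        # Fold annotations onto the streams that actually exist.
        finish[stream_id % num_streams] += duration
    return max(finish)


# Hypothetical kernels: each is annotated with the stream it should run on.
kernels = [("matmul_a", 3.0, 0), ("matmul_b", 2.0, 1)]

print(makespan(kernels, num_streams=1))  # 5.0: everything serialized on one stream
print(makespan(kernels, num_streams=2))  # 3.0: the two kernels overlap
```

The same reasoning applies to the communication case, with the two "kernels" being collectives that use different networking hardware.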

What are the code changes?

  • Added a new experimental flag, xla_gpu_experimental_stream_annotation.
  • Added an exception to CallInliner so that stream-annotated calls are not inlined.
  • Set operation_queue_id and added wrapped async operations.
  • At runtime, the execution stream assignment uses the stream annotation and queues kernels accordingly.
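A minimal sketch of how the new flag might be turned on from Python. This assumes the flag is a boolean debug option that can be passed through the `XLA_FLAGS` environment variable like other `xla_gpu_*` options, and that the installed XLA build includes it; a build without the flag may reject it at startup. The `with_xla_flag` helper is hypothetical and exists only for this example.

```python
import os


def with_xla_flag(existing: str, flag: str) -> str:
    """Return an XLA_FLAGS string containing `flag`, without duplicating it."""
    parts = existing.split()
    if flag not in parts:
        parts.append(flag)
    return " ".join(parts)


# Assumed spelling: the flag name comes from this PR, the "=true" value is a guess.
FLAG = "--xla_gpu_experimental_stream_annotation=true"

# Set the variable before importing jax so the backend sees it at initialization.
os.environ["XLA_FLAGS"] = with_xla_flag(os.environ.get("XLA_FLAGS", ""), FLAG)
```

Appending to the existing value rather than overwriting it keeps any other XLA flags the user has already set.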

Current known limitations

  • Non-inlined calls aren't well supported by all passes. For example, a single Add will not correctly lower to a fusion.

@chaserileyroberts chaserileyroberts force-pushed the chase/stream_annotation branch 3 times, most recently from fa650cd to 018ed51 Compare October 15, 2024 01:08
@chaserileyroberts chaserileyroberts marked this pull request as ready for review October 15, 2024 01:14
@chaserileyroberts chaserileyroberts changed the title [DRAFT] Stream Annotation Prototype Experimental Explicit Stream Annotation Oct 15, 2024
@golechwierowicz
Member

Can you split the PR into:

  1. Adding a new experimental flag + exception to call inliner.
  2. Changes to stream_attribute_annotator + test.
  3. Rest of the changes (execution stream assignment)

@chaserileyroberts
Contributor Author

Yes I will do that.

copybara-service bot pushed a commit that referenced this pull request Oct 29, 2024
Imported from GitHub PR #18448

First part of splitting up #17982
Copybara import of the project:

--
8ecc06c by chaser <[email protected]>:

Don't inline stream annotated kCalls

Merging this change closes #18448

FUTURE_COPYBARA_INTEGRATE_REVIEW=#18448 from chaserileyroberts:chase/stream_call_noinline 8ecc06c
PiperOrigin-RevId: 691070675
copybara-service bot pushed a commit that referenced this pull request Nov 8, 2024
Imported from GitHub PR #18448

First part of splitting up #17982
Copybara import of the project:

--
8ecc06c by chaser <[email protected]>:

Don't inline stream annotated kCalls

Merging this change closes #18448

COPYBARA_INTEGRATE_REVIEW=#18448 from chaserileyroberts:chase/stream_call_noinline 8ecc06c
PiperOrigin-RevId: 694522066