Pull requests: NVIDIA/TransformerEngine
Pull requests list
[TE/JAX] XLA Custom Calls with FFI for FusedAttnFwd, Quantize, Transpose, and ActLuFP8
#1263
opened Oct 16, 2024 by
huanghua1994
7 of 13 tasks
[PyTorch] Fix wgrads for GroupedLinear when weights don't require grad
#1258
opened Oct 16, 2024 by
yaox12
13 tasks
[PyTorch] Reorganize L1 tests
testing
Improvements to tests or testing infrastructure
#1255
opened Oct 15, 2024 by
timmoon10
5 of 14 tasks
Draft: reduce cudagraph mem via preallocations
#1253
opened Oct 15, 2024 by
JimmyZhang12
13 tasks
[Bugfix] Fix bias for 0-dim tensors in gemm
#1246
opened Oct 12, 2024 by
yaox12
1 of 13 tasks
Save CUDA Graph memory by reusing input and output tensors
#1234
opened Oct 9, 2024 by
buptzyb
5 of 13 tasks
Fused Attention Support 64-bit Ragged Offsets for Large THD Tensors
#1230
opened Oct 8, 2024 by
mgoldfarb-nvidia
8 of 13 tasks
[TE/JAX] Enabling CudaGraph for custom calls with FFI
jax
#1228
opened Oct 7, 2024 by
phu0ngng
4 of 13 tasks
Draft: Use fused push_send_recv kernel for TP AG and RS overlaps
#1200
opened Sep 24, 2024 by
erhoo82
13 tasks
[PyTorch] Fused dbias-cast-transpose in bias operation
#1168
opened Sep 6, 2024 by
timmoon10
7 of 13 tasks
[PyTorch/C] Exposed Userbuffers configuration option to control comm and compute stream priorities
enhancement
New feature or request
#1149
opened Aug 29, 2024 by
denera
8 of 13 tasks
[PyTorch] Avoid saving fp8_tensors in certain scenarios
#1143
opened Aug 28, 2024 by
cyanguwa
8 of 13 tasks