
Releases: DefTruth/CUDA-Learn-Notes

⚡️⚡️toy-hgemm library

28 Nov 02:07
37f1554

What's Changed

Full Changelog: v2.6.4...v2.6.5

toy-hgemm library

22 Nov 11:42
56e2fe9

What's Changed

Full Changelog: v2.6.3...v2.6.4

toy-hgemm library

22 Nov 06:51
6ea2eb9

What's Changed

Full Changelog: v2.6.2...v2.6.3

CuTe HGEMM Block Swizzle

21 Nov 10:07
60d4ad2

What's Changed

[Figure: NVIDIA_L20_NN+TN+v2]

Full Changelog: v2.6.1...v2.6.2

v2.6.1 CuTe HGEMM

19 Nov 08:41
4e45d31

What's Changed

[Figure: NVIDIA_L20_NN+TN]
[Figure: NVIDIA_GeForce_RTX_4090_NN+TN]

New Contributors

Full Changelog: v2.6...v2.6.1

v2.6 Refactor 7/N

14 Nov 11:50
d53ab23

What's Changed

Full Changelog: v2.5...v2.6

v2.5

05 Nov 02:41
a66cc2f

What's Changed

Full Changelog: v2.4.18...v2.5

v2.4.18

01 Nov 01:20
28c12bd

What's Changed

Full Changelog: v2.4.17...v2.4.18

v2.4.17

29 Oct 06:39
a65f1f6

What's Changed

Full Changelog: v2.4.16...v2.4.17

HGEMM Warp Swizzle/Reg Buffers

25 Oct 05:59
6c89595

What's Changed

  • [HGEMM] HGEMM MMA with Reg Double Buffers by @DefTruth in #99
  • [HGEMM] ldmatrix.x4.trans with reg double buffers by @DefTruth in #100
  • [HGEMM] collective store via warp shfl&reg reuse by @DefTruth in #101

Full Changelog: v2.4.15...v2.4.16