Releases: DefTruth/CUDA-Learn-Notes
⚡️⚡️toy-hgemm library
What's Changed
Full Changelog: v2.6.3...v2.6.4
toy-hgemm library
What's Changed
Full Changelog: v2.6.2...v2.6.3
CuTe HGEMM Block Swizzle
What's Changed
- [HGEMM] trans mat b from row major -> col major by @DefTruth in #135
- [HGEMM] refactor HGEMM cpp benchmark by @DefTruth in #136
- [HGEMM] Update HGEMM L20/4090 Bench by @DefTruth in #137
- [HGEMM] fix cublas hgemm handle error by @DefTruth in #138
- [HGEMM] Add MMA HGEMM NN C++ benchmark by @DefTruth in #139
- [HGEMM] CuTe HGEMM with Thread Block Swizzle by @DefTruth in #140
- [HGEMM] clear tensor cache avoid OOM by @DefTruth in #141
- [HGEMM] Add gc.collect to HGEMM bench script by @DefTruth in #142
- [HGEMM] Add show_memory option to bench by @DefTruth in #143
- [HGEMM] manually init/destroy cublas handle by @DefTruth in #144
Full Changelog: v2.6.1...v2.6.2
v2.6.1 CuTe HGEMM
What's Changed
- [HGEMM] Add large MNK block swizzle policy by @DefTruth in #132
- [HGEMM] Add CuTe HGEMM with SMEM Swizzle by @DefTruth in #134
- Update embedding.cu by @TheManWhoIsStupid in #133
New Contributors
- @TheManWhoIsStupid made their first contribution in #133
Full Changelog: v2.6...v2.6.1
v2.6 Refactor 7/N
What's Changed
- [HGEMM] Update NVIDIA L20/4090 Perf plots by @DefTruth in #126
- [Blog] Illustrated DeepSpeed-Ulysses & Megatron-LM TP/SP by @DefTruth in #127
- [README] Add contents lists by @DefTruth in #128
- [README] Update README by @DefTruth in #129
- [README] Update README.md by @DefTruth in #130
- Bump up to v2.6 by @DefTruth in #131
Full Changelog: v2.5...v2.6
v2.5
What's Changed
- [HGEMM] Update HGEMM README.md by @DefTruth in #120
- [HGEMM] Add plot tflops function by @DefTruth in #121
- [HGEMM] Add NVIDIA RTX 3090 Laptop perf plot by @DefTruth in #122
- [PERF] Update HGEMM benchmark scripts by @DefTruth in #123
- [HGEMM] Add HGEMM L20/4090 benchmark figures by @DefTruth in #124
- Bump up to v2.5 by @DefTruth in #125
Full Changelog: v2.4.18...v2.5
v2.4.18
What's Changed
- Update README.md by @DefTruth in #115
- [HGEMM] Update HGEMM Supported Matrix by @DefTruth in #116
- [HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #117
- [README] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #118
- [HGEMM] Add NVIDIA RTX 4090 benchmark by @DefTruth in #119
Full Changelog: v2.4.17...v2.4.18
v2.4.17
What's Changed
- [NMS] Add nms f32 cuda kernel. by @bear-zd in #102
- [HGEMM] Add some note to collective store by @DefTruth in #103
- [HGEMM] Add HGEMM MMA Col Major Kernel by @DefTruth in #104
- [HGEMM] Update HGEMM benchmark scripts by @DefTruth in #105
- [HGEMM] Add Warp Swizzle as template param by @DefTruth in #106
- [HGEMM] add -Xptxas -v compile flag by @DefTruth in #107
- [HGEMM] Try reduce registers usage by @DefTruth in #108
- [HGEMM] Update HGEMM MMA/WMMA Usage by @DefTruth in #109
- [HGEMM][Docs] Add HGEMM Supported Matrix by @DefTruth in #110
- [HGEMM] Add M=N=K option for benchmark by @DefTruth in #111
- [HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #112
- [README] Update HGEMM/SGEMM Supported matrix by @DefTruth in #113
- [Docs] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #114
Full Changelog: v2.4.16...v2.4.17