Releases · DefTruth/CUDA-Learn-Notes

⚡️⚡️toy-hgemm library

28 Nov 02:07

DefTruth

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

What's Changed

[HGEMM] Update RTX 3080 Laptop perf by @DefTruth in #148
[HGEMM] Update toy-hgemm library 0.1.0 by @DefTruth in #149
[HGEMM] Update toy-hgemm library 0.1.0 by @DefTruth in #150
[HGEMM] Update toy-hgemm library 0.1.0 by @DefTruth in #152

Full Changelog: v2.6.4...v2.6.5

Contributors

DefTruth

toy-hgemm library

22 Nov 11:42

DefTruth

What's Changed

[HGEMM] Release toy-hgemm library 0.1.0 by @DefTruth in #146

Full Changelog: v2.6.3...v2.6.4

Contributors

DefTruth

toy-hgemm library

22 Nov 06:51

DefTruth

What's Changed

[HGEMM] Release toy-hgemm library 0.1.0 by @DefTruth in #145

Full Changelog: v2.6.2...v2.6.3

Contributors

DefTruth

CuTe HGEMM Block Swizzle

21 Nov 10:07

DefTruth

What's Changed

[HGEMM] trans mat b from row major -> col major by @DefTruth in #135
[HGEMM] refactor HGEMM cpp benchmark by @DefTruth in #136
[HGEMM] Update HGEMM L20/4090 Bench by @DefTruth in #137
[HGEMM] fix cublas hgemm handle error by @DefTruth in #138
[HGEMM] Add MMA HGEMM NN C++ benchmark by @DefTruth in #139
[HGEMM] CuTe HGEMM with Thread Block Swizzle by @DefTruth in #140
[HGEMM] clear tensor cache avoid OOM by @DefTruth in #141
[HGEMM] Add gc.collect to HGEMM bench script by @DefTruth in #142
[HGEMM] Add show_memory option to bench by @DefTruth in #143
[HGEMM] manually init/destroy cublas handle by @DefTruth in #144

Full Changelog: v2.6.1...v2.6.2

Contributors

DefTruth

v2.6.1 CuTe HGEMM

19 Nov 08:41

DefTruth

What's Changed

[HGEMM] Add large MNK block swizzle policy by @DefTruth in #132
[HGEMM] Add CuTe HGEMM with SMEM Swizzle by @DefTruth in #134
Update embedding.cu by @TheManWhoIsStupid in #133

New Contributors

@TheManWhoIsStupid made their first contribution in #133

Full Changelog: v2.6...v2.6.1

Contributors

DefTruth and TheManWhoIsStupid

v2.6 Refactor 7/N

14 Nov 11:50

DefTruth

What's Changed

[HGEMM] Update NVIDIA L20/4090 Perf plots by @DefTruth in #126
[Blog] Illustrated guide to DeepSpeed-Ulysses & Megatron-LM TP/SP by @DefTruth in #127
[README] Add contents lists by @DefTruth in #128
[README] Update README by @DefTruth in #129
[README] Update README.md by @DefTruth in #130
Bump up to v2.6 by @DefTruth in #131

Full Changelog: v2.5...v2.6

Contributors

DefTruth

v2.5

05 Nov 02:41

DefTruth

What's Changed

[HGEMM] Update HGEMM README.md by @DefTruth in #120
[HGEMM] Add plot tflops function by @DefTruth in #121
[HGEMM] Add NVIDIA RTX 3090 Laptop perf plot by @DefTruth in #122
[PERF] Update HGEMM benchmark scripts by @DefTruth in #123
[HGEMM] Add HGEMM L20/4090 benchmark figures by @DefTruth in #124
Bump up to v2.5 by @DefTruth in #125

Full Changelog: v2.4.18...v2.5

Contributors

DefTruth

v2.4.18

01 Nov 01:20

DefTruth

What's Changed

Update README.md by @DefTruth in #115
[HGEMM] Update HGEMM Supported Matrix by @DefTruth in #116
[HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #117
[README] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #118
[HGEMM] Add NVIDIA RTX 4090 benchmark by @DefTruth in #119

Full Changelog: v2.4.17...v2.4.18

Contributors

DefTruth

v2.4.17

29 Oct 06:39

DefTruth

What's Changed

[NMS] Add nms f32 cuda kernel. by @bear-zd in #102
[HGEMM] Add some note to collective store by @DefTruth in #103
[HGEMM] Add HGEMM MMA Col Major Kernel by @DefTruth in #104
[HGEMM] Update HGEMM benchmark scripts by @DefTruth in #105
[HGEMM] Add Warp Swizzle as template param by @DefTruth in #106
[HGEMM] add -Xptxas -v compile flag by @DefTruth in #107
[HGEMM] Try reduce registers usage by @DefTruth in #108
[HGEMM] Update HGEMM MMA/WMMA Usage by @DefTruth in #109
[HGEMM][Docs] Add HGEMM Supported Matrix by @DefTruth in #110
[HGEMM] Add M=N=K option for benchmark by @DefTruth in #111
[HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #112
[README] Update HGEMM/SGEMM Supported matrix by @DefTruth in #113
[Docs] Update HGEMM/SGEMM Supported Matrix by @DefTruth in #114

Full Changelog: v2.4.16...v2.4.17

Contributors

DefTruth and bear-zd

HGEMM Warp Swizzle/Reg Buffers

25 Oct 05:59

DefTruth

What's Changed

[HGEMM] HGEMM MMA with Reg Double Buffers by @DefTruth in #99
[HGEMM] ldmatrix.x4.trans with reg double buffers by @DefTruth in #100
[HGEMM] collective store via warp shfl&reg reuse by @DefTruth in #101

Full Changelog: v2.4.15...v2.4.16

Contributors

DefTruth
