
cuda : do not use batched GEMM when tensor cores are not available #3882

Merged · 1 commit merged into master on Nov 2, 2023

Conversation

ggerganov (Owner)

fix #3869
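
As context for the change, here is a minimal sketch (not the actual llama.cpp diff) of the gating the PR title describes: the batched cuBLAS GEMM path is taken only when the device has tensor cores, i.e. compute capability 7.0 (Volta) or newer, while older cards such as Pascal (6.1) keep using the custom quantized MMQ kernels. The function name and the `force_mmq` parameter are illustrative; `CC_VOLTA` mirrors the threshold constant used in the CUDA backend.

```cpp
// Illustrative sketch; only the CC_VOLTA threshold reflects the real code.
// Tensor cores first appeared with Volta (compute capability 7.0).
#define CC_VOLTA 700

// Decide whether the batched cuBLAS GEMM path may be used on this device.
// Pre-Volta GPUs (e.g. Pascal, CC 6.1) have no tensor cores, so batched
// GEMM is slower there and the quantized MMQ kernels are used instead.
static bool use_batched_gemm(int min_compute_capability, bool force_mmq) {
    if (force_mmq) {
        return false; // build-time override: always take the MMQ path
    }
    return min_compute_capability >= CC_VOLTA;
}
```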

@askmyteapot

Can confirm the fix works on Pascal (SM 6.1).

@ggerganov added the performance (Speed related topics) and Nvidia GPU (Issues specific to Nvidia GPUs) labels on Nov 1, 2023
@cebtenzzre (Collaborator) commented on Nov 2, 2023

I can confirm that this brings pp512 on my Tesla P40 back to pre-#3749 speeds.

Now both #3749 and #3776 can be worked around via -DLLAMA_CUDA_FORCE_MMQ=ON on older cards.
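
For anyone reaching for that workaround, a sketch of how such a build flag typically feeds the gate above. This assumes the CMake option defines a `GGML_CUDA_FORCE_MMQ` preprocessor symbol for the CUDA backend, as llama.cpp does; the variable name is illustrative.

```cpp
// Assumed wiring: configuring with -DLLAMA_CUDA_FORCE_MMQ=ON defines
// GGML_CUDA_FORCE_MMQ when compiling the CUDA backend.
#ifdef GGML_CUDA_FORCE_MMQ
static const bool force_mmq = true;  // always use the quantized MMQ kernels
#else
static const bool force_mmq = false; // let compute capability decide
#endif
```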

@ggerganov merged commit 4d719a6 into master on Nov 2, 2023 (33 checks passed)
@ggerganov deleted the try-fix-3869 branch on Nov 2, 2023 at 06:35
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request on Nov 23, 2023
Labels: Nvidia GPU (Issues specific to Nvidia GPUs), performance (Speed related topics)
Development

Successfully merging this pull request may close these issues:

CTX Processing regression for Pascal - Commit 2b4ea35 (#3869)