CUDA cannot generate images #95
Make sure everything is the same; you can check the hashes of the model files, for example by comparing their checksums.
I think it could be an issue related to the Tensor Cores. Try uncommenting line 99 of ggml-cuda.cu (the `//#define GGML_CUDA_FORCE_MMQ` line) and try again:

```c
#define GGML_CUDA_MAX_NODES 8192

// define this if you want to always fallback to MMQ kernels and not use cuBLAS for matrix multiplication
// on modern hardware, using cuBLAS is recommended as it utilizes F16 tensor cores which are very performant
// for large computational tasks. the drawback is that this requires some extra amount of VRAM:
// - 7B quantum model: +100-200 MB
// - 13B quantum model: +200-400 MB
//
#define GGML_CUDA_FORCE_MMQ // uncomment this line and try again

// TODO: improve this to be correct for more hardware
// for example, currently fails for GeForce GTX 1660 which is TURING arch (> VOLTA) but does not have tensor cores
// probably other such cases, and not sure what happens on AMD hardware
#if !defined(GGML_CUDA_FORCE_MMQ)
#define CUDA_USE_TENSOR_CORES
#endif
```
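For context on why this may matter: CUDA_USE_TENSOR_CORES only pays off on GPUs that actually have tensor cores (Volta, compute capability 7.0, and newer), while the GTX 10-series cards mentioned in this thread are Pascal (6.x). Below is a rough, illustrative sketch (not code from stable-diffusion.cpp or ggml) of how one could query the compute capability with the standard CUDA runtime API:

```c
#include <cuda_runtime.h>
#include <stdio.h>

// Illustrative only: report the compute capability of device 0 and whether it
// is at least Volta (7.0). Pascal cards like the GTX 1070/1060 or MX150 report
// 6.x and have no tensor cores, so the MMQ fallback is the safer path there.
// (As the TODO above notes, some Turing cards such as the GTX 1660 are 7.5 yet
// still lack tensor cores, so this check alone is not sufficient.)
int main(void) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "failed to query CUDA device 0\n");
        return 1;
    }
    printf("%s: compute capability %d.%d (%s)\n",
           prop.name, prop.major, prop.minor,
           prop.major >= 7 ? ">= Volta" : "pre-Volta, no tensor cores");
    return 0;
}
```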
The running speed is much faster, but the generated image is still pure green.
Try another model and a different prompt, and try generating on the CPU to see whether it produces a coherent image. I can't think of anything else with the limited information I have.
The program and model are stored on an external hard drive. I tried f32 and f16, and changed sample_method=LCM, but got the same output image.
Could you compare the CLIP outputs (hidden state) of get_learned_condition() to check whether the im2col kernel is the only thing that could be causing issues?
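Not project code, just a minimal sketch of how such a comparison could be done, assuming the conditioning tensor is F32 and its data is already in host memory (`print_tensor_head` is a hypothetical helper name):

```c
#include <stdio.h>
#include "ggml.h"

// Hypothetical debugging helper: print the shape and the first n values of a
// tensor so the outputs of get_learned_condition() from a CPU run and a CUDA
// run can be compared side by side. Assumes an F32 tensor with host data.
static void print_tensor_head(const struct ggml_tensor * t, int n) {
    const float * data = (const float *) t->data;
    printf("shape: [%lld, %lld, %lld, %lld]\n",
           (long long) t->ne[0], (long long) t->ne[1],
           (long long) t->ne[2], (long long) t->ne[3]);
    for (int i = 0; i < n; i++) {
        printf("%9.6f ", data[i]);
    }
    printf("\n");
}
```

If the values already diverge between the CPU and CUDA builds at this point, the problem is upstream of im2col; if they match, the issue is further down the pipeline.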
You could also try my pull request #88; I optimized the im2col kernel to make more efficient use of GPU resources.
```c
ggml_tensor* positive = sd->get_learned_condition(work_ctx, prompt);
```

Here is my check of the output. The im2col call in question is:

```c
struct ggml_tensor * im2col = ggml_im2col(ctx, a, b, s0, s1, p0, p1, d0, d1, true);
```
I tried enabling taesd and got the same result. Running it on another laptop of mine generates normal images, and the efficiency is significantly improved.
That seems quite challenging to debug, as it is the matrix multiplication kernel, and I can't think of a solution since I wasn't the one who created it.
This is usually caused by insufficient GPU memory.
The user has a GTX 1070 with 8 GB of VRAM, and I can run without issues on an RTX 3050 laptop GPU with 4 GB of VRAM.
Can you share your CUDA version?
v11.8 |
It doesn't look wrong. How could this happen?
I'm confused too. I tried llama.cpp and it worked fine as well. Maybe I should buy a new GPU.
I had the same issue as @wailovet. I ran the inference on an NVIDIA MX150 GPU with CUDA v11.7. Could there be some compatibility issue with Pascal GPUs?
I'm not very experienced with CUDA; in fact, I'm struggling to add some features that could significantly accelerate image generation in CUDA. I'm facing many issues due to my lack of GPU engineering experience, so I can't shed light on the matter. I'm sorry it's not working for some people. If I had equivalent hardware for testing, perhaps I could be of assistance.
Just to provide another data point and a potential fix: I have a GTX 1070 and also got images with all green pixels. My CUDA version is 12.1. As @wailovet showed above, the problem seems to come from the CUDA kernels; I suspect I found the culprit, and I have a fix that works. Here is the patch.
Anyone with old NVIDIA GPUs can give it a try. It also fixes two test cases.
Once the upstream ggml merges your PR, I'll update ggml to the corresponding commit to fix this issue. |
That'll be great! Glad I can finally try SD with the generations-old 1070. Still, it is much faster than the CPU 😄
It has been merged now and fixed whisper.cpp for older GPUs. Time to fix this issue too? |
I've attempted to update this branch #134 to the latest ggml, but encountered some issues when generating images larger than 512x512. I haven't had time to pinpoint the exact cause yet. |
@wailovet @bssrdf @SmallAndSoft I've updated ggml to the latest code. You can try using the latest master branch to see if the issue still persists. |
@leejet That fixed the issue for my GTX 1060. |
Thank you, @leejet, for bringing in this update. For some reason, SD runs much faster on the CUDA backend with this update, especially the latent decoding step.
I tried running with CUDA and the result is fine; everything works.
I encountered a strange problem: after using CUDA, I get a pure green picture when running, but it works fine on another computer.