Linux and AMD Radeon RX 7900XTX: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' | Is it possible to enable the stable-diffusion-webui equivalent to --precision full --no-half?
#1793
Closed
danielaixer opened this issue
Dec 23, 2023
· 1 comment
I'm trying to train a LoRA model using the AdamW optimizer and with CrossAttention set to none. These settings help me avoid bitsandbytes and xFormers errors, but just when training seems to be working and reaches the optimization steps, I get this error:
File "/home/username/kohya_ss/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
And at the end of the terminal this: subprocess.CalledProcessError: Command '['/home/username/kohya_ss/kohya_ss/venv/bin/python', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/home/username/kohya_ss/kohya_ss/datasets/Something', '--resolution=512,512', '--output_dir=/home/username/kohya_ss/kohya_ss/models/Lora/Custom', '--network_alpha=48', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=96', '--output_name=Something2', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=20', '--train_batch_size=4', '--max_train_steps=200', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--optimizer_type=AdamW', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--save_every_n_steps=500', '--bucket_no_upscale', '--noise_offset=0.0', '--sample_sampler=euler', '--sample_prompts=/home/username/kohya_ss/kohya_ss/models/Lora/Custom/sample/prompt.txt', '--sample_every_n_steps=25']' returned non-zero exit status 1.
Based on similar errors mentioning 'Half', I'm pretty sure we need the equivalent of using --precision full --no-half when launching AUTOMATIC1111/stable-diffusion-webui.
The method shown in #1484 doesn't improve the situation for me, and neither does installing PyTorch for ROCm 5.7: pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.7
Edit: When running "accelerate config", choosing "no" for the question "Do you wish to use FP16 or BF16 (mixed precision)?" didn't help.
Edit: Setting "Mixed precision" to "no" seems to be working. I will update once I confirm I can do a complete LoRA training.
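A note on the error itself: the `_cpu_` in `addmm_impl_cpu_` indicates that the failing matrix multiply ran on the CPU, where this PyTorch build has no fp16 ('Half') implementation. A minimal sketch for checking whether PyTorch sees the GPU at all, meant to be run inside the kohya_ss venv. The helper name `describe_torch_backend` is made up for illustration; `torch.cuda.is_available()` and `torch.version.hip` are existing PyTorch attributes (ROCm builds expose the GPU through the `cuda` API).

```python
import importlib.util


def describe_torch_backend():
    """Return a small dict describing what PyTorch can see (hypothetical helper)."""
    # Fall back gracefully when torch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "gpu_visible": False, "hip_version": None}

    import torch

    return {
        "torch_installed": True,
        # True on both CUDA and ROCm builds when a usable GPU is detected.
        "gpu_visible": torch.cuda.is_available(),
        # Set to a version string on ROCm builds, None otherwise.
        "hip_version": getattr(torch.version, "hip", None),
    }


if __name__ == "__main__":
    info = describe_torch_backend()
    print(info)
    if info["torch_installed"] and not info["gpu_visible"]:
        print("PyTorch does not see a GPU, so fp16 layers would run on the CPU.")
```

If `gpu_visible` comes back `False`, training would fall back to the CPU, which would be consistent with the 'Half' error above. This is only a diagnostic, not a fix.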
danielaixer
changed the title
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' | Is it possible to enable the stable-diffusion-webui equivalent to --precision full --no-half?
Linux and AMD Radeon RX 7900XTX: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' | Is it possible to enable the stable-diffusion-webui equivalent to --precision full --no-half?
Dec 23, 2023
Okay, confirmed, "Mixed precision" set to "no" works. Regarding "accelerate config", I think it doesn't really matter which mixed precision you choose.
Also, do NOT use AdamW8bit as the optimizer (bitsandbytes issue); use AdamW instead, and set "CrossAttention" to "none" (xFormers issue).
However, I still can't generate sample images or captions with kohya_ss, but those issues are secondary.
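For anyone launching train_network.py by hand instead of through the GUI, the workaround in this comment can be sketched as a rewrite of the argument list from the error message. This is only an illustration: `apply_workaround` is a hypothetical helper, and it assumes the GUI's "Mixed precision" and optimizer settings correspond to the `--mixed_precision` and `--optimizer_type` arguments visible in the failing command.

```python
def apply_workaround(argv):
    """Return a copy of argv with the settings that worked in this thread.

    Hypothetical helper: it only touches the two arguments discussed here
    and leaves everything else (including --save_precision) unchanged.
    """
    fixed = []
    for arg in argv:
        if arg.startswith("--mixed_precision="):
            # "Mixed precision" set to "no" is what made training work.
            fixed.append("--mixed_precision=no")
        elif arg == "--optimizer_type=AdamW8bit":
            # AdamW8bit depends on bitsandbytes; plain AdamW avoids it.
            fixed.append("--optimizer_type=AdamW")
        else:
            fixed.append(arg)
    return fixed


if __name__ == "__main__":
    original = [
        "./train_network.py",
        "--mixed_precision=fp16",
        "--save_precision=fp16",
        "--optimizer_type=AdamW",
    ]
    print(apply_workaround(original))
```

The helper returns a new list instead of editing in place, so the original command stays available for comparison.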
I'm on Ubuntu 22.04, with a 7900XTX GPU, ROCm 5.6 and Mesa drivers. I can generate images using the GPU via stable-diffusion-webui.
I have installed kohya_ss with these commands:
And I start the GUI with: