Linux and AMD Radeon RX 7900XTX: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' | Is it possible to enable the stable-diffusion-webui equivalent to --precision full --no-half? #1793

danielaixer · 2023-12-23T02:42:24Z

I'm on Ubuntu 22.04 with a Radeon RX 7900 XTX GPU, ROCm 5.6, and Mesa drivers. I can generate images on the GPU via stable-diffusion-webui.

I have installed kohya_ss with these commands:

git clone https://github.com/bmaltais/kohya_ss.git 
cd kohya_ss
python -m venv venv
source venv/bin/activate
pip install --use-pep517 --upgrade -r requirements.txt
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm5.6
accelerate config

And I start the GUI with:

export HSA_OVERRIDE_GFX_VERSION=11.0.0
source venv/bin/activate
python kohya_gui.py --server_port 7863 --listen 0.0.0.0

I'm trying to train a LoRA model using the AdamW optimizer, with CrossAttention set to none. These settings let me avoid the bitsandbytes and xFormers errors, but just when training seems to be working and reaches the optimization steps, I get this error:

  File "/home/username/kohya_ss/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

And at the end of the terminal output, this:
subprocess.CalledProcessError: Command '['/home/username/kohya_ss/kohya_ss/venv/bin/python', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/home/username/kohya_ss/kohya_ss/datasets/Something', '--resolution=512,512', '--output_dir=/home/username/kohya_ss/kohya_ss/models/Lora/Custom', '--network_alpha=48', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=96', '--output_name=Something2', '--lr_scheduler_num_cycles=1', '--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=20', '--train_batch_size=4', '--max_train_steps=200', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--optimizer_type=AdamW', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--save_every_n_steps=500', '--bucket_no_upscale', '--noise_offset=0.0', '--sample_sampler=euler', '--sample_prompts=/home/username/kohya_ss/kohya_ss/models/Lora/Custom/sample/prompt.txt', '--sample_every_n_steps=25']' returned non-zero exit status 1.
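
The "cpu" in addmm_impl_cpu_ suggests that the failing linear layer ran on the CPU, where this PyTorch build has no half-precision kernel for that operation. A minimal sketch for checking, from inside the venv, whether the installed PyTorch is a ROCm build and can see the GPU. The helper name describe_torch_device is hypothetical and not part of kohya_ss; the check only narrows down where the problem might be, it does not fix anything:

```python
import importlib.util


def describe_torch_device():
    """Report which PyTorch build is installed and whether it sees a GPU.

    Hypothetical helper for illustration; not part of kohya_ss.
    """
    # Avoid an ImportError if torch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # Set to a version string on ROCm builds, None on CPU/CUDA builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds report the GPU through the torch.cuda API.
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_device().items():
        print(f"{key}: {value}")
```

If hip_version is None or gpu_available is False, the venv may be using a non-ROCm PyTorch build, or the GPU may not be visible to it (for example, if HSA_OVERRIDE_GFX_VERSION was not exported in that shell).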

Based on similar errors mentioning 'Half', I'm pretty sure we need the equivalent of --precision full --no-half, as used when launching AUTOMATIC1111/stable-diffusion-webui.

The method shown in #1484 doesn't improve the situation for me, and neither does installing the PyTorch nightly build for ROCm 5.7:
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.7

Edit: When running "accelerate config", choosing "no" for the question "Do you wish to use FP16 or BF16 (mixed precision)?" didn't help.

Edit: Setting "Mixed precision" to "no" seems to be working. I will update once I confirm I can do a complete LoRA training.

danielaixer · 2023-12-26T00:19:26Z

Okay, confirmed: "Mixed precision" set to "no" works. As for "accelerate config", I don't think it really matters which mixed precision option you choose there.

Also, do NOT use AdamW8bit as the optimizer (bitsandbytes issue); use AdamW instead, and set "CrossAttention" to "none" (xFormers issue).
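
For anyone launching train_network.py directly rather than through the GUI, a minimal sketch of the same change applied to the argument list from the failing command. It assumes the GUI's "Mixed precision" setting maps to the --mixed_precision flag seen in that command and that "no" is an accepted value for it; disable_mixed_precision is a hypothetical helper name, and --save_precision is left untouched because nothing in this thread says it needs to change:

```python
def disable_mixed_precision(args):
    """Return a copy of args with any --mixed_precision=... set to 'no'.

    Hypothetical helper for illustration; not part of kohya_ss.
    """
    prefix = "--mixed_precision="
    # Replace only the mixed-precision flag; keep every other argument as-is.
    return [prefix + "no" if arg.startswith(prefix) else arg for arg in args]


# A shortened version of the argument list from the failing command.
original = [
    "./train_network.py",
    "--mixed_precision=fp16",
    "--save_precision=fp16",
    "--optimizer_type=AdamW",
]

patched = disable_mixed_precision(original)
print(patched)
```

The optimizer is already AdamW in that command, so only the mixed-precision flag changes.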

However, I still can't generate sample images or captions with kohya_ss, but those issues are secondary.

danielaixer closed this as completed Dec 26, 2023
