It's working: training a LoRA with the latest version of kohya_ss on an AMD GPU, Ubuntu 22.04.2 LTS, tested on an RX6800, SD1.5 & SDXL #1484
Comments
Thanks, but this has been driving me nuts, as I get a segfault when trying to run LoRA training. I'm on a 7900 XTX and went through the full setup as you and many others have shown, yet I still get this segfault. EDIT: I got past the original segfault by changing to
I attached the contents of the requirements.txt file. The models, unfortunately, come out broken -((( Unfortunately, the latest version that trains normally, in our case for 1.5 models, is 21.7.10
Thanks, I went back and re-did the full process and it is working now, only having to change
Strange, but mine worked without crashes with 16 GB.
Hi,
I think it installs the CPU version of torch. The install needs to be checked very carefully by the dev of this repo; he has had all this installed since the start, so he doesn't test whether it works when installing from scratch.
Linux only?
https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html unfortunately, it's not supported :-(
A 6700 XT, Ubuntu, ROCm 5.3 successfully trains an SDXL LoRA too.
Has anyone tested ROCm 5.7?
Yes, it works.
When I try to train a Dreambooth LoRA I get an error. I'm on Ubuntu 22.04 with a 7900 XTX, and I can generate images in Stable Diffusion using the GPU. Install commands:
requirements.txt contains exactly what was shown here: #1484 (comment) Run commands:
I'm pretty sure xFormers for kohya_ss is mandatory, but xFormers doesn't support AMD GPUs... facebookresearch/xformers#807 (comment) Edit: I got rid of the xFormers error by setting "Parameters > Basic > CrossAttention" to "none", but it still doesn't work and it looks like it's not detecting or using the GPU, as I get messages like:
And also: Which I fixed with:
And now I get another error. But even if I fixed that, it's still not using the GPU. Edit 2: using the optimizer AdamW (which doesn't require bitsandbytes) instead of AdamW8bit gives this other error: Edit 3: the above issue is fixed with "Mixed precision" set to "no" in the UI parameters. I can train LoRAs now.
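For reference, those fixes map roughly onto the following sd-scripts flags (my mapping, not stated in the comment above; CrossAttention "none" simply means launching without --xformers or --mem_eff_attn, and the paths are placeholders):
accelerate launch train_network.py \
  --pretrained_model_name_or_path /path/to/model.safetensors \
  --train_data_dir /path/to/img --output_dir /path/to/out \
  --network_module networks.lora \
  --mixed_precision no \
  --optimizer_type AdamW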
Is this still working? I tried it, but I never do this type of stuff; it took so much effort with ChatGPT and whatnot, and I didn't get it to work. If someone can verify that it does in fact still work, I will give me and my brother ChatGPT another chance. (6800 XT)
This should still be valid, but apparently it only uses the CPU, not the GPU.
I followed tornado73's instructions and when I start to train a model, I get these error messages.
So I updated transformers and diffusers and now these error messages are gone.
I have attempted installing and reinstalling multiple times using the explicit instructions provided, and it refuses to recognize my AMD GPU. I still get the "tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used." error. I have even copied the steps explicitly, using the version of requirements.txt that correlates to the version of Kohya that was available at the time these instructions were written; it makes no difference. Regardless of what I do, it simply will not see my GPU.
I don't think kohya sd-scripts supports AMD cards…
This does work as of the latest version downloaded March 2nd:
git clone https://github.com/bmaltais/kohya_ss.git
This also works with the nightly preview ROCm 6.0.
pip install --use-pep517 --upgrade -r requirements.txt
(requirements.txt updated to reflect the list at the top of the post; you can disregard the darwin versions for Mac, being on PC anyway, and remove tensorflow, leaving tensorflow-rocm, which means fewer steps)
accelerate config
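For a single-GPU local setup, the interactive accelerate config answers typically look like the list below (this is the usual kohya_ss recommendation, not something spelled out in the comment above):
- In which compute environment are you running? -> This machine
- Which type of machine are you using? -> No distributed training
- Do you want to run your training on CPU only? -> NO
- Do you wish to optimize your script with torch dynamo? -> NO
- Do you want to use DeepSpeed? -> NO
- What GPU(s) (by id) should be used for training? -> all
- Do you wish to use FP16 or BF16 (mixed precision)? -> fp16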
sudo apt install python3-tk
BEFORE RUNNING: edit gui.sh (#!/usr/bin/env bash) and adjust the kohya_ss directory as necessary. This is the bit that needs a little tweaking, and I am still trying to balance the packages just right. Start the GUI AFTER editing the script. Ignore the gradio warnings, then do your thing in the GUI; but before training, you need to open another terminal and upgrade gradio within the virtual environment. There is probably a way to automate this, but I haven't done it yet; anyway, I just use the command sketched below, then start the training. The above speeds are on a 6900 XT training SDXL at resolution 1024,1024. I know this issue has been closed, but I hope this helps some people.
Edit: you will get API warnings when you first launch the GUI, but you can just leave gradio 4.0.0 installed and it works fine.
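The exact upgrade command was elided above; a minimal version of that second-terminal step (assuming gradio 4.0.0, the version the edit says ends up installed) would be:
cd kohya_ss
source venv/bin/activate   # the same virtual environment the GUI runs in
pip install gradio==4.0.0  # upgrade gradio inside the venv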
I think that's... expected with AMD GPUs. Yeah, even if the issue title says "It's working", that doesn't seem to include the GPU...
No, I only get the API error messages for gradio, but they seem to have no impact so far, and ChatGPT seems to think they can be overlooked. It is 100% using the GPU; I used to get nowhere near these speeds. Now I can train an SDXL LoRA in around an hour and a half; on CPU, which I have used before, I would literally have to leave it on overnight. I am training on Prodigy, using the extra optimizer settings decouple=True weight_decay=0.01 betas=[0.9,0.999] d_coef=0.8 use_bias_correction=True safeguard_warmup=True. I am still tinkering, but the training is definitely working; concepts have no issue, but I am getting some artifacts from overtraining in character LoRAs, which I need to muck around with... There are a few different steps I took beyond the original post to get it working, and I botched it a few times, but basically there are a couple of key points... I don't know if you overlooked them or not, but here is where I went wrong....
Then start the GUI from the terminal: ./gui.sh. Best of luck; hopefully something there helped.
What's the minimum required memory for training SDXL? I have an RX6750 12GB but keep getting OOM, even with AdamW8bit, gradient checkpointing, and mem eff attn.
I can barely train on my 3090 with 24GB of VRAM. For LoRA, 16GB might work, but it could be tough.
You should be able to get away with it on 12 GB. I would try manually resizing the training data down to 1024 before doing it and disable buckets. Cache latents to disk; make sure you have "to disk" checked, NOT just "cache latents" (this means you can't use any data augmentation though, like random crop or flip). Don't try to have a batch size over 1. In my experience Adam 8-bit won't work on an AMD card; try Prodigy, which has an adaptive learning rate. I use these settings for it: decouple=True weight_decay=0.01 betas=[0.9,0.999] d_coef=0.8 use_bias_correction=True safeguard_warmup=True. Then I use cosine and set all learning rates to 1 (see the sketch below). You could lower the resolution further if need be, but it might impact the quality of the LoRA, obviously, especially if going for a photo style; concepts may turn out just as good, I don't know. There is also a way to train on a diffusers model. I'm not sure if it is as simple as downloading it from Hugging Face and selecting it from the script as a custom model, but apparently it uses significantly less resources.
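As an illustration of how those GUI settings map onto an sd-scripts invocation (my translation of the fields above, not the commenter's exact command; paths are placeholders):
accelerate launch train_network.py \
  --pretrained_model_name_or_path /path/to/model.safetensors \
  --train_data_dir /path/to/img --output_dir /path/to/out \
  --network_module networks.lora \
  --optimizer_type Prodigy \
  --optimizer_args decouple=True weight_decay=0.01 betas=[0.9,0.999] d_coef=0.8 use_bias_correction=True safeguard_warmup=True \
  --lr_scheduler cosine \
  --learning_rate 1.0 --unet_lr 1.0 --text_encoder_lr 1.0 \
  --cache_latents --cache_latents_to_disk \
  --train_batch_size 1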
@zacvine Thanks for the tips, but I still get OOM even with buckets disabled, and caching to disk. I've tried ROCM 5.7 and the nightly 6.0: no difference there. Any other memory tips?
Might look into that later, if this is a lost cause. 😄 My current settings
Crashes with:
@zacvine Thank you!
Sorry for the late response. Also, try swapping Prodigy for AdamW (not Adam8bit, as this won't work); AdamW should use less memory than Prodigy, and you are only off by 20 MB, so this might be enough to push it over without lowering the res. Change the optimizer arguments to weight_decay=0.01 betas=[0.9,0.999] (the others won't be needed, unless you are using a warmup phase, in which case leave in safeguard_warmup=True, but it doesn't look like you are; see the sketch below). If this isn't enough, I think for the sake of 20 MB you could try lowering the resolution a tiny bit. If it's a concept, style, or anime character, this might not be noticeable; it could be concerning if it is a realistic character or something that requires a lot of detail. I know some people who train on SDXL at the SD2.0 size, which off the top of my head is 768x768. You may not even need to go that low. Aside from that, I could only suggest using a third-party program prior to training to ensure the GPU memory is completely empty, in case you have a model loaded in Stable Diffusion or something at the same time. Best of luck.
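Sketched the same way as before (an assumed mapping of the GUI fields, not the exact contents):
# optimizer swapped to plain AdamW, with only the trimmed arguments kept:
--optimizer_type AdamW
--optimizer_args weight_decay=0.01 betas=[0.9,0.999]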
Regarding the bitsandbytes ROCm build: do not be concerned about the version number; this works fine and was actually tested by the author himself for versions >5.6, cf. bitsandbytes-foundation/bitsandbytes#756. Personally, I can confirm that it works fine on Arch Linux with ROCm 6.0 installed from the repos. Compilation and installation are straightforward:
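The build commands themselves were elided above. A plausible sequence, based on the ROCm fork discussed in that PR (the repo name, make target, and GPU target below are my assumptions; check the fork's README for the exact steps):
git clone https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6.git
cd bitsandbytes-rocm-5.6
source ~/kohya_ss/venv/bin/activate   # build into the training venv
make hip ROCM_TARGET=gfx1030          # pick the target matching your card
pip install .                         # install the freshly built package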
With the rocm-enabled bitsandbytes installed into the venv, the 8-bit optimizers work.
@JohnDoe02 your bitsandbytes works here! Thank you!
@JohnDoe02 Yeah, this is perfect; I trained a LoRA with Adam8bit using this, and it actually worked instead of giving me empty black boxes. I would ask though: when using bf16, it seems to increase the training time by roughly 400% (about 8.7 it/s). When I use fp16 with the 8-bit optimiser it works well, so I don't think it's specific to the bitsandbytes setup. Is this a regular thing in your experience? I am using an older 6000-series card (6900 XT) if you think this could be the issue. @larssn you should definitely try installing this if you haven't, because Adam8bit uses significantly less GPU memory.
Nice, thanks for the heads up. I definitely will. Did you guys end up downgrading to ROCm 5.6 to be compatible with bitsandbytes, or isn't that necessary?
@larssn No, I still run the nightly with no problems; I just did as per the post. I'm trying the same/similar process now to see if it works with EveryDream, because it has an option to find the best learning rate by training for a small number of steps on your dataset with a validation option enabled, using a learning rate that increases over time. This produces a distinctively shaped curve in a validation graph, illustrating the steps where the model was training best.
I think it would be more useful if this whole setup were just a fork you could quick-install.
There's a new error with kohya_ss/kohya_gui/class_gui_config.py", line 1, in. Using this does make the program and system open, but any training fails with: can't open file sdxl_train_network.py [Errno 2] No such file or directory. Yes, the sd-scripts folder where it should be is completely empty. I'm no coder, so I have no idea what's going on; I'm just following instructions. Edit 9-24-24: also, apparently all python commands need to be replaced with python3 for some reason.
Arch Linux + AMD RX6800XT confirmed to work with this setup. Lifesaver, man; I'd been wandering for about 4-5 hours trying to figure out how to do it.
ROCm 5.6.0 and 5.7.1,
Dependencies fixed
Changes to requirements.txt
Details
accelerate==0.23.0
albumentations==1.3.0
aiofiles==23.2.1
altair==4.2.2
dadaptation==3.1
diffusers[torch]==0.18.2
easygui==0.98.3
einops==0.6.0
fairscale==0.4.13
ftfy==6.1.1
gradio==3.36.1
huggingface-hub==0.15.1
keras==2.12.0
invisible-watermark==0.2.0
lion-pytorch==0.0.6
lycoris_lora==1.8.3
gradio==3.36.1; sys_platform == 'darwin'
gradio==3.36.1; sys_platform != 'darwin'
huggingface-hub==0.15.1; sys_platform == 'darwin'
huggingface-hub==0.15.1; sys_platform != 'darwin'
open-clip-torch==2.20.0
opencv-python==4.7.0.68
prodigyopt==1.0
pytorch-lightning==1.9.0
tensorflow-rocm==2.12.0.560
tensorboard==2.12.0 ; sys_platform != 'darwin'
tensorboard==2.12.0 ; sys_platform == 'darwin'
tensorflow==2.12.0; sys_platform != 'darwin'
rich==13.4.1
safetensors==0.3.1
timm==0.6.12
tk==0.1.0
toml==0.10.2
transformers==4.30.2
voluptuous==0.13.1
wandb==0.15.0
-e . # no_verify - leave this in to skip the verification stage
Install
Tested on Ubuntu 22.04.2 LTS + RX6800
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
python -m venv venv
source venv/bin/activate
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6
pip install --use-pep517 --upgrade -r requirements.txt
accelerate config
sudo apt install python3-tk
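Before launching anything, it is worth sanity-checking that the ROCm build of torch actually sees the card (a check added here for convenience, not part of the original instructions; ROCm builds still report the GPU through torch's cuda API):
python -c 'import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))'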
Then launch:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
source venv/bin/activate
python kohya_gui.py "$@"
Changes to gui.sh
#!/usr/bin/env bash
export HSA_OVERRIDE_GFX_VERSION=10.3.0
source venv/bin/activate
python kohya_gui.py "$@"
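A note on HSA_OVERRIDE_GFX_VERSION (an addition, not from the original post): 10.3.0 targets RDNA2 cards such as the RX 6800 used here; RDNA3 owners commonly report needing 11.0.0 instead. Verify the value for your own card:
export HSA_OVERRIDE_GFX_VERSION=10.3.0   # RDNA2: RX 6700/6800/6900 series
export HSA_OVERRIDE_GFX_VERSION=11.0.0   # RDNA3: RX 7900 series (commonly reported)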
Configuration files from previous versions are not suitable; this is my working sample, adjust it for yourself.
LoRA training on SD1.5
.json
Details
"LoRA_type": "Standard",
"adaptive_noise_scale": 0,
"additional_parameters": "",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": true,
"bucket_reso_steps": 64,
"cache_latents": true,
"cache_latents_to_disk": false,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0,
"caption_extension": ".txt",
"clip_skip": 2,
"color_aug": false,
"conv_alpha": 1,
"conv_block_alphas": "",
"conv_block_dims": "",
"conv_dim": 1,
"decompose_both": false,
"dim_from_weights": false,
"down_lr_weight": "",
"enable_bucket": false,
"epoch": 1,
"factor": -1,
"flip_aug": false,
"full_bf16": false,
"full_fp16": false,
"gradient_accumulation_steps": "1",
"gradient_checkpointing": false,
"keep_tokens": "0",
"learning_rate": 0.0001,
"logging_dir": "/home/tor/kohya_ss/LORA/log",
"lora_network_weights": "",
"lr_scheduler": "constant",
"lr_scheduler_args": "",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": 0,
"max_bucket_reso": 2048,
"max_data_loader_n_workers": "1",
"max_resolution": "768,768",
"max_timestep": 1000,
"max_token_length": "75",
"max_train_epochs": "",
"max_train_steps": "",
"mem_eff_attn": true,
"mid_lr_weight": "",
"min_bucket_reso": 256,
"min_snr_gamma": 0,
"min_timestep": 0,
"mixed_precision": "fp16",
"model_list": "custom",
"module_dropout": 0,
"multires_noise_discount": 0,
"multires_noise_iterations": 0,
"network_alpha": 128,
"network_dim": 256,
"network_dropout": 0,
"no_token_padding": false,
"noise_offset": 0,
"noise_offset_type": "Original",
"num_cpu_threads_per_process": 2,
"optimizer": "AdamW",
"optimizer_args": "",
"output_dir": "/home/tor/kohya_ss/LORA/model",
"output_name": "dzetaA4_80_",
"persistent_data_loader_workers": false,
"pretrained_model_name_or_path": "/home/tor/kohya_ss/model/absolutereality_v181.safetensors",
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0,
"reg_data_dir": "",
"resume": "",
"sample_every_n_epochs": 0,
"sample_every_n_steps": 0,
"sample_prompts": "",
"sample_sampler": "euler_a",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "fp16",
"save_state": false,
"scale_v_pred_loss_like_noise_pred": false,
"scale_weight_norms": 0,
"sdxl": false,
"sdxl_cache_text_encoder_outputs": false,
"sdxl_no_half_vae": true,
"seed": "",
"shuffle_caption": false,
"stop_text_encoder_training": 0,
"text_encoder_lr": 0.0004,
"train_batch_size": 1,
"train_data_dir": "/home/tor/kohya_ss/LORA/img",
"train_on_input": true,
"training_comment": "",
"unet_lr": 0.0001,
"unit": 1,
"up_lr_weight": "",
"use_cp": false,
"use_wandb": false,
"v2": false,
"v_parameterization": false,
"v_pred_like_loss": 0,
"vae_batch_size": 0,
"wandb_api_key": "",
"weighted_captions": false,
"xformers": "none"
LoRA training on SDXL
.json
Details
"LoRA_type": "Standard",
"adaptive_noise_scale": 0,
"additional_parameters": "",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": false,
"bucket_reso_steps": 64,
"cache_latents": true,
"cache_latents_to_disk": true,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0.05,
"caption_extension": ".txt",
"clip_skip": "1",
"color_aug": false,
"conv_alpha": 1,
"conv_block_alphas": "",
"conv_block_dims": "",
"conv_dim": 1,
"decompose_both": false,
"dim_from_weights": false,
"down_lr_weight": "",
"enable_bucket": false,
"epoch": 50,
"factor": -1,
"flip_aug": false,
"full_bf16": false,
"full_fp16": false,
"gradient_accumulation_steps": "1",
"gradient_checkpointing": true,
"keep_tokens": "0",
"learning_rate": 3e-05,
"logging_dir": "/home/tor/kohya_ss/LORA/log",
"lora_network_weights": "",
"lr_scheduler": "constant",
"lr_scheduler_args": "",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": 0,
"max_bucket_reso": 2048,
"max_data_loader_n_workers": "0",
"max_resolution": "1024,1024",
"max_timestep": 1000,
"max_token_length": "75",
"max_train_epochs": "50",
"max_train_steps": "",
"mem_eff_attn": true,
"mid_lr_weight": "",
"min_bucket_reso": 256,
"min_snr_gamma": 5,
"min_timestep": 0,
"mixed_precision": "fp16",
"model_list": "custom",
"module_dropout": 0,
"multires_noise_discount": 0,
"multires_noise_iterations": 0,
"network_alpha": 32,
"network_dim": 32,
"network_dropout": 0,
"no_token_padding": false,
"noise_offset": 0,
"noise_offset_type": "Original",
"num_cpu_threads_per_process": 2,
"optimizer": "AdamW",
"optimizer_args": "",
"output_dir": "/home/tor/kohya_ss/LORA/model",
"output_name": "dzetaA4xl",
"persistent_data_loader_workers": false,
"pretrained_model_name_or_path": "/home/tor/kohya_ss/model/sdXL_v10VAEFix.safetensors",
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0,
"reg_data_dir": "",
"resume": "",
"sample_every_n_epochs": 0,
"sample_every_n_steps": 0,
"sample_prompts": "",
"sample_sampler": "euler_a",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "fp16",
"save_state": false,
"scale_v_pred_loss_like_noise_pred": false,
"scale_weight_norms": 0,
"sdxl": true,
"sdxl_cache_text_encoder_outputs": false,
"sdxl_no_half_vae": true,
"seed": "",
"shuffle_caption": false,
"stop_text_encoder_training": 0,
"text_encoder_lr": 3e-05,
"train_batch_size": 3,
"train_data_dir": "/home/tor/kohya_ss/LORA/img",
"train_on_input": true,
"training_comment": "3 repeats. More info: https://civitai.com/articles/1771",
"unet_lr": 3e-05,
"unit": 1,
"up_lr_weight": "",
"use_cp": false,
"use_wandb": false,
"v2": false,
"v_parameterization": false,
"v_pred_like_loss": 0,
"vae_batch_size": 0,
"wandb_api_key": "",
"weighted_captions": false,
"xformers": "none"
Good luck