The code of the new version is using more VRAM #1141
PR #989 was merged into the main branch on Dec 21, 2023. I think it may be the cause of this issue. Please add …
@kohya-ss these flags didn't help me with multi-GPU training... I have 3x4090 onboard.

accelerate launch --num_cpu_threads_per_process=2 "/home/storuky/ml/train/kohya/sd-scripts/sdxl_train.py" --cache_text_encoder_outputs --bucket_no_upscale --bucket_reso_steps=64 --cache_latents --cache_latents_to_disk --caption_extension=".txt" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=1024 --gradient_checkpointing --learning_rate="2e-06" --logging_dir="/home/storuky/ml/train/data_dir/log" --lr_scheduler="constant" --max_data_loader_n_workers="0" --resolution="1024,1024" --max_train_steps="3200" --mixed_precision="bf16" --optimizer_args scale_parameter=False relative_step=False warmup_init=False --optimizer_type="Adafactor" --output_dir="/home/storuky/ml/train/data_dir/model" --output_name="OutModel" --pretrained_model_name_or_path="/home/storuky/ml/sd/stable-diffusion-webui-forge/models/Stable-diffusion/Training2-000005.safetensors" --reg_data_dir="/home/storuky/ml/train/data_dir/reg" --save_every_n_epochs="1" --save_model_as=safetensors --save_precision="bf16" --train_batch_size="1" --train_data_dir="/home/storuky/ml/train/data_dir/img" --xformers

It's going well.
I'm getting an OOM error. I tried to add …
I reverted this PR locally and now it uses less VRAM.
PR #989 fixes gradient synchronization. If #989 is reverted, the gradients are not synchronized, so it is similar to single-GPU training, in my understanding. I'm not familiar with multi-GPU training, but could you try training with --full_bf16?
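For context, here is a minimal sketch of what "gradient synchronization" means in data-parallel multi-GPU training. This is an illustration of the standard PyTorch distributed pattern, not the sd-scripts/Accelerate code path, and the function name backward_with_sync is made up for this example:

```python
# Minimal sketch of distributed gradient synchronization (assumes the
# process group has already been initialized, e.g. via accelerate launch).
import torch
import torch.distributed as dist


def backward_with_sync(loss: torch.Tensor, model: torch.nn.Module) -> None:
    # Each process first computes its local gradients.
    loss.backward()
    # The gradients are then averaged across all GPUs so every replica
    # applies the same optimizer step. This keeps the replicas consistent,
    # but it requires extra communication/gradient buffers compared to
    # training where each GPU just uses its own local gradients.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
```

Without this step (i.e., with #989 reverted), each GPU effectively trains on its own shard, which is cheaper in memory but no longer equivalent to proper data-parallel training.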
@kohya-ss yes, full_bf16 works well in terms of VRAM usage, but the results are much worse in terms of accuracy 🤷♂️ For example, hair sticks together and looks dirty, small detailed objects turn into blots, etc.
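For readers following along, a rough illustration of the tradeoff being discussed. This reflects general PyTorch behavior rather than sd-scripts internals, and it assumes a CUDA device with bf16 support:

```python
import torch

x = torch.randn(8, 4096, device="cuda")
layer = torch.nn.Linear(4096, 4096, device="cuda")  # weights stored in fp32

# Mixed precision (--mixed_precision="bf16"): weights stay in fp32
# (4 bytes/param); only the compute inside autocast runs in bf16.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y_mixed = layer(x)

# Full bf16 (--full_bf16): the weights themselves are stored in bf16
# (2 bytes/param), roughly halving weight/gradient memory, but parameter
# updates lose precision, which can surface as degraded fine detail.
layer_bf16 = layer.to(torch.bfloat16)
y_full = layer_bf16(x.to(torch.bfloat16))
```

This is why --full_bf16 lowers VRAM usage at the cost of the kind of quality degradation described above.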
Hey, I am encountering the same problem today!
I have two clones of sd-scripts: one cloned in December 2023 and the other downloaded on Feb 28, 2024.
The new code always reports "out of memory" with the same configuration as follows:
--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0 \
--vae=madebyollin/sdxl-vae-fp16-fix \
--dataset_config=/home/lyh/sdvs/sd-scripts/config/finetune.toml \
--output_dir=/home/lyh/sd-scripts/output/finetune_15W \
--output_name=finetune_15W \
--save_model_as=safetensors \
--save_every_n_epochs=1 \
--save_precision="fp16" \
--max_token_length=225 \
--min_timestep=0 \
--max_timestep=1000 \
--max_train_epochs=2000 \
--learning_rate=4e-6 \
--lr_scheduler="constant" \
--optimizer_type="AdamW8bit" \
--xformers \
--gradient_checkpointing \
--gradient_accumulation_steps=128 \
--mem_eff_attn \
--mixed_precision="fp16" \
--logging_dir=logs \
The weird thing is the difference in VRAM usage:
VRAM usage with the new code: (screenshot)
VRAM usage with the old code: (screenshot)
Why? What is different?
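One way to narrow down "what is different" is to log peak VRAM per training step in both checkouts and compare the numbers. A small hypothetical helper sketching this (log_peak_vram is not part of sd-scripts; it assumes a CUDA device):

```python
# Hypothetical helper for comparing the Dec 2023 and Feb 2024 checkouts:
# print the peak allocated VRAM since the last reset, then reset the counter.
import torch


def log_peak_vram(tag: str) -> None:
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"[{tag}] peak VRAM: {peak_gib:.2f} GiB")
    torch.cuda.reset_peak_memory_stats()


# Example usage: call once per step in both versions and diff the logs.
# log_peak_vram("step")
```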