Kohya started using more VRAM for SDXL and using more than it should be #1131

Open
FurkanGozukara opened this issue Feb 21, 2024 · 8 comments

FurkanGozukara commented Feb 21, 2024

I have a config that was running fine on Kaggle with previous versions.

Right now it is failing on a 15 GB GPU.

This should not happen.

The same settings in OneTrainer use less than 13.5 GB of VRAM; here it fails with 15 GB.

It wasn't failing before.

All images are 1024x1024 and all latents are cached.

Here is the full training command I used.

I did trainings on Kaggle in the past and this exact command was working; I even have a video of it here:

https://youtu.be/16-b1AjvyBE

  accelerate launch --num_cpu_threads_per_process=4 "./sdxl_train.py" \
    --max_grad_norm=0.0 --no_half_vae --train_text_encoder \
    --ddp_timeout=10000000 --ddp_gradient_as_bucket_view \
    --bucket_no_upscale --bucket_reso_steps=64 \
    --cache_latents --cache_latents_to_disk --full_fp16 \
    --gradient_checkpointing \
    --learning_rate="1e-05" --learning_rate_te1="3e-06" \
    --logging_dir="/kaggle/working/results/log" \
    --lr_scheduler="constant" --lr_scheduler_num_cycles="1" \
    --max_data_loader_n_workers="0" \
    --resolution="1024,1024" --max_train_steps="1500" \
    --mem_eff_attn --mixed_precision="fp16" \
    --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 \
    --optimizer_type="Adafactor" \
    --output_dir="/kaggle/working/results/model" \
    --output_name="2024_02_21_kaggle" \
    --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
    --reg_data_dir="/kaggle/working/results/reg" \
    --save_every_n_epochs="1" --save_model_as=safetensors \
    --save_precision="fp16" --train_batch_size="1" \
    --train_data_dir="/kaggle/working/results/img" \
    --vae="stabilityai/sdxl-vae" --xformers
Traceback (most recent call last):
  File "/kaggle/working/kohya_ss/./sdxl_train.py", line 779, in <module>
    train(args)
  File "/kaggle/working/kohya_ss/./sdxl_train.py", line 594, in train
    optimizer.step()
  File "/opt/conda/lib/python3.10/site-packages/accelerate/optimizer.py", line 132, in step
    self.scaler.step(self.optimizer, closure)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 374, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 290, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/optimizer.py", line 185, in patched_step
    return method(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/optimization.py", line 715, in step
    update = (grad**2) + group["eps"][0]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 242.00 MiB (GPU 1; 14.75 GiB total capacity; 14.34 GiB already allocated; 53.06 MiB free; 14.47 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
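
For what it's worth, the allocator hint at the end of that error can be tried by setting PYTORCH_CUDA_ALLOC_CONF before launching. This only helps with fragmentation and will not recover VRAM that the training itself genuinely allocates; the 512 value below is just an illustrative choice, not something recommended in this thread.

    # Allocator tweak for fragmentation only (the value is an example).
    export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
    accelerate launch --num_cpu_threads_per_process=4 "./sdxl_train.py" ...  # same arguments as above
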
FurkanGozukara (Author) commented:

@kohya-ss

FurkanGozukara (Author) commented:

Currently it uses at least 15.7 GB on Kaggle.

So it works with the P100 GPU, but that means people can't use the much faster T4, and Kaggle gives dual T4s.

Also, anyone who has a 16 GB GPU can't use it properly either.
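
If it helps to pin down the exact peak, a simple way to watch usage while the training runs (assuming nvidia-smi is available in the Kaggle session, e.g. from a terminal or a second cell) is:

    # Print per-GPU memory usage every 5 seconds while training runs elsewhere.
    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 5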

kohya-ss (Owner) commented:

With these options, Text Encoder 2 is trained with the learning rate=1e-5, because --train_text_encoder is specified. I think OneTrainer may train Text Encoder 1 only. If you want to stop Text Encoder 2 training, please specify --learning_rate_te2=0.
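
For reference, a minimal sketch of how that change would look in the command above, showing only the learning-rate flags (everything else stays the same):

    # Train the U-Net and Text Encoder 1, but disable Text Encoder 2 training.
    ... --train_text_encoder \
        --learning_rate="1e-05" \
        --learning_rate_te1="3e-06" \
        --learning_rate_te2=0 \
        ...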

FurkanGozukara (Author) commented Feb 22, 2024

> With these options, Text Encoder 2 is trained with the learning rate=1e-5, because --train_text_encoder is specified. I think OneTrainer may train Text Encoder 1 only. If you want to stop Text Encoder 2 training, please specify --learning_rate_te2=0.

Wow, in that case this is a bug, because this is what the bmaltais GUI generates. I will report it to him.
I will test it and reply back here, thank you.

So when we don't provide a TE2 learning rate, what does the trainer use? Because this is a big problem for me.

FurkanGozukara (Author) commented:

Yep, I verified that this bug exists and it breaks my config :/

Thank you so much, Kohya.


Iipython commented Feb 28, 2024

Hey, I am encountering the same problem today!
I have two clones of sd-scripts: one cloned in December 2023 and one downloaded today.
But I found that the new code always reports "out of memory" with the same configuration as the old one:

  --pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0 \
  --vae=madebyollin/sdxl-vae-fp16-fix \
  --dataset_config=/home/lyh/sdvs/sd-scripts/config/finetune.toml \
  --output_dir=/home/lyh/sd-scripts/output/finetune_15W \
  --output_name=finetune_15W \
  --save_model_as=safetensors \
  --save_every_n_epochs=1 \
  --save_precision="fp16" \
  --max_token_length=225 \
  --min_timestep=0 \
  --max_timestep=1000 \
  --max_train_epochs=2000 \
  --learning_rate=4e-6 \
  --lr_scheduler="constant" \
  --optimizer_type="AdamW8bit" \
  --xformers \
  --gradient_checkpointing \
  --gradient_accumulation_steps=128 \
  --mem_eff_attn \
  --mixed_precision="fp16" \
  --logging_dir=logs

The weird thing is:

The VRAM occupation with the new code:
[screenshot of VRAM usage with the new code]

The VRAM occupation with the old code:
[screenshot of VRAM usage with the old code]

Why? What is different?
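
If both copies are git clones, one way to narrow down what changed between them (the paths and the commit placeholder below are hypothetical) is:

    # Record the commit of the December 2023 checkout.
    git -C /path/to/old/sd-scripts rev-parse HEAD
    # In the new checkout, list training-code changes since that commit.
    git -C /path/to/new/sd-scripts log --oneline <old_commit>..HEAD -- sdxl_train.py library/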

kohya-ss (Owner) commented Feb 29, 2024

As I mentioned in #1141, the multiple-GPU issue seems to have a different cause.

hufenghufeng commented:

> Hey, I am encountering the same problem today! I have two clones of sd-scripts: one cloned in December 2023 and one downloaded today. But I found that the new code always reports "out of memory" with the same configuration as the old one: [...]
>
> Why? What is different?

same problem
