
oom #57

Closed
timchenxiaoyu opened this issue Nov 24, 2024 · 13 comments

@timchenxiaoyu

nvidia-smi

Sun Nov 24 16:54:27 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 On | 00000000:00:07.0 Off | 0 |
| N/A 44C P8 10W / 70W | 3MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 13.06 MiB is free. Including non-PyTorch memory, this process has 14.73 GiB memory in use. Of the allocated memory 14.51 GiB is allocated by PyTorch, and 69.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
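
The error message itself suggests tuning the allocator via PYTORCH_CUDA_ALLOC_CONF / max_split_size_mb. A minimal sketch of trying that suggestion (the 64 MiB split size and the placement at the top of gradio_run.py are illustrative assumptions, not something stated in this thread):

# At the very top of gradio_run.py, before torch (or anything that imports torch) is imported:
import os
# Smaller allocation splits reduce fragmentation at some cost in allocator overhead.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:64")

Note that this only helps with fragmentation; it will not help if the loaded models genuinely exceed the 15 GiB of VRAM.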

@zliucz
Member

zliucz commented Nov 24, 2024

Hi. Your device has 15GB of VRAM, which should be capable of running our system. Could you please restart the machine and rerun it? When does this OOM occur? Thanks.

@timchenxiaoyu
Author

Already rebooted, but it still doesn't work.

python gradio_run.py

Total VRAM 15102 MB, total RAM 30700 MB
pytorch version: 2.1.2+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 Tesla T4 : native
Using pytorch cross attention
['/root/MagicQuill', '/root/miniconda3/envs/py310/lib/python310.zip', '/root/miniconda3/envs/py310/lib/python3.10', '/root/miniconda3/envs/py310/lib/python3.10/lib-dynload', '/root/.local/lib/python3.10/site-packages', '/root/miniconda3/envs/py310/lib/python3.10/site-packages', 'editable.llava-1.2.2.post1.finder.path_hook', '/root/MagicQuill/MagicQuill', '/root/miniconda3/envs/py310/lib/python3.10/site-packages/setuptools/_vendor']
/root/miniconda3/envs/py310/lib/python3.10/site-packages/huggingface_hub/file_download.py:797: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.

BrushNet inference: do_classifier_free_guidance is True
BrushNet inference, step = 0: image batch = 1, got 2 latents, starting from 0
BrushNet inference: sample torch.Size([2, 4, 85, 64]) , CL torch.Size([2, 5, 85, 64]) dtype torch.float16
/root/miniconda3/envs/py310/lib/python3.10/site-packages/diffusers/models/resnet.py:323: FutureWarning: scale is deprecated and will be removed in version 1.0.0. The scale argument is deprecated and will be ignored. Please remove it, as passing it will raise an error in the future. scale should directly be passed while calling the underlying pipeline component i.e., via cross_attention_kwargs.
deprecate("scale", "1.0.0", deprecation_message)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:08<00:00, 2.45it/s]
Requested to load AutoencoderKL
Loading 1 new model
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Traceback (most recent call last):
File "/root/MagicQuill/MagicQuill/comfy/sd.py", line 336, in decode
pixel_samples[x:x+batch_number] = self.process_output(self.first_stage_model.decode(samples).to(self.output_device).float())
File "/root/MagicQuill/MagicQuill/comfy/ldm/models/autoencoder.py", line 200, in decode
dec = self.decoder(dec, **decoder_kwargs)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/MagicQuill/MagicQuill/comfy/ldm/modules/diffusionmodules/model.py", line 635, in forward
h = self.up[i_level].block[i_block](h, temb, **kwargs)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/MagicQuill/MagicQuill/comfy/ldm/modules/diffusionmodules/model.py", line 142, in forward
h = self.conv1(h)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/MagicQuill/MagicQuill/comfy/ops.py", line 80, in forward
return super().forward(*args, **kwargs)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 170.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 85.06 MiB is free. Including non-PyTorch memory, this process has 14.66 GiB memory in use. Of the allocated memory 14.35 GiB is allocated by PyTorch, and 159.43 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/MagicQuill/MagicQuill/comfy/ldm/modules/diffusionmodules/model.py", line 60, in forward
x = torch.nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/functional.py", line 3983, in interpolate
return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 85.06 MiB is free. Including non-PyTorch memory, this process has 14.66 GiB memory in use. Of the allocated memory 14.37 GiB is allocated by PyTorch, and 138.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/gradio/queueing.py", line 624, in process_events
response = await route_utils.call_process_api(
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
output = await app.get_blocks().process_api(
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/gradio/blocks.py", line 2018, in process_api
result = await self.call_function(
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/gradio/blocks.py", line 1567, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
return await future
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/anyio/backends/asyncio.py", line 943, in run
result = context.run(func, *args)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/gradio/utils.py", line 846, in wrapper
response = f(*args, **kwargs)
File "/root/MagicQuill/gradio_run.py", line 152, in generate_image_handler
res = generate(
File "/root/MagicQuill/gradio_run.py", line 120, in generate
latent_samples, final_image, lineart_output, color_output = scribbleColorEditModel.process(
File "/root/MagicQuill/MagicQuill/scribble_color_edit.py", line 123, in process
final_image = self.vae_decoder.decode(self.vae, latent_samples)[0]
File "/root/MagicQuill/MagicQuill/comfyui_utils.py", line 158, in decode
return (vae.decode(samples["samples"]), )
File "/root/MagicQuill/MagicQuill/comfy/sd.py", line 342, in decode
pixel_samples = self.decode_tiled
(samples_in)
File "/root/MagicQuill/MagicQuill/comfy/sd.py", line 295, in decode_tiled

(comfy.utils.tiled_scale(samples, decode_fn, tile_x // 2, tile_y * 2, overlap, upscale_amount = self.upscale_ratio, output_device=self.output_device, pbar = pbar) +
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/MagicQuill/MagicQuill/comfy/utils.py", line 440, in tiled_scale
ps = function(s_in).to(output_device)
File "/root/MagicQuill/MagicQuill/comfy/sd.py", line 293, in
decode_fn = lambda a: self.first_stage_model.decode(a.to(self.vae_dtype).to(self.device)).float()
File "/root/MagicQuill/MagicQuill/comfy/ldm/models/autoencoder.py", line 200, in decode
dec = self.decoder(dec, **decoder_kwargs)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/MagicQuill/MagicQuill/comfy/ldm/modules/diffusionmodules/model.py", line 639, in forward
h = self.up[i_level].upsample(h)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/MagicQuill/MagicQuill/comfy/ldm/modules/diffusionmodules/model.py", line 63, in forward
out = torch.empty((b, c, h * 2, w * 2), dtype=x.dtype, layout=x.layout, device=x.device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 85.06 MiB is free. Including non-PyTorch memory, this process has 14.66 GiB memory in use. Of the allocated memory 14.37 GiB is allocated by PyTorch, and 138.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@zliucz
Member

zliucz commented Nov 24, 2024

I see. You could manually set LOW_VRAM mode by modifying the code in MagicQuill/comfy/model_management.py, or try disabling the loading of the LLaVA module and DrawNGuess. Personally, I suggest setting LOW_VRAM mode. Thanks.

@timchenxiaoyu
Author

How do I set LOW_VRAM mode in MagicQuill/comfy/model_management.py?

@zliucz
Member

zliucz commented Nov 24, 2024

Try to change lines 23-24 from

vram_state = VRAMState.NORMAL_VRAM
set_vram_to = VRAMState.NORMAL_VRAM

to

vram_state = VRAMState.LOW_VRAM
set_vram_to = VRAMState.LOW_VRAM

Let me know if it works. Thanks.
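
For reference, in ComfyUI-derived code these two flags sit near the top of model_management.py, just below the VRAMState enum. A sketch of what the edited lines look like in context (the enum members shown here are taken from upstream ComfyUI and are an assumption about MagicQuill's vendored copy):

# MagicQuill/comfy/model_management.py (top of file, sketch)
from enum import Enum

class VRAMState(Enum):
    DISABLED = 0     # no VRAM present: models are not moved to VRAM
    NO_VRAM = 1      # very low VRAM: enable every memory-saving option
    LOW_VRAM = 2
    NORMAL_VRAM = 3
    HIGH_VRAM = 4
    SHARED = 5       # no dedicated VRAM: memory shared between CPU and GPU

# lines 23-24: switch the default from NORMAL_VRAM to LOW_VRAM
vram_state = VRAMState.LOW_VRAM
set_vram_to = VRAMState.LOW_VRAM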

@timchenxiaoyu
Author

Unfortunately, it doesn't work; still OOM.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 78.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 31.06 MiB is free. Including non-PyTorch memory, this process has 14.71 GiB memory in use. Of the allocated memory 14.36 GiB is allocated by PyTorch, and 203.77 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@timchenxiaoyu
Author

Total VRAM 15102 MB, total RAM 30700 MB
pytorch version: 2.1.2+cu118
Set vram state to: LOW_VRAM
Device: cuda:0 Tesla T4 : native
Using pytorch cross attention

@fallbernana123456

Try to change lines 23-24 from

vram_state = VRAMState.NORMAL_VRAM
set_vram_to = VRAMState.NORMAL_VRAM

to

vram_state = VRAMState.LOW_VRAM
set_vram_to = VRAMState.LOW_VRAM

Let me know if it works. Thanks.

Setting this had no effect. How do I disable loading the LLaVA module and DrawNGuess?

@zliucz
Member

zliucz commented Nov 25, 2024

I see, @timchenxiaoyu @fallbernana123456. Just change line 22 in gradio_run.py from

llavaModel = LLaVAModel()

to

llavaModel = None

Then, you can disable DrawNGuess by clicking the wand icon above. You can still enter the prompt manually.

Screenshot 2024-11-25 16:57:23
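
A minimal sketch of that edit in gradio_run.py, with comments added here for clarity (the surrounding code is not shown in this thread):

# gradio_run.py, line 22 (sketch)
# llavaModel = LLaVAModel()   # original: loads the LLaVA model onto the GPU at startup
llavaModel = None             # skip LLaVA to free VRAM; DrawNGuess is then disabled
                              # via the wand icon and prompts are entered manually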

@zliucz
Member

zliucz commented Nov 25, 2024

Alternatively, @timchenxiaoyu @fallbernana123456, you may change line 456 of MagicQuill/comfy/model_management.py (https://github.com/magic-quill/MagicQuill/blob/main/MagicQuill/comfy/model_management.py) to

cur_loaded_model = loaded_model.model_load(64 * 1024 * 1024, force_patch_weights=force_patch_weights)

This will force the model to be loaded in low-VRAM mode, at the cost of much slower inference.
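
For context, the first argument to model_load is a VRAM budget in bytes; a sketch of the change (the original variable name lowvram_model_memory is taken from upstream ComfyUI and is an assumption about MagicQuill's vendored copy):

# MagicQuill/comfy/model_management.py, inside load_models_gpu(), around line 456 (sketch)
# original (budget computed from currently free VRAM):
#   cur_loaded_model = loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
# suggested change: hard-code a 64 MiB budget, forcing the low-VRAM loading path
cur_loaded_model = loaded_model.model_load(64 * 1024 * 1024, force_patch_weights=force_patch_weights)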

@timchenxiaoyu
Author

Thanks, that solved the problem @zliucz

@fallbernana123456

After setting llavaModel = None, it runs now.

@Natural-selection1

I see, @timchenxiaoyu @fallbernana123456. Just change line 22 in gradio_run.py from

llavaModel = LLaVAModel()

to

llavaModel = None

Then, you can disable DrawNGuess by clicking the wand icon above. You can still enter prompts manually.

Screenshot 2024-11-25 16:57:23

Perhaps the README should be updated to warn laptop users with 8GB VRAM? Hitting an out-of-memory error on the very first boot is really frustrating, haha (especially after spending an hour downloading a 26GB larrrrrrrge file).

Or pin this issue?
