torch.cuda.OutOfMemoryError #10

Open
liang315 opened this issue May 21, 2024 · 1 comment

@liang315

File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 496.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 235.06 MiB is free. Process 20398 has 14.52 GiB memory in use. Of the allocated memory 14.06 GiB is allocated by PyTorch, and 330.42 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
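
As the error text itself suggests, one low-effort thing to try is the caching allocator's max_split_size_mb option, which reduces fragmentation when a lot of memory is reserved but unallocated. A minimal sketch, assuming the variable can be set before PyTorch initializes CUDA (the 128 MiB value is only an illustrative starting point, not a recommended setting):

```python
# Must run before the first CUDA allocation, e.g. at the very top of the
# training entry point, or be exported in the shell that launches training.
import os

# Illustrative split size; tune per workload. This only helps when the
# "reserved but unallocated" pool is large relative to the failing request.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after setting the variable so the allocator picks it up
```

In this log only about 330 MiB is reserved-but-unallocated while 496 MiB is being requested, so fragmentation relief alone is unlikely to be enough; reducing the per-iteration memory footprint (see the sketch after the full log below) is probably also needed.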

@liang315
Author

WARNING [05/17 10:24:18 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
head.{bias, weight}
layers.0.blocks.0.attn.relative_position_bias_table
layers.0.blocks.1.attn.relative_position_bias_table
layers.0.blocks.1.attn_mask
layers.1.blocks.0.attn.relative_position_bias_table
layers.1.blocks.1.attn.relative_position_bias_table
layers.1.blocks.1.attn_mask
layers.2.blocks.0.attn.relative_position_bias_table
layers.2.blocks.1.attn.relative_position_bias_table
layers.2.blocks.1.attn_mask
layers.2.blocks.2.attn.relative_position_bias_table
layers.2.blocks.3.attn.relative_position_bias_table
layers.2.blocks.3.attn_mask
layers.2.blocks.4.attn.relative_position_bias_table
layers.2.blocks.5.attn.relative_position_bias_table
layers.2.blocks.5.attn_mask
layers.3.blocks.0.attn.relative_position_bias_table
layers.3.blocks.1.attn.relative_position_bias_table
norm.{bias, weight}
[05/17 10:24:18 d2.engine.train_loop]: Starting training from iteration 0
/opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
ERROR [05/17 10:24:21 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/kaggle/working/GenerateU/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/kaggle/working/GenerateU/detectron2/engine/defaults.py", line 498, in run_step
self._trainer.run_step()
File "/kaggle/working/GenerateU/detectron2/engine/train_loop.py", line 273, in run_step
loss_dict = self.model(data)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/ddetrs_vl_uni.py", line 197, in forward
output, loss_dict = self.detr.forward(images, targets, self.criterion, train=True, clip_object_descriptions_features=clip_object_descriptions_features, dataset_source=dataset_source, ann_type=ann_type)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/segmentation_condInst_new_encodfpn.py", line 143, in forward
self.detr.transformer(srcs, masks, poses, query_embeds, mask_on=True)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/deformable_detr/deformable_transformer.py", line 153, in forward
memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/deformable_detr/deformable_transformer.py", line 259, in forward
output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/deformable_detr/deformable_transformer.py", line 229, in forward
src = self.forward_ffn(src)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/deformable_detr/deformable_transformer.py", line 217, in forward_ffn
src2 = self.linear2(self.dropout2(self.activation(self.linear1(src))))
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 496.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 235.06 MiB is free. Process 20398 has 14.52 GiB memory in use. Of the allocated memory 14.06 GiB is allocated by PyTorch, and 330.42 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[05/17 10:24:21 d2.engine.hooks]: Total training time: 0:00:02 (0:00:00 on hooks)
[05/17 10:24:21 d2.utils.events]: iter: 0 lr: N/A max_mem: 14396M
Traceback (most recent call last):
File "/kaggle/working/GenerateU/projects/DDETRS/train_net.py", line 249, in
launch(
File "/kaggle/working/GenerateU/detectron2/engine/launch.py", line 82, in launch
main_func(*args)
File "/kaggle/working/GenerateU/projects/DDETRS/train_net.py", line 233, in main
trainer.train()
File "/kaggle/working/GenerateU/detectron2/engine/defaults.py", line 488, in train
super().train(self.start_iter, self.max_iter)
File "/kaggle/working/GenerateU/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/kaggle/working/GenerateU/detectron2/engine/defaults.py", line 498, in run_step
self._trainer.run_step()
File "/kaggle/working/GenerateU/detectron2/engine/train_loop.py", line 273, in run_step
loss_dict = self.model(data)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/ddetrs_vl_uni.py", line 197, in forward
output, loss_dict = self.detr.forward(images, targets, self.criterion, train=True, clip_object_descriptions_features=clip_object_descriptions_features, dataset_source=dataset_source, ann_type=ann_type)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/segmentation_condInst_new_encodfpn.py", line 143, in forward
self.detr.transformer(srcs, masks, poses, query_embeds, mask_on=True)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/deformable_detr/deformable_transformer.py", line 153, in forward
memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/deformable_detr/deformable_transformer.py", line 259, in forward
output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/deformable_detr/deformable_transformer.py", line 229, in forward
src = self.forward_ffn(src)
File "/kaggle/working/GenerateU/projects/DDETRS/ddetrs/models/deformable_detr/deformable_transformer.py", line 217, in forward_ffn
src2 = self.linear2(self.dropout2(self.activation(self.linear1(src))))
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 496.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 235.06 MiB is free. Process 20398 has 14.52 GiB memory in use. Of the allocated memory 14.06 GiB is allocated by PyTorch, and 330.42 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
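
Since PyTorch has already allocated 14.06 GiB of the GPU's 14.75 GiB before the failing 496 MiB request, the more reliable fix is to shrink the per-iteration footprint. A hedged sketch using the stock detectron2 config keys; GenerateU's project config may rename or wrap these, so `SOLVER.IMS_PER_BATCH`, `DATALOADER.NUM_WORKERS`, and `SOLVER.AMP.ENABLED` are assumptions taken from vanilla detectron2, and the config path is a placeholder:

```python
# Sketch only: lower the detectron2 settings that drive GPU memory use.
from detectron2.config import get_cfg

cfg = get_cfg()
# cfg.merge_from_file("projects/DDETRS/configs/<your_config>.yaml")  # placeholder path

cfg.SOLVER.IMS_PER_BATCH = 1      # fewer images per step -> smaller activation memory
cfg.DATALOADER.NUM_WORKERS = 4    # matches the suggested max from the DataLoader warning above
cfg.SOLVER.AMP.ENABLED = True     # mixed precision, if the project supports it
```

The same overrides can usually be appended as trailing key/value pairs when launching train_net.py (detectron2's default argument parser forwards them as `opts`), e.g. `SOLVER.IMS_PER_BATCH 1 DATALOADER.NUM_WORKERS 4`; whether GenerateU fits on a ~15 GiB Kaggle GPU even at batch size 1 is not established by this log.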
