
add GPU number and lazy load img to GPU #437

Open · wants to merge 2 commits into main

Conversation

sword4869

Hello,

  1. I added a GPU number argument to the ModelParams class.
  2. To save GPU memory, I stopped loading the image to the GPU when creating the camera and instead load it lazily during training. This greatly reduces GPU memory usage: I tested on an RTX 2080 Ti, and usage stayed under 2 GB at the start of training and under 5 GB even at 30k iterations.
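The idea can be sketched as below; this is a toy stand-in, not the repo's actual Camera class, and the lazy_load argument name here is illustrative:

```python
import torch

class Camera:
    """Toy sketch of the lazy-load idea: keep the ground-truth image on
    the CPU at construction time and move it to the GPU only on demand."""

    def __init__(self, image: torch.Tensor, lazy_load: bool = True):
        # With lazy_load, hundreds of cameras no longer hold their
        # images in GPU memory up front.
        device = torch.device("cpu") if lazy_load else torch.device("cuda")
        self.original_image = image.clamp(0.0, 1.0).to(device)

    def gt_image(self) -> torch.Tensor:
        # One image is transferred per training iteration instead.
        return self.original_image.cuda()
```

Only the image used by the current iteration ever occupies GPU memory; the rest stay in host RAM.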

@pablospe

pablospe commented Nov 23, 2023

This is a great improvement in memory usage; it should be merged into the main branch. @Snosixtyboo

scene/cameras.py Outdated
Comment on lines 39 to 40
if lazy_load:
    self.data_device = torch.device("cpu")

@pablospe pablospe Nov 23, 2023


Maybe equivalent?

if lazy_load:
    data_device = "cpu"

try:
    self.data_device = torch.device(data_device)
...

If this is correct, perhaps lazy_load can be replaced directly with data_device (which would then need to be exposed in train.py). Perhaps also add a note in the README; there is an open question about memory usage and the different approaches when one does not have 24 GB.
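If they are indeed equivalent, folding the flag into the device argument could look like this; resolve_data_device is a hypothetical helper name, and the fallback mirrors the existing try/except in scene/cameras.py:

```python
import torch

def resolve_data_device(data_device: str, lazy_load: bool = False) -> torch.device:
    # A lazy_load flag collapses into the existing data_device argument.
    if lazy_load:
        data_device = "cpu"
    try:
        return torch.device(data_device)
    except Exception as e:
        print(e)
        print(f"[Warning] Custom device {data_device} failed, falling back to cuda")
        return torch.device("cuda")
```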

Author

After your reminder, I realized that the other tensors are moved to the GPU directly via `.cuda()`, while data_device only affects the images, so it can indeed function as a lazy load.

There is no need to expose data_device in train.py, as it is already specified in arguments/__init__.py.

All in all, nothing needs to be modified. If possible, I suggest changing the "24 GB VRAM" wording in the README, as it can easily make people think that this is the minimum configuration.


I am not sure I understand your reply. What do you mean by `.cuda()`? I think lazy_load was a good option to add, so you don't need to modify arguments/__init__.py every time. Could you expand on how you would modify the README to clarify that data_device can be used for lazy loading?

Author

"Other tensors were directly migrated through `.cuda()`" means precisely that they are fixed to the CUDA device, e.g. when the Gaussians are created.

There is no need to modify data_device in arguments/__init__.py every time. train.py uses argparse.ArgumentParser to parse the command-line arguments, so data_device can be set with python train.py --data_device cpu -s <path to COLMAP or NeRF Synthetic dataset>.
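A minimal stand-in showing how an argparse-exposed data_device works (this is a toy parser, not the repo's ParamGroup machinery):

```python
import argparse

parser = argparse.ArgumentParser(description="toy stand-in for train.py's CLI")
# Mirrors the data_device field defined in arguments/__init__.py.
parser.add_argument("--data_device", type=str, default="cuda")
parser.add_argument("-s", "--source_path", type=str, default="")

# Equivalent to: python train.py --data_device cpu -s /data/scene
args = parser.parse_args(["--data_device", "cpu", "-s", "/data/scene"])
print(args.data_device)  # cpu
```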

# scene/cameras.py
# if here data_device is specified as `cpu`, self.original_image is on cpu
try:
    self.data_device = torch.device(data_device)
except Exception as e:
    print(e)
    print(f"[Warning] Custom device {data_device} failed, fallback to default cuda device")
    self.data_device = torch.device("cuda")

self.original_image = image.clamp(0.0, 1.0).to(self.data_device)
self.image_width = self.original_image.shape[2]
self.image_height = self.original_image.shape[1]

if gt_alpha_mask is not None:
    self.original_image *= gt_alpha_mask.to(self.data_device)
else:
    self.original_image *= torch.ones((1, self.image_height, self.image_width), device=self.data_device)

# train.py
# original_image is sent to gpu while iterating
gt_image = viewpoint_cam.original_image.cuda()
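A back-of-the-envelope calculation shows why deferring this transfer matters; the 1080p resolution and 300-camera count below are illustrative assumptions, not numbers from the PR:

```python
# GPU memory occupied by ground-truth images if every camera's image
# sits on the GPU up front (float32 RGB at 1920x1080).
bytes_per_image = 3 * 1080 * 1920 * 4           # C * H * W * sizeof(float32)
mb_per_image = bytes_per_image / 2**20           # ~23.7 MB per image

n_cameras = 300                                  # illustrative capture size
total_gb = n_cameras * bytes_per_image / 2**30   # ~7 GB just for images

print(f"{mb_per_image:.1f} MB per image, {total_gb:.1f} GB for {n_cameras} cameras")
```

Keeping the images in host RAM and transferring one per iteration removes nearly all of that resident footprint.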

[screenshot: data_device in README]

Author

I modified the README to clarify the lazy load and recommitted it.


thanks for the clarifications!

@sword4869 sword4869 closed this Nov 24, 2023
@sword4869 sword4869 reopened this Nov 25, 2023
@NiklasVoigt

Typo: it should be device with a c: --data_device cpu

@sword4869 (Author)

> typo, should be device with a c --data_device cpu

Thanks for your careful check.

@Ky1eYang

Maybe we can reduce memory usage further by lazily calling the loadCam function:

def lazy_call(f, *args, **kwargs):
    return lambda: f(*args, **kwargs)

def cameraList_from_camInfos(cam_infos, resolution_scale, args, lazy_load):
    camera_list = []
    for id, c in enumerate(cam_infos):
        if lazy_load:
            # Defer loadCam; the closure runs only when it is called.
            camera_list.append(lazy_call(loadCam, args, id, c, resolution_scale))
        else:
            camera_list.append(loadCam(args, id, c, resolution_scale))
    return camera_list

In cameraList_from_camInfos, we can delay executing loadCam and run it during training instead:

# Pick a random Camera
if not viewpoint_stack:
    viewpoint_stack = scene.getTrainCameras().copy()
viewpoint_cam = viewpoint_stack.pop(randint(0, len(viewpoint_stack)-1))
if scene.lazy_load:
    viewpoint_cam = viewpoint_cam()
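The deferral can be checked in isolation; in this toy example (plain Python, where load_cam is a stand-in for the real loadCam) nothing runs until the thunk is invoked:

```python
calls = []

def lazy_call(f, *args, **kwargs):
    # Zero-argument closure; f runs only when the closure is called.
    return lambda: f(*args, **kwargs)

def load_cam(cam_id):
    calls.append(cam_id)  # stands in for the expensive image load
    return f"camera-{cam_id}"

# Building the list performs no loading...
camera_list = [lazy_call(load_cam, i) for i in range(3)]
assert calls == []

# ...the load happens only when a camera is drawn from the stack.
cam = camera_list[1]()
assert cam == "camera-1" and calls == [1]
```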
