Having some problems related to dataset loading (freezing / process crash / memory usage) #29
Comments
Your dataset consists of relatively high-resolution videos. That is going to use a lot more RAM than images, just to load and preprocess the raw video frames. In dataset.py, can you try changing NUM_PROC at the top to 1? That makes all the dataset.map() calls skip parallelism and do everything in-process. It should use less memory, because there are no longer 8 processes trying to load videos at the same time, but it will be much slower. I've tried to optimize the memory usage of the video preprocessing, but maybe there's more that can be done. The relevant code is in models/base.py, the PreprocessMediaFile class.
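A minimal sketch of the suggested change, assuming dataset.py defines a NUM_PROC constant that is passed as num_proc to each dataset.map() call (the preprocessing body below is a placeholder, not the real video pipeline):

```python
from datasets import Dataset

NUM_PROC = 1  # was 8; 1 runs every dataset.map() call in-process, with no worker pool

def preprocess(example):
    # placeholder for the real video decoding/preprocessing step
    example["num_chars"] = len(example["path"])
    return example

ds = Dataset.from_dict({"path": ["a.mp4", "b.mp4"]})
# With num_proc=8 there would be 8 workers each holding raw video frames in
# memory at once; num_proc=1 trades that peak RAM for slower, serial mapping.
ds = ds.map(preprocess, num_proc=NUM_PROC)
```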
@tdrussell I set the resolution to train on to 960; does it not resize the videos accordingly as well? I tried NUM_PROC=1, and it just hangs after reaching step 4 again. Also yes, it's much, much slower. It uses about 150GB. No OOM or process crash, it just freezes.
How many frames is your longest video? The code must load the entire raw video into memory before extracting one or more video clips (depending on how you configured it) of the correct number of frames for the bucket. The raw video pixels use a lot of memory: even for the size bucket (1280, 704, 72), at float32 it will use nearly 1GB. If your video files on disk are much longer than this, it would explain the high memory use.
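As a quick sanity check on that estimate, here is the arithmetic behind the "nearly 1GB" figure, assuming 3 colour channels and 4 bytes per value for float32:

```python
# Raw frame memory for one clip in the (1280, 704, 72) size bucket.
width, height, frames = 1280, 704, 72
channels, bytes_per_value = 3, 4  # RGB, float32

raw_bytes = width * height * frames * channels * bytes_per_value
print(f"{raw_bytes / 1e9:.2f} GB")  # ~0.78 GB, before any copies made by rearranging
```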
The videos are within 60-120 frames each, around 5100 videos totalling about 4GB of disk storage. Is there not a batch size we can set for how many videos are loaded at once? Or is it just 1 by default? Can we somehow have the videos resized, encoded into latents, the tensors saved, and then categorized by buckets? Is that not possible?
Okay, then something unexpected is happening. With NUM_PROC=1 it will load only a single video at a time. If your input videos are a few seconds long at most, it might use a few GB in the worst case (since rearranging dimensions makes a copy, etc.), but it should not use 100+ GB. Let me do some tests with video on my end. EDIT: one more thing to confirm: are you keeping caching_batch_size=1? If it's >1, it will still load the input videos one at a time, but it would need to keep all the latents in memory for the whole batch.
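For illustration, here is a rough sketch of the caching behaviour described above; the helper functions are toy stand-ins, not the actual diffusion-pipe code, and the array shape is deliberately small:

```python
import numpy as np

def load_video(path):            # stand-in for real video decoding; toy shape
    return np.zeros((72, 64, 64, 3), dtype=np.float32)

def vae_encode(frames):          # stand-in: latents are far smaller than raw frames
    return frames[::8, ::8, ::8].copy()

def write_to_cache(batch):       # stand-in for writing cached latents to disk
    pass

def cache_latents(video_paths, caching_batch_size=1):
    batch = []
    for path in video_paths:
        frames = load_video(path)         # only one raw video is resident at a time
        batch.append(vae_encode(frames))  # latents accumulate for the whole batch
        del frames                        # raw pixels can be freed immediately
        if len(batch) == caching_batch_size:
            write_to_cache(batch)         # caching_batch_size=1 flushes after every video
            batch = []
    if batch:
        write_to_cache(batch)
```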
Yes @tdrussell, caching_batch_size is 1. I can share my config here; I don't think I have modified much. EDIT: the JSON referenced within dataset.toml stores the captions. I just create a .txt file from it for each video if it doesn't already exist.
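A minimal sketch of that caption-conversion step; the captions.json filename, its layout (video filename mapped to caption text), and the videos/ directory are assumptions for illustration only:

```python
import json
from pathlib import Path

# Assumed layout: captions.json maps each video filename to its caption text.
captions = json.loads(Path("captions.json").read_text())

for video_name, caption in captions.items():
    txt_path = Path("videos") / Path(video_name).with_suffix(".txt").name
    if not txt_path.exists():        # only create the sidecar .txt if it's missing
        txt_path.write_text(caption)
```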
I can't reproduce this. Using LTX-Video with resolutions=[960] and frame_buckets=[1, 72], it successfully processes and caches the latents with about 51GB peak RAM usage. This is even with leaving NUM_PROC=8, which means it's loading 8 videos in parallel. Are you using the latest commit and the latest version of the LTX-Video model from Huggingface? And what is your --num_gpus flag for deepspeed? Each process has to load all the models (transformer, VAE, text encoder), so that could be a cause as well.
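Since each deepspeed process loads its own copy of the transformer, VAE, and text encoder, host RAM for the weights scales roughly linearly with --num_gpus. A back-of-the-envelope sketch, using placeholder sizes rather than measured values:

```python
# Rough host-RAM estimate for model weights alone; the sizes below are
# placeholders, not measurements for any particular model.
model_sizes_gb = {"transformer": 10.0, "vae": 0.5, "text_encoder": 9.0}
num_gpus = 2  # the deepspeed --num_gpus value; one process per GPU

per_process_gb = sum(model_sizes_gb.values())
print(f"~{per_process_gb * num_gpus:.1f} GB of weights across {num_gpus} processes")
```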
@tdrussell This issue is reproducible on systems using TPUs/XLA with a custom DeepSpeed backend implementation. I guess it's still very experimental, and I haven't made a PR to merge and fully review the changes, so maybe that's why it's having problems. But I'm currently testing on another system with NVIDIA GPUs, using the DeepSpeed CUDA backend, and right now with NUM_PROC anywhere from 1 to 8 everything works as expected; it uses about 50GB, as you stated. It's still very slow though, and increasing NUM_PROC to 16, 32, or 36 (probably anything higher than 8) makes the dataset loading significantly longer, up to minutes for just the first step, and afterwards it only utilizes about as much as NUM_PROC=8 does. Memory usage is a bit higher at around 100-130GB, but in terms of speed it's the same or even slower.
@tdrussell These are the logs after DeepSpeed spawns the processes and its subprocesses start to load the dataset:
Might be related to huggingface/datasets#4883 as well. The program then freezes. It also uses about 200GB of system memory, going from 30GB to 250GB.