Hi, can you provide some example data of hunyuanvideo lora training data? #6
Comments
@jordoh So the recommendation is the settings above, is that right?
If you are trying to train a person's likeness (and not a style or camera motion, etc.), I have had success with those settings, yes. I'm training now with 50 720x540 videos, 30-80 frames per video, with JoyCaption Alpha 2 captions of the first frame (manually adjusted to not use phrases like "a photo of ..."). It died with an unclear exception at step 9 on the first attempt; the second attempt is currently at step 13. VRAM usage is ~45 GB. To answer one of my observations/questions from a previous comment: the default batch size is 4 (previously 36 images * 10 repeats / batch size 4 = 90 steps per epoch; now 50 videos / batch size 4 = 10 steps per epoch - not sure how that maths 🤷). In theory, setting the batch size to 1 should reduce VRAM usage.
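For reference, steps per epoch normally follow from dataset size, repeats, and batch size. A quick sanity check of the numbers above - `steps_per_epoch` is a hypothetical helper, not part of the repo:

```python
import math

def steps_per_epoch(num_items: int, repeats: int, batch_size: int) -> int:
    """Steps per epoch if every (item, repeat) pair yields one sample."""
    return math.ceil(num_items * repeats / batch_size)

print(steps_per_epoch(36, 10, 4))  # 90 -- matches the image run
print(steps_per_epoch(50, 1, 4))   # 13, not the reported 10; aspect ratio or
                                   # frame bucketing may drop or regroup samples
```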
@jordoh Yes, I was preparing to train a character. By the way, what scenario are you training in the run below that uses video? I ask because you mentioned above that you can already get good results training on pictures.
I'm using iPhone Live Photos, as they generally capture speaking and other natural movement (smiling, etc.), and have an aspect ratio that scales down to dimensions HunyuanVideo seems to work well at (720x540). A note on the batch size: it's dictated by gradient accumulation steps - setting that to 1 reduces the batch size to 1 as well, but VRAM usage is still pretty high at ~42 GB, so not much savings there.
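For anyone wondering why gradient accumulation drives the reported batch size, here is a minimal PyTorch sketch with a toy model and dataset (not the repo's actual loop): each optimizer step consumes `ACCUM_STEPS` micro-batches, so the effective batch size multiplies, while activation memory depends only on the micro-batch - which would explain why VRAM barely drops, the remaining ~42 GB presumably being mostly weights and optimizer state.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real model and dataset.
model = nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(torch.randn(64, 8), torch.randn(64, 1)),
                    batch_size=1)  # micro-batch size of 1

ACCUM_STEPS = 4  # reported/effective batch size = 1 * 4
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / ACCUM_STEPS).backward()  # scale so accumulated grads average out
    if (i + 1) % ACCUM_STEPS == 0:
        optimizer.step()     # one training step per ACCUM_STEPS micro-batches
        optimizer.zero_grad()
```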
Hey, I keep getting this error after very few steps:

[rank0]: RuntimeError: CUDA error: unknown error
[2024-12-16 01:02:59,531] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 1042

Setup: 4090 with 24 GB VRAM, 15 dataset images (non-resized, assuming aspect ratio bucketing will handle them), resolution set to 2048 instead of 512 (since that is the highest-resolution image), batch size 2, LoRA rank 16. Step time is highly variable (e.g. steps 1, 2, and 4 take a few seconds, but 3 and 5 take 30-45 minutes). Any ideas?
One more question: how are trigger words set up here? Is it the same as Flux - just include the word in the caption file?
I doubt this fits in 24 GB of VRAM. Your images are 16x larger in area than 512x512, plus you're using batch size 2 instead of 1. Are you using WSL? Doesn't Windows have that weird behavior where it swaps VRAM to RAM automatically? Does that even get enabled inside WSL? I don't know, but it might be why some steps suddenly take extremely long to complete. Try training at 512 resolution with batch size 1 to start with. HunyuanVideo isn't even pretrained at super high resolutions like 2048, so that might not work right even if you could run it.
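The 16x figure is just the pixel-area ratio, and activation memory scales roughly with pixel count, so the quoted settings imply on the order of 32x the activations of a 512-res, batch-1 step:

```python
area_ratio = (2048 * 2048) / (512 * 512)
print(area_ratio)      # 16.0 -- 4x per side, 16x the pixels
print(area_ratio * 2)  # 32.0 -- times batch size 2
```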
I use the following configuration: 1024x1024, 36 images, training on an L40. Each step takes nearly 1 min. Very slow.
Yep, working now after resizing heights to 1024 max. As a side note, have you got TorchCompile running successfully with Hunyuan?
Triton worked, using the latest ComfyUI version.
I've trained using JoyCaption-generated captions that include a unique trigger word; I haven't tried with just the trigger word alone. For videos, I'm using a JoyCaption-generated caption of the first frame, including a unique trigger word.
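For illustration, a caption file following that pattern might look like the example below; the trigger token `0hwx` is invented here, and real JoyCaption output would be longer and more detailed:

```text
0hwx person smiling and looking at the camera, standing in a sunlit
kitchen, wearing a gray sweater, natural indoor lighting
```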
On an A40, I was seeing 30 minutes per epoch of 360 images (36 images x 10 repeats) with batch size 4 (so reported as 90 steps) - somewhere around 18 seconds per step, each step a 4-image batch.
@jordoh Thank you very much for the answer. |
@jordoh It looks like changing the batch_size to 16 (4 x 4) brought the speed down. The L40 48GB is supposed to be a lot faster than the A40, but for the same 1024x1024 x 36 images it took me 240 min and you almost 300 min, and we both ended up at about 10 epochs.
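Converting both runs to throughput makes the gap concrete (a rough check, assuming both runs covered 36 images x 10 repeats per epoch):

```python
samples_per_epoch = 36 * 10               # images x repeats
a40_epoch_sec = 300 * 60 / 10             # ~30 min per epoch on the A40
l40_epoch_sec = 240 * 60 / 10             # ~24 min per epoch on the L40
print(samples_per_epoch / a40_epoch_sec)  # 0.20 samples/s
print(samples_per_epoch / l40_epoch_sec)  # 0.25 samples/s -- only ~25% faster
```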