how is the progress? #20
Hi @fenghe12, I'm actively building out the IMF neural video codec. It is another Microsoft paper (not Stable Diffusion). Here's the training for IMF: the reconstructed image is 32 floats with some StyleGAN modulation. It's very lightweight. Hopefully Google can fix this - https://github.com/AlexanderLutsenko/nobuco
Working on this paper - the performance of VASA is kind of unique.
UPDATE: I dumped a bunch of fresh code from Claude. There's some flux here in the code and models.
UPDATE: I cherry-picked the models from MegaPortraits to do the encoding with stage 1 (python train_stage_1.py) but I'm hitting OOM. I don't remember this being broken, so I have to debug. https://github.com/johndpope/MegaPortrait-hack
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 23.59 GiB of which 173.75 MiB is free. Process 4093651 has 262.20 MiB memory in use. Including non-PyTorch memory, this process has 20.43 GiB memory in use. Of the allocated memory 20.02 GiB is allocated by PyTorch, and 38.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
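For anyone who wants to try the allocator setting the error message suggests, here is a minimal sketch of one way to pass it to the training script. The helper name build_training_env is hypothetical and not part of the repo. Note that the log reports only 38.79 MiB reserved but unallocated, so fragmentation is probably not the main cause here and this setting may not help much.

```python
import os


def build_training_env(base_env=None):
    """Return a copy of the environment with the PyTorch allocator hint set.

    Hypothetical helper for illustration; an existing value is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Setting suggested by the OOM message; it must be in place before the
    # process that uses CUDA starts.
    env.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return env


if __name__ == "__main__":
    env = build_training_env()
    print(env["PYTORCH_CUDA_ALLOC_CONF"])
    # To launch training with this environment (not executed here):
    #   subprocess.run([sys.executable, "train_stage_1.py"], env=env, check=True)
```

The same effect can be had by exporting the variable in the shell before running python train_stage_1.py.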
Update - Monday 5th November: It's training stage 1 here (overfitting example). Next step: attempt to import this into stage 2, but I'm more interested in testing out my latest VASA motion generator code.
Update Nov 19th: I abandoned the MegaPortraits code / logic and cherry-picked the EMOPortraits volumetric avatar.
Canonical Volume: [B, T, C, D, H, W] = [1, 50, 96, 16, 64, 64]
ID Embed: We only save one per video, so this is negligible.
For a 5 second video at 30fps = 150 frames:
Number of windows = (150 - 50) / 25 + 1 = 5 windows (due to 50% overlap)
Trying to get this into the diffusion transformer, I hit OOM errors and basically hit a wall with the 3090 GPU.
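The window arithmetic above can be checked with a short sketch, which also gives a rough size for one canonical volume window. The float32 storage (4 bytes per element) and the helper names are assumptions for illustration, not taken from the training code.

```python
from math import prod


def num_windows(total_frames: int, window: int, stride: int) -> int:
    """Count sliding windows that fit entirely inside the clip."""
    if total_frames < window:
        return 0
    return (total_frames - window) // stride + 1


def tensor_bytes(shape, bytes_per_element: int = 4) -> int:
    """Size of a dense tensor; 4 bytes per element assumes float32."""
    return prod(shape) * bytes_per_element


frames = 5 * 30  # 5 second video at 30fps
windows = num_windows(frames, window=50, stride=25)  # stride 25 = 50% overlap

volume_shape = (1, 50, 96, 16, 64, 64)  # [B, T, C, D, H, W]
per_window_gib = tensor_bytes(volume_shape) / 2**30

print(f"windows: {windows}")
print(f"one window: {per_window_gib:.2f} GiB")
print(f"all windows: {windows * per_window_gib:.2f} GiB")
```

Under the float32 assumption a single window is a bit over 1 GiB before any activations or gradients, which fits with running out of memory on a 24 GB card once the diffusion transformer is added.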
We collected some face video data and constructed a new dataset (named FaceVid-1K), which may be available next month.