how is the progress? #20

Open
fenghe12 opened this issue Sep 24, 2024 · 1 comment

Comments

fenghe12 commented Sep 24, 2024

We collected some face video data and constructed a new dataset (named FaceVid-1K); it may be available next month.

johndpope (Owner) commented Oct 30, 2024

Hi @fenghe12

I'm actively building out the IMF neural video codec - it's another Microsoft paper (not Stable Diffusion).
https://github.com/johndpope/IMF/branches
I got the model working / training - in a way, it's superior to MegaPortraits - no keypoints / no warping.
It's decoder-centric. Have a read of the paper - it's quite intriguing - and I've been able to plug in / upgrade different modules to make it better.

Here's the training run for IMF:
https://wandb.ai/snoozie/IMF/runs/xscj3hjo?nw=nwusersnoozie

The reconstruction is driven by a 32-float latent with some StyleGAN modulation. It's very lightweight.
It's working - but I'm struggling to get it onto any client (WASM / iOS / ONNX ...) without breaking the model or degrading it to the point of being unusable.
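
For reference, the ONNX leg of that is roughly the shape below - a minimal export sketch with a stand-in decoder, not the actual IMF model or its real input signature:

```python
import torch
import torch.nn as nn

# Stand-in for the real decoder (hypothetical; the actual IMF decoder is far more involved).
class TinyDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(32, 3 * 64 * 64)   # 32-float latent -> small RGB frame

    def forward(self, latent):
        return self.fc(latent).view(-1, 3, 64, 64)

model = TinyDecoder().eval()
latent = torch.randn(1, 32)

torch.onnx.export(
    model, (latent,), "decoder.onnx",
    opset_version=17,
    input_names=["latent"], output_names=["frame"],
    dynamic_axes={"latent": {0: "batch"}, "frame": {0: "batch"}},
)
```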

Hopefully Google can fix this:
google-ai-edge/ai-edge-torch#305

https://github.com/AlexanderLutsenko/nobuco
There's this library to convert PyTorch to TensorFlow.js - but it's a real headache because TensorFlow uses BHWC and PyTorch uses BCHW, so all the logic is flip-flopped around.
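
To illustrate the mismatch, this is the kind of permute that has to happen at every boundary (plain PyTorch, just to show the layouts):

```python
import torch

x_bchw = torch.randn(1, 3, 256, 256)    # PyTorch layout: [B, C, H, W]
x_bhwc = x_bchw.permute(0, 2, 3, 1)     # TensorFlow / TF.js layout: [B, H, W, C]
x_back = x_bhwc.permute(0, 3, 1, 2)     # and back again
assert x_back.shape == x_bchw.shape
```

Any op that bakes in a channel axis (reshapes, grid sampling, normalisation over C) has to be re-checked after conversion.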

Working through this paper - the performance of VASA is kinda unique.
I've almost exhausted IMF now, and will circle back to take another look at this paper with fresh eyes.

UPDATE

I dumped a bunch of fresh code from Claude -
the plan is to get the dataset working / validating,
and then wire up the training.
https://github.com/johndpope/VASA-1-hack/blob/main/dataset_testing.py

There's some flux here in the code / models.
I need to adjust the code to use YAML configs / accelerate.
https://github.com/johndpope/VASA-1-hack/blob/main/train.py#L746
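
The rough shape I have in mind is below - a sketch only, assuming a hypothetical configs/train.yaml with lr / batch_size / epochs keys (not the repo's actual schema) and a toy model:

```python
import yaml
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Hypothetical config file and keys - placeholders, not the real train.py schema.
with open("configs/train.yaml") as f:
    cfg = yaml.safe_load(f)        # e.g. {"lr": 1e-4, "batch_size": 4, "epochs": 1}

accelerator = Accelerator()        # handles device placement, mixed precision, multi-GPU
model = nn.Linear(512, 512)        # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=cfg["lr"])
loader = DataLoader(TensorDataset(torch.randn(64, 512)), batch_size=cfg["batch_size"])

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for epoch in range(cfg["epochs"]):
    for (x,) in loader:
        loss = ((model(x) - x) ** 2).mean()
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```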

UPDATE -
I added a VASADatasetTester -
the dataset / emotion detection testing
is running and passing.


UPDATE

I cherry-picked the models from MegaPortraits to do the encoding for stage 1:

python train_stage_1.py

but I'm hitting OOM - I don't remember this being broken - have to debug https://github.com/johndpope/MegaPortrait-hack

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 23.59 GiB of which 173.75 MiB is free. Process 4093651 has 262.20 MiB memory in use. Including non-PyTorch memory, this process has 20.43 GiB memory in use. Of the allocated memory 20.02 GiB is allocated by PyTorch, and 38.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
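
The usual knobs to reach for first (a general sketch, not the actual fix in MegaPortrait-hack): the allocator setting from the error message, mixed precision, and gradient accumulation with smaller micro-batches:

```python
import os
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# As the error message suggests - must be set before CUDA is initialised.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)                  # stand-in for the stage 1 model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = DataLoader(TensorDataset(torch.randn(32, 1024)), batch_size=2)  # small micro-batch

scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
accum_steps = 4                                           # effective batch size = 2 * 4

for step, (x,) in enumerate(loader):
    x = x.to(device)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = ((model(x) - x) ** 2).mean() / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```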

Update - Monday 5th November
I partially solved the memory problem:
https://wandb.ai/snoozie/megaportraits/overview

It's training stage 1 here - overfitting example.
It has updated code to do warping.

Next step - attempt to import this into stage 2.
There are a million more things to do for stage 1 to match the MegaPortraits paper.
They have high-res / distillation training (teacher / student) - notably missing here.
EMOPortraits has many losses that could also be added.

But I'm more interested in testing out my latest VASA motion generator code.


Update Nov 19th

So I abandoned the MegaPortraits code / logic and cherry-picked the EMOPortraits volumetric avatar.
This is SOTA, albeit crippled with a Creative Commons license -
I have some code that is not released (I don't want my code tainted with CC).

  • I take 10 videos and run them through the volumetric feature extractor;
    for each window:

Canonical Volume: [B, T, C, D, H, W] = [1, 50, 96, 16, 64, 64]
Size = 1 * 50 * 96 * 16 * 64 * 64 * 4 bytes (float32) ≈ 1.2 GB per window

ID Embed: we only save one per video, so this is negligible.

For a 5-second video at 30 fps = 150 frames:

Number of windows = (150 - 50) / 25 + 1 = 5 windows (due to 50% overlap)
Total data per video = 5 windows * ~1.2 GB ≈ 6 GB
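
Quick check of those figures:

```python
# Quick check of the window count and per-window storage above.
b, t, c, d, h, w = 1, 50, 96, 16, 64, 64
bytes_per_window = b * t * c * d * h * w * 4              # float32
print(bytes_per_window / 2**30)                           # ≈ 1.17 GiB per window

frames, win, stride = 150, 50, 25                         # 5 s @ 30 fps, 50% overlap
n_windows = (frames - win) // stride + 1
print(n_windows, n_windows * bytes_per_window / 2**30)    # 5 windows, ≈ 5.9 GiB per video
```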

To get this into the diffusion transformer I hit OOM errors - and basically hit a wall with the 3090 GPU.
I had a rethink - extract the stage 1 features up front and save them to an h5 file.
I'm gobsmacked at how much data is necessary to store this.
Looking to tweak this somehow before attempting stage 2 training again.
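
One obvious tweak would be storing the cached volumes as float16 with chunked gzip compression - a sketch assuming h5py, with made-up file / dataset names:

```python
import h5py
import numpy as np

# Hypothetical cache layout - one group per video, one dataset per window.
# T shrunk from 50 to 2 here just to keep the example light.
volume = np.zeros((1, 2, 96, 16, 64, 64), dtype=np.float32)   # stand-in canonical volume

with h5py.File("stage1_cache.h5", "w") as f:
    grp = f.create_group("video_0000")
    grp.create_dataset(
        "window_00",
        data=volume.astype(np.float16),        # fp16 halves the footprint up front
        compression="gzip",
        compression_opts=4,
        chunks=(1, 1, 96, 16, 64, 64),         # one chunk per frame so frames can be read lazily
    )
    grp.create_dataset("id_embed", data=np.zeros(512, dtype=np.float16))
```

float16 plus chunked gzip should cut the ~6 GB per video down considerably before trying anything smarter.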
