This new route at `/lipsync` takes either a simple text input or an audio file, along with a static image, and produces an mp4 of lip-synced audio and video.
An optional `return_frames` parameter returns individual frames instead, following the schema used in the image-to-video pipeline.
If text is supplied instead of an audio file, FastSpeech2Conformer is used for TTS.
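To make the request contract above concrete, here is a minimal client-side sketch. The multipart field names (`text`, `audio`, `image`, `return_frames`) are assumptions for illustration, not the actual parameter names in this PR:

```python
# Hypothetical sketch: build the multipart payload for a POST to /lipsync.
# Field names are assumed, not taken from the actual route definition.

def build_lipsync_request(image_path, text=None, audio_path=None, return_frames=False):
    """Return (data, files) dicts suitable for requests.post(url, data=..., files=...).

    Exactly one of `text` or `audio_path` must be supplied; if `text` is
    given, the server falls back to FastSpeech2Conformer for TTS.
    """
    if (text is None) == (audio_path is None):
        raise ValueError("supply exactly one of text or audio_path")
    data = {"return_frames": str(return_frames).lower()}
    files = {"image": open(image_path, "rb")}  # the static portrait image
    if text is not None:
        data["text"] = text  # server-side TTS path
    else:
        files["audio"] = open(audio_path, "rb")  # pre-recorded speech path
    return data, files
```

Usage would then be along the lines of `requests.post("http://204.12.245.134:8002/lipsync", data=data, files=files)`.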
The text input and mp4 output options differ from the bounty requirements solely for ease of demo (and debugging), and can quickly be removed if desired.
At the time of writing, a demo server is running at http://204.12.245.134:8002/docs#/default/lipsync
(Disclaimer: long audio or text sequences will OOM on the GPU and may not recover gracefully.)
Real3DPortrait (https://github.com/yerfor/Real3DPortrait) is used for the audio-to-video synchronization pipeline, and a purpose-built Conda environment is configured on the host, isolating the majority of the requirements.
One requirement, however, needed to be installed at the OS level. In lieu of bumping the version of our Ubuntu base image 20.04 → 22.04, I've created a separate Dockerfile which builds the necessary version from source.
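The isolation pattern described above can be sketched as the server shelling out to the dedicated Conda environment via `conda run`. The environment name and the inference script/flag names below are assumptions for illustration, not verified against the code in this PR:

```python
# Hypothetical sketch: build a `conda run` command that executes an inference
# script inside an isolated environment. Env name and flags are assumed.

def conda_run_cmd(env, script, **flags):
    """Return an argv list that runs `python script --flag value ...` in `env`."""
    cmd = ["conda", "run", "-n", env, "python", script]
    for flag, value in flags.items():
        cmd += [f"--{flag}", str(value)]
    return cmd

# Example invocation (script path and flag names are placeholders):
# subprocess.run(conda_run_cmd("real3dportrait", "inference.py",
#                              src_img="face.png", drv_aud="speech.wav"),
#                check=True)
```

Because the heavy dependencies live only inside that environment, the main server's dependency tree stays untouched.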
Lipsync-pipeline-specific instructions for running and debugging can be found at `cmd/lipsync/README.md`.
The approach taken here was a bit atypical (pulling in an entire repo to use for a pipeline), but a personal goal was to make some improvement to developer velocity on the AI Pipeline. The changes in this PR establish a pattern that enables devs to prototype new pipelines and test existing open-source implementations without potentially conflicting or hard-to-resolve dependencies.
Further work would include implementing the lower-level inference logic from scratch, to more finely control model selection and loading/unloading/caching.