This folder showcases interesting ways to use MMagic and SAM together.

We first create a conda environment and then install MMagic and SAM in it.
```shell
# create env and install torch
conda create -n mmedit-sam python=3.8 -y
conda activate mmedit-sam
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

# install mmagic
pip install openmim
mim install mmengine "mmcv>=2.0.0"
git clone -b dev-1.x https://github.com/open-mmlab/mmagic.git
pip install -e ./mmagic

# install sam
pip install git+https://github.com/facebookresearch/segment-anything.git

# you may need ffmpeg to get frames or make video
sudo apt install ffmpeg
```
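If you want to verify the environment before moving on, a quick sanity check like the following should run without errors (the expected version strings are what the commands above install; yours may differ if you changed them):

```python
# Optional sanity check for the freshly created environment.
import torch
import mmagic
import segment_anything  # noqa: F401  (the import itself is the check)

print(torch.__version__)          # expect 1.12.1+cu113
print(torch.cuda.is_available())  # expect True on a CUDA machine
print(mmagic.__version__)
```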
Download the SAM checkpoint.

```shell
mkdir -p checkpoints/sam
wget -O checkpoints/sam/sam_vit_h_4b8939.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
```
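For reference, the checkpoint is loaded through the `sam_model_registry` entry point of the segment-anything package. A minimal sketch of that call:

```python
from segment_anything import SamPredictor, sam_model_registry

# Build the ViT-H SAM model from the checkpoint downloaded above.
sam = sam_model_registry["vit_h"](
    checkpoint="checkpoints/sam/sam_vit_h_4b8939.pth")
sam.to("cuda")

# SamPredictor wraps the model for mask prediction on single images.
predictor = SamPredictor(sam)
```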
## Instructions
Find a video clip you want to edit and extract its frames.

```shell
mkdir -p inputs/demo_video
ffmpeg -i your_video.mp4 inputs/demo_video/%04d.jpg
```
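If you would rather extract frames in Python than with ffmpeg, the OpenCV sketch below produces the same `%04d.jpg` naming (it assumes `opencv-python` is available in the environment):

```python
import os
import cv2

os.makedirs("inputs/demo_video", exist_ok=True)

cap = cv2.VideoCapture("your_video.mp4")
idx = 1
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Zero-padded names match the ffmpeg %04d.jpg pattern used above.
    cv2.imwrite(f"inputs/demo_video/{idx:04d}.jpg", frame)
    idx += 1
cap.release()
```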
Run the script.
```shell
python play_controlnet_animation_sam.py
```
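Under the hood, the script drives the controlnet animation model through MMagic's `MMagicInferencer`. The sketch below shows the general shape of that call; the prompts, paths, and options here are illustrative assumptions, so check `play_controlnet_animation_sam.py` for the arguments it actually uses:

```python
from mmagic.apis import MMagicInferencer

# Instantiate the controlnet animation inferencer from MMagic.
editor = MMagicInferencer(model_name='controlnet_animation')

# Illustrative call: stylize the extracted frames into an AI animation.
editor.infer(
    video='inputs/demo_video',              # directory of input frames
    prompt='a man, best quality',           # hypothetical prompt
    negative_prompt='low quality, blurry',  # hypothetical negative prompt
    save_path='results/animation_frames',   # hypothetical output directory
)
```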
Make a video from the output frames.

```shell
ffmpeg -r 10 -i results/final_frames/%04d.jpg -b:v 30M -vf fps=10 results/final_frames.mp4
```
## Output example
Below is an example input video and its output result. Try making your own videos!
*(demo video: `huangbo_fps10_playground_party_fixloc_cat.mp4`)*
## Method explanation
We get the final video through the following steps:
- Split the input video into frames.
- Call the controlnet animation model through MMagic's inference API to turn each video frame into an AI animation frame.
- Use Stable Diffusion in MMagic to generate a background image that matches the semantics of the animation content.
- Use SAM to predict the mask of the person in each animation frame.
- Replace the background in the animation with the generated background image (see the sketch below).
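To make the last two steps concrete, here is a minimal single-frame sketch of the SAM masking and background replacement. The file paths and the center-point prompt are illustrative assumptions, not what the script actually uses:

```python
import os
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Illustrative paths: one stylized animation frame and the generated background.
frame = cv2.imread("results/animation_frames/0001.jpg")
background = cv2.imread("results/background.png")
background = cv2.resize(background, (frame.shape[1], frame.shape[0]))

# Predict the person mask with SAM, prompted here by a single point near
# the image center (assuming the person is roughly centered).
sam = sam_model_registry["vit_h"](
    checkpoint="checkpoints/sam/sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)
predictor.set_image(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
point = np.array([[frame.shape[1] // 2, frame.shape[0] // 2]])
masks, scores, _ = predictor.predict(
    point_coords=point, point_labels=np.array([1]))
mask = masks[np.argmax(scores)]  # keep the highest-scoring mask

# Composite: person pixels come from the animation frame, everything
# else from the generated background.
result = np.where(mask[..., None], frame, background)

os.makedirs("results/final_frames", exist_ok=True)
cv2.imwrite("results/final_frames/0001.jpg", result)
```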