AlonzoLeeeooo/awesome-video-generation

A Collection of Video Generation Studies

This GitHub repository summarizes papers and resources related to the video generation task.

If you have any suggestions about this repository, please feel free to open a new issue or pull request.

Recent news of this GitHub repo is listed as follows.

🔥 [Nov. 19th] We have released our latest paper titled "StableV2V: Stablizing Shape Consistency in Video-to-Video Editing", with the corresponding code, model weights, and a testing benchmark DAVIS-Edit open-sourced. Feel free to check them out from the links!

Earlier news:
  • [Jun. 17th] All NeurIPS 2023 papers and references are updated.
  • [Apr. 26th] Added a new direction: Personalized Video Generation.
  • [Mar. 28th] The official AAAI 2024 paper list is released! Official versions of PDFs and BibTeX references are updated accordingly.

Contents

To-Do Lists

  • Latest Papers
    • Update NeurIPS 2024 Papers
    • Update ECCV 2024 Papers
    • Update CVPR 2024 Papers
      • Update PDFs and References of ⚠️ Papers
      • Update Published Versions of References
    • Update AAAI 2024 Papers
      • Update PDFs and References of ⚠️ Papers
      • Update Published Versions of References
    • Update ICLR 2024 Papers
    • Update NeurIPS 2023 Papers
  • Previously Published Papers
    • Update Previous CVPR papers
    • Update Previous ICCV papers
    • Update Previous ECCV papers
    • Update Previous NeurIPS papers
    • Update Previous ICLR papers
    • Update Previous AAAI papers
    • Update Previous ACM MM papers
  • Regular Maintenance of Preprint arXiv Papers and Missed Papers

<🎯Back to Top>

Products

| Name | Organization | Year | Research Paper | Website | Specialties |
|---|---|---|---|---|---|
| Sora | OpenAI | 2024 | link | link | - |
| Lumiere | Google | 2024 | link | link | - |
| VideoPoet | Google | 2023 | - | link | - |
| W.A.L.T | Google | 2023 | link | link | - |
| Gen-2 | Runway | 2023 | - | link | - |
| Gen-1 | Runway | 2023 | - | link | - |
| Animate Anyone | Alibaba | 2023 | link | link | - |
| Outfit Anyone | Alibaba | 2023 | - | link | - |
| Stable Video | StabilityAI | 2023 | link | link | - |
| Pixeling | HiDream.ai | 2023 | - | link | - |
| DomoAI | DomoAI | 2023 | - | link | - |
| Emu | Meta | 2023 | link | link | - |
| Genmo | Genmo | 2023 | - | link | - |
| NeverEnds | NeverEnds | 2023 | - | link | - |
| Moonvalley | Moonvalley | 2023 | - | link | - |
| Morph Studio | Morph | 2023 | - | link | - |
| Pika | Pika | 2023 | - | link | - |
| PixelDance | ByteDance | 2023 | link | link | - |

<🎯Back to Top>

Papers

Survey Papers

  • Year 2024
    • arXiv
      • Video Diffusion Models: A Survey [Paper]
  • Year 2023
    • arXiv
      • A Survey on Video Diffusion Models [Paper]

Text-to-Video Generation

  • Year 2024
    • CVPR
      • Vlogger: Make Your Dream A Vlog [Paper] [Code]
      • Make Pixels Dance: High-Dynamic Video Generation [Paper] [Project] [Demo]
      • VGen: Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation [Paper] [Code] [Project]
      • GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation [Paper] [Project]
      • SimDA: Simple Diffusion Adapter for Efficient Video Generation [Paper] [Code] [Project]
      • MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation [Paper] [Project] [Video]
      • Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [Paper] [Project]
      • PEEKABOO: Interactive Video Generation via Masked-Diffusion [Paper] [Code] [Project] [Demo]
      • EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [Paper] [Code] [Project]
      • A Recipe for Scaling up Text-to-Video Generation with Text-free Videos [Paper] [Code] [Project]
      • BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [Paper] [Project]
      • Mind the Time: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis [Paper] [Project]
      • Animate Anyone: Consistent and Controllable Image-to-video Synthesis for Character Animation [Paper] [Code] [Project]
      • MotionDirector: Motion Customization of Text-to-Video Diffusion Models [Paper] [Code]
      • Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation [Paper] [Project]
      • DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation [Paper] [Code]
      • Grid Diffusion Models for Text-to-Video Generation [Paper] [Code] [Video]
    • ECCV
      • Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning [Paper] [Project]
      • W.A.L.T.: Photorealistic Video Generation with Diffusion Models [Paper] [Project]
      • MoVideo: Motion-Aware Video Generation with Diffusion Models [Paper]
      • DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model [Paper] [Code] [Project]
      • MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing [Paper]
      • HARIVO: Harnessing Text-to-Image Models for Video Generation [Paper] [Project]
      • MEVG: Multi-event Video Generation with Text-to-Video Models [Paper] [Project]
      • DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency [Paper]
      • SAVE: Protagonist Diversification with Structure Agnostic Video Editing
    • ICLR
      • VDT: General-purpose Video Diffusion Transformers via Mask Modeling [Paper] [Code] [Project]
      • VersVideo: Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation [Paper]
    • AAAI
      • Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos [Paper] [Code] [Project]
      • E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning [Paper]
      • ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation [Paper] [Code] [Project]
      • F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text to-Video Synthesis [Paper]
    • arXiv
      • Lumiere: A Space-Time Diffusion Model for Video Generation [Paper] [Project]
      • Boximator: Generating Rich and Controllable Motions for Video Synthesis [Paper] [Project] [Video]
      • World Model on Million-Length Video And Language With RingAttention [Paper] [Code] [Project]
      • Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [Paper] [Project]
      • WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens [Paper] [Code] [Project]
      • MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation [Paper] [Project]
      • Latte: Latent Diffusion Transformer for Video Generation [Paper] [Code] [Project]
      • Mora: Enabling Generalist Video Generation via A Multi-Agent Framework [Paper] [Code]
      • StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text [Paper] [Code] [Project] [Video]
      • VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models [Paper]
      • StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [Paper] [Code] [Project] [Demo]
      • Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [Paper] [Code] [Project]
      • ControlNeXt: Powerful and Efficient Control for Image and Video Generation [Paper] [Code] [Project]
      • FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance [Paper] [Project]
      • Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data [Paper] [Code]
      • Fine-gained Zero-shot Video Sampling [Paper] [Project]
      • Training-free Long Video Generation with Chain of Diffusion Model Experts [Paper]
      • ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model [Paper] [Code] [Project] [Video]
      • ConFiner: Training-free Long Video Generation with Chain of Diffusion Model Experts [Paper] [Code]
    • Others
      • Sora: Video Generation Models as World Simulators [Paper]
  • Year 2023
  • Year 2022
  • Year 2021

Image-to-Video Generation

  • Year 2024

    • CVPR
    • ECCV
      • Rethinking Image-to-Video Adaptation: An Object-centric Perspective [Paper]
      • PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation [Paper] [Code] [Project]
    • AAAI
      • Decouple Content and Motion for Conditional Image-to-Video Generation [Paper]
    • arXiv
  • Year 2023

    • CVPR
      • Conditional Image-to-Video Generation with Latent Flow Diffusion Models [Paper] [Code]
    • arXiv
  • Year 2022

    • CVPR
      • Make It Move: Controllable Image-to-Video Generation with Text Descriptions [Paper] [Code]
  • Year 2021

    • ICCV
      • Click to Move: Controlling Video Generation with Sparse Motion [Paper] [Code]

<🎯Back to Top>

Audio-to-Video Generation

  • Year 2024
    • AAAI
      • Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation [Paper] [Code]
  • Year 2023
    • CVPR
      • MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation [Paper] [Code]

<🎯Back to Top>

Personalized Video Generation

  • Year 2024
  • Year 2023
    • arXiv
      • FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention [Paper] [Code] [Demo]
      • Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance [Paper] [Project]
      • DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control [Paper] [Project]

<🎯Back to Top>

Video Editing

<🎯Back to Top>

Datasets

  • [arXiv 2012] UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild [Paper] [Dataset]
  • [arXiv 2017] DAVIS: The 2017 DAVIS Challenge on Video Object Segmentation [Paper] [Dataset]
  • [ICCV 2019] FaceForensics++: Learning to Detect Manipulated Facial Images [Paper] [Code]
  • [NeurIPS 2019] TaiChi-HD: First Order Motion Model for Image Animation [Paper] [Dataset]
  • [ECCV 2020] SkyTimeLapse: DTVNet: Dynamic Time-lapse Video Generation via Single Still Image [Paper] [Code]
  • [ICCV 2021] WebVid-10M: Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [Paper] [Dataset] [Code] [Project]
  • [ECCV 2022] ROS: Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining [Paper] [Code] [Dataset]
  • [arXiv 2023] HD-VG-130M: VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-video Generation [Paper] [Dataset]
  • [NeurIPS 2023] FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation [Paper] [Code]
  • [ICLR 2024] InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation [Paper] [Dataset]
  • [CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers [Paper] [Dataset] [Project]
  • [arXiv 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models [Paper] [Dataset]

<🎯Back to Top>

Evaluation Metrics

  • [CVPR 2024] VBench: Comprehensive Benchmark Suite for Video Generative Models [Paper] [Code]
  • [ICCV 2023] DOVER: Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives [Paper] [Code]
  • [ICLR 2019] FVD: A New Metric for Video Generation [Paper] [Code]

Q&A

  • Q: What is the ordering of conferences in this paper list?
    • This paper list is organized according to the following sequence:
      • CVPR
      • ICCV
      • ECCV
      • NeurIPS
      • ICLR
      • AAAI
      • ACM MM
      • SIGGRAPH
      • arXiv
      • Others
  • Q: What does Others refer to?
    • Some studies (e.g., Sora) do not publish their technical reports on arXiv; instead, they tend to publish blog posts on their official websites. The Others category refers to such studies.

<🎯Back to Top>

References

The reference.bib file summarizes BibTeX references of up-to-date video generation papers, widely used datasets, and toolkits. Based on the original references, I have made the following modifications to make their results look nice in LaTeX manuscripts:

  • References are normally constructed in the form of author-etal-year-nickname. Particularly, references of datasets and toolkits are directly constructed as nickname, e.g., imagenet.
  • In each reference, all names of conferences/journals are converted into abbreviations, e.g., Computer Vision and Pattern Recognition -> CVPR.
  • The url, doi, publisher, organization, editor, series in all references are removed.
  • The pages of all references are added if they are missing.
  • All paper names are in title case. In addition, each title is wrapped in an extra pair of braces {} so that the title case is preserved in templates that would otherwise lowercase titles.
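As an illustration, an entry following the conventions above might look like the following. This is a hypothetical example, not an actual entry from reference.bib; the citation key, author list, and page numbers are placeholders.

```latex
% Hypothetical entry in the author-etal-year-nickname form:
% venue abbreviated, url/doi/publisher/organization fields removed,
% pages filled in, and the title double-braced to preserve title case.
@inproceedings{doe-etal-2024-examplevideo,
  title     = {{ExampleVideo: A Placeholder Title in Title Case}},
  author    = {Doe, Jane and Smith, John},
  booktitle = {CVPR},
  pages     = {1--10},
  year      = {2024}
}
```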

If you need a different reference format, you may refer to the original references of the papers by searching their titles on DBLP or Google Scholar.

<🎯Back to Top>

Star History

Star History Chart

<🎯Back to Top>