3D Gaussian splats have emerged as an effective learned representation for static 3D scenes. In this work, we explore 2D Gaussian splats as a new primitive for representing videos. We propose GaussianVideo, an approach that learns a set of 2D Gaussian splats to effectively represent video frames. GaussianVideo incorporates the following techniques: (i) to exploit temporal redundancy among adjacent frames, which speeds up training and improves compression efficiency, we predict the Gaussian splats of a frame from those of its previous frame; (ii) to control the trade-off between file size and quality, we remove Gaussian splats that contribute little to video quality; (iii) to capture dynamics in videos, we randomly add Gaussian splats to fit content with large motion or newly appearing objects; (iv) to handle significant scene changes, we detect key frames based on loss differences during learning. Experimental results show that GaussianVideo achieves rate-distortion trade-offs comparable to state-of-the-art video codecs such as AV1 and VVC, with a rendering speed of 1500 fps for 1920x1080 video.
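The core ideas of rendering a frame from a set of 2D Gaussian splats and pruning low-contribution splats (technique ii) can be sketched minimally. This is an illustrative sketch only, not the paper's implementation: the parameter layout (`mu`, `cov`, `alpha`, `color`), the additive blending, and the pruning criterion (mean absolute change in the rendered image below a threshold) are all assumptions made for clarity.

```python
import numpy as np

def render(gaussians, H, W):
    """Additively splat anisotropic 2D Gaussians onto an H x W RGB image."""
    ys, xs = np.mgrid[0:H, 0:W]
    img = np.zeros((H, W, 3))
    for g in gaussians:
        dx, dy = xs - g["mu"][0], ys - g["mu"][1]
        inv = np.linalg.inv(g["cov"])  # inverse covariance defines the ellipse
        d2 = inv[0, 0] * dx * dx + (inv[0, 1] + inv[1, 0]) * dx * dy + inv[1, 1] * dy * dy
        w = g["alpha"] * np.exp(-0.5 * d2)  # per-pixel Gaussian weight
        img += w[..., None] * g["color"]
    return np.clip(img, 0.0, 1.0)

def prune(gaussians, ref, H, W, thresh=1e-3):
    """Drop splats whose removal changes the rendering by < thresh (mean abs error).

    thresh is an illustrative hyperparameter controlling the size/quality trade-off.
    """
    kept = list(gaussians)
    for g in list(kept):
        trial = [h for h in kept if h is not g]
        if np.abs(render(trial, H, W) - ref).mean() < thresh:
            kept = trial  # this splat contributes too little; remove it
    return kept
```

A usage sketch of temporal prediction (technique i) then amounts to initializing frame `t+1` from frame `t`'s surviving splats, e.g. `init_next = [dict(g) for g in prune(gaussians, ref, H, W)]`, before fine-tuning on the new frame.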