Stylizing a dynamic scene based on an exemplar image is critical for various real-world applications, including gaming, filmmaking, and augmented and virtual reality. However, achieving consistent stylization across both spatial and temporal dimensions remains a significant challenge. Most existing methods are designed for static scenes and often require a separate optimization for each style image, which limits their adaptability. We introduce ZDySS, a zero-shot stylization framework for dynamic scenes that generalizes to previously unseen style images at inference time. Our approach represents the scene with Gaussian splatting and associates each Gaussian with a learned feature vector, from which a feature map can be rendered for any given view and timestamp. By applying style transfer to the learned per-Gaussian feature vectors rather than to the rendered feature maps, we enhance spatio-temporal consistency across frames. On real-world dynamic scenes, our method demonstrates superior stylization quality and coherence over state-of-the-art baselines, making it a robust solution for practical applications.
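To illustrate the core idea of stylizing in the space of per-Gaussian features rather than on rendered feature maps, the sketch below applies an AdaIN-style statistic transfer to a set of learned Gaussian feature vectors. This is a minimal illustration under assumptions: the abstract does not specify the exact style-transfer operator, and the function name `adain_on_gaussian_features`, the array shapes, and the use of NumPy are all hypothetical.

```python
import numpy as np

def adain_on_gaussian_features(gaussian_feats: np.ndarray,
                               style_feats: np.ndarray,
                               eps: float = 1e-5) -> np.ndarray:
    """Match per-channel statistics of the per-Gaussian features to those of
    the style features (an AdaIN-style transfer; illustrative assumption).

    gaussian_feats: (N, C) learned feature vectors, one per Gaussian.
    style_feats:    (M, C) features extracted from the style image
                    (e.g., flattened encoder activations).
    Returns:        (N, C) stylized per-Gaussian features.
    """
    # Per-channel statistics of the scene (content) features.
    c_mean = gaussian_feats.mean(axis=0, keepdims=True)
    c_std = gaussian_feats.std(axis=0, keepdims=True) + eps

    # Per-channel statistics of the style features.
    s_mean = style_feats.mean(axis=0, keepdims=True)
    s_std = style_feats.std(axis=0, keepdims=True) + eps

    # Normalize the content features, then re-scale and shift them with
    # the style statistics.
    return (gaussian_feats - c_mean) / c_std * s_std + s_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 10k Gaussians with 256-dim features, 4k style tokens.
    feats = rng.normal(size=(10_000, 256))
    style = rng.normal(loc=2.0, scale=0.5, size=(4_096, 256))

    stylized = adain_on_gaussian_features(feats, style)
    # Because the same stylized 3D features are splatted for every view and
    # timestamp, the rendered frames remain spatio-temporally consistent.
    print(stylized.shape)  # (10000, 256)
```

Because the transfer is applied once in 3D feature space, no per-frame image-space stylization is needed at render time, which is what keeps the output coherent across views and timestamps.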