GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs

Modeling and understanding the 3D world is crucial for various applications, from augmented reality to robotic navigation. Recent advancements based on 3D Gaussian Splatting have integrated semantic information from multi-view images into Gaussian primitives. However, these methods typically require costly per-scene optimization from dense calibrated images, limiting their practicality. In this paper, we consider the new task of generalizable 3D semantic field modeling from sparse, uncalibrated image pairs. Building upon the Splatt3R architecture, we introduce GSemSplat, a framework that learns open-vocabulary semantic representations linked to 3D Gaussians without the need for per-scene optimization, dense image collections or calibration. To ensure effective and reliable learning of semantic features in 3D space, we employ a dual-feature approach that leverages both region-specific and context-aware semantic features as supervision in the 2D space. This allows us to capitalize on their complementary strengths. Experimental results on the ScanNet++ dataset demonstrate the effectiveness and superiority of our approach compared to the traditional scene-specific method. We hope our work will inspire more research into generalizable 3D understanding.
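To make the dual-feature supervision idea concrete, here is a minimal sketch, not the authors' implementation: it assumes the per-pixel semantic features rendered from the 3D Gaussians are supervised in 2D by two targets, a region-specific feature map and a context-aware feature map, combined with weighted cosine-distance losses. All names (`dual_feature_loss`, `rendered_feats`, `region_feats`, `context_feats`, the lambda weights) and the specific loss form are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dual_feature_loss(
    rendered_feats: torch.Tensor,   # (B, C, H, W) semantic features splatted from 3D Gaussians
    region_feats: torch.Tensor,     # (B, C, H, W) region-specific 2D targets (e.g. pooled per region)
    context_feats: torch.Tensor,    # (B, C, H, W) context-aware dense 2D targets
    lambda_region: float = 1.0,     # hypothetical loss weights
    lambda_context: float = 1.0,
) -> torch.Tensor:
    """Supervise rendered features against both 2D targets (sketch, not the paper's code)."""
    def cos_dist(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # 1 - cosine similarity along the channel dim, averaged over pixels and batch
        return (1.0 - F.cosine_similarity(a, b, dim=1)).mean()

    return (lambda_region * cos_dist(rendered_feats, region_feats)
            + lambda_context * cos_dist(rendered_feats, context_feats))

# Toy usage with random tensors, just to show the shapes involved.
feats = torch.randn(2, 512, 64, 64)
loss = dual_feature_loss(feats, torch.randn_like(feats), torch.randn_like(feats))
```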
