VINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework designed for large scenes. The framework comprises four main components: a VIO Front End, a 2D Gaussian Map, NVS Loop Closure, and a Dynamic Eraser. In the VIO Front End, RGB frames are processed through dense bundle adjustment and uncertainty estimation to extract scene geometry and camera poses. Based on this output, the mapping module incrementally constructs and maintains a 2D Gaussian map. Key components of the 2D Gaussian Map include a Sample-based Rasterizer, a Score Manager, and Pose Refinement, which together improve mapping speed and localization accuracy. This enables the SLAM system to handle large-scale urban environments with up to 50 million Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design a Loop Closure module that leverages the Novel View Synthesis (NVS) capabilities of Gaussian Splatting to detect loop closures and correct the Gaussian map. Additionally, we propose a Dynamic Eraser to handle the dynamic objects that are inevitably present in real-world outdoor scenes. Extensive evaluations in indoor and outdoor environments demonstrate that our approach achieves localization performance on par with Visual-Inertial Odometry (VIO) while surpassing recent GS/NeRF SLAM methods. It also significantly outperforms all existing methods in mapping and rendering quality. Furthermore, we developed a mobile app and verified that our framework can generate high-quality Gaussian maps in real time using only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge, VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in outdoor environments and supporting kilometer-scale scenes.