Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras

1. Introduce the concept of a hyper primitives map.

Hyper primitives are defined as a set of point clouds associated with ORB features, rotation, scaling, density, and spherical harmonic (SH) coefficients. The hyper primitives map allows the system to optimize tracking efficiently with a factor graph solver and to learn the photorealistic mapping with a neural solver.
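As a rough illustration of the attributes listed above, one hyper primitive could be stored as a small record. This is only a sketch: the class names `HyperPrimitive` and `HyperPrimitiveMap`, the quaternion encoding of rotation, the 16 SH coefficients, and the default values are assumptions made for the example, not details taken from the Photo-SLAM code. The 32-byte descriptor reflects the standard 256-bit ORB descriptor.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HyperPrimitive:
    """One map element carrying both geometric and appearance attributes."""
    position: Tuple[float, float, float]      # 3D point in the world frame
    orb_descriptor: bytes                     # 256-bit ORB descriptor (32 bytes)
    # Assumption: rotation stored as a unit quaternion (w, x, y, z).
    rotation: Tuple[float, float, float, float] = (1.0, 0.0, 0.0, 0.0)
    scaling: Tuple[float, float, float] = (1.0, 1.0, 1.0)
    density: float = 0.1                      # hypothetical initial value
    # Assumption: 16 SH coefficients (degree 3) per primitive.
    sh_coeffs: List[float] = field(default_factory=lambda: [0.0] * 16)

    def __post_init__(self) -> None:
        if len(self.orb_descriptor) != 32:
            raise ValueError("ORB descriptor must be 32 bytes (256 bits)")


@dataclass
class HyperPrimitiveMap:
    """A flat container of hyper primitives (hypothetical, for illustration)."""
    primitives: List[HyperPrimitive] = field(default_factory=list)

    def add(self, primitive: HyperPrimitive) -> int:
        """Append a primitive and return its index in the map."""
        self.primitives.append(primitive)
        return len(self.primitives) - 1

    def __len__(self) -> int:
        return len(self.primitives)


hp_map = HyperPrimitiveMap()
index = hp_map.add(HyperPrimitive(position=(0.0, 0.0, 1.5), orb_descriptor=bytes(32)))
```

Keeping the ORB descriptor next to the rendering attributes is what lets a single map serve both feature-based tracking and photorealistic rendering.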

2. Geometry-based densification.

We argue that 2D geometric feature points spatially distributed in the frames essentially mark regions with complex texture, which require more hyper primitives. However, fewer than 30% of the 2D geometric feature points in a frame are active and have corresponding 3D points, especially in non-RGB-D scenarios. Therefore, we actively create additional temporary hyper primitives from the inactive 2D feature points.
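The idea of seeding temporary primitives from inactive features can be sketched as follows. The example assumes a pinhole camera and assumes that a depth value is already available for each inactive feature (for instance from a depth sensor); how depth is obtained in monocular or stereo settings is not covered here. `Feature2D`, `backproject`, and `temporary_primitive_positions` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple

Point3D = Tuple[float, float, float]


class Feature2D(NamedTuple):
    u: float                 # pixel column
    v: float                 # pixel row
    active: bool             # True if already linked to a 3D map point
    depth: Optional[float]   # assumed known depth; None if unavailable


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift a pixel to a 3D point in the camera frame (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def temporary_primitive_positions(features: List[Feature2D],
                                  fx: float, fy: float,
                                  cx: float, cy: float) -> List[Point3D]:
    """Return candidate 3D positions for inactive features with valid depth."""
    positions = []
    for f in features:
        # Active features already have 3D points; skip unknown or invalid depth.
        if f.active or f.depth is None or f.depth <= 0:
            continue
        positions.append(backproject(f.u, f.v, f.depth, fx, fy, cx, cy))
    return positions


features = [
    Feature2D(320.0, 240.0, True, 1.0),    # active: skipped
    Feature2D(820.0, 240.0, False, 2.0),   # inactive with depth: used
    Feature2D(100.0, 100.0, False, None),  # inactive without depth: skipped
]
new_positions = temporary_primitive_positions(
    features, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Each returned position would then seed a temporary hyper primitive whose remaining attributes are left to the mapping optimization.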

3. Gaussian-pyramid-based learning, a new progressive training method.

Progressive training is a widely used technique in neural rendering for accelerating optimization. Some methods have been proposed to reduce training time while achieving better rendering quality. To learn multi-level features efficiently online, we propose Gaussian-pyramid-based learning. At the start of training, the hyper primitives are supervised by the highest level of the pyramid, i.e. level n. As training iterations increase, we not only densify the hyper primitives but also lower the pyramid level to obtain a new ground truth, until reaching the bottom of the Gaussian pyramid.
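The coarse-to-fine schedule described above can be sketched in pure Python. The pyramid here uses a standard 5-tap binomial blur followed by 2x subsampling, and the fixed number of iterations per level (`iters_per_level`) is an assumption for the example; the actual kernel and schedule used by Photo-SLAM may differ.

```python
from typing import List

Image = List[List[float]]
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)  # binomial approximation of a Gaussian


def _blur_1d(row: List[float]) -> List[float]:
    """Blur one row with the 5-tap kernel, clamping indices at the borders."""
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)
            acc += w * row[j]
        out.append(acc)
    return out


def _transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def downsample(img: Image) -> Image:
    """Separable blur (rows, then columns) followed by 2x subsampling."""
    rows_blurred = [_blur_1d(r) for r in img]
    blurred = _transpose([_blur_1d(c) for c in _transpose(rows_blurred)])
    return [row[::2] for row in blurred[::2]]


def build_pyramid(img: Image, n_levels: int) -> List[Image]:
    """pyramid[0] is the full-resolution image, pyramid[n_levels] the coarsest."""
    pyramid = [img]
    for _ in range(n_levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def level_for_iteration(iteration: int, n_levels: int, iters_per_level: int) -> int:
    """Start at level n and step down one level every iters_per_level iterations."""
    return max(0, n_levels - iteration // iters_per_level)


image = [[1.0] * 8 for _ in range(8)]
pyramid = build_pyramid(image, n_levels=2)
# The ground truth for a given training iteration is pyramid[level].
target = pyramid[level_for_iteration(150, n_levels=2, iters_per_level=100)]
```

Supervising with coarse levels first lets early iterations fit low-frequency structure cheaply before the full-resolution image is introduced.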

@inproceedings{hhuang2024photoslam,
  title = {Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras},
  author = {Huang, Huajian and Li, Longwei and Cheng, Hui and Yeung, Sai-Kit},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year = {2024}
}


CVPR 2024
