GaussianFeels
Object-centric Gaussian SLAM for visuo-tactile in-hand manipulation
M.S. Thesis · Soft Robotics & Bionics Lab, Seoul National University
When a robot hand grasps an object, it occludes exactly the region it most needs to perceive. GaussianFeels fuses RGB-D vision, DIGIT tactile contact geometry, and hand proprioception into one explicit object-centric 3D Gaussian Splatting map, reconstructing and tracking objects online, through the occlusion, with no CAD model.
Method & results
- One map, every job: a single object-centric 3D Gaussian state serves training, rendering, frozen-map SDF pose tracking, reconstruction evaluation, and manipulation-facing geometry. It replaces the neural-implicit SDF as the shared representation for online, model-free visuo-tactile object SLAM.
- Pose is recovered by a multi-residual Levenberg-Marquardt optimiser that solves for the SE(3) object pose against a frozen dual-sigma Gaussian-density anchor SDF, fusing synchronized RGB-D, tactile, and proprioceptive observations in one canonical frame.
- A frame-zero branch generates an initial shape estimate from a single RGB crop with an image-to-3D model, then replaces generated geometry with measured geometry as the episode progresses.
- Real time, no CAD model: map and pose modes clear the 25 FPS target; slam mode runs at ≈28 FPS in simulation and ≈23.5 FPS on real hardware across the 14-cell FeelSight primary sweep (multi-seed medians).
- Sim-to-real: reconstruction transfers well, while tracking on real hardware remains the harder bottleneck. GaussianFeels retains 94% of its simulation F-score@5mm on real hardware (0.946 → 0.888), versus 80% for NeuralFeels (0.898 → 0.716).
- Frame-matched against model-free NeuralFeels: more accurate in simulation (0.91 vs 2.51 mm ADD-S) and at parity on real hardware (3.34 vs 3.42 mm), at ≈7.6× the mean frame rate. These figures use the frame-matched protocol, hence the small shift from the 0.83 / 3.37 mm headline medians. The implicit baseline wins only when handed an exact CAD model.
- A paired tactile ablation isolates a domain-dependent effect: tactile input improves reconstruction in simulation but degrades it on real hardware (noisy DIGIT depth drags the map), while pose accuracy stays near-neutral in both domains.
- Developed inside Korea's national “Alchemist” humanoid programme (MOTIE), bringing visuo-tactile SLAM from research prototype to the Phase-2 full-scale humanoid.
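
The results above are reported as F-score@5mm, ADD-S in millimetres, and a sim-to-real retention ratio. As a reading aid, here is a minimal sketch of how these quantities are commonly defined, using only the Python standard library. It is not the thesis's evaluation code: the function names (`f_score`, `add_s`, `retention`), the brute-force nearest-neighbour search, and the toy point sets are all assumptions made for illustration, and the actual protocol (point sampling density, alignment, aggregation over seeds) may differ.

```python
import math

def nearest_distance(point, cloud):
    """Distance from `point` to its closest neighbour in `cloud` (brute force)."""
    return min(math.dist(point, other) for other in cloud)

def f_score(predicted, ground_truth, tau):
    """F-score at threshold `tau`: harmonic mean of precision and recall.

    Precision is the fraction of predicted points within `tau` of the
    ground truth; recall is the fraction of ground-truth points within
    `tau` of the prediction.
    """
    precision = sum(nearest_distance(p, ground_truth) <= tau for p in predicted) / len(predicted)
    recall = sum(nearest_distance(g, predicted) <= tau for g in ground_truth) / len(ground_truth)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def transform(points, rotation, translation):
    """Apply a rigid transform x -> R x + t to each 3D point."""
    return [
        tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i] for i in range(3))
        for p in points
    ]

def add_s(model, r_gt, t_gt, r_est, t_est):
    """ADD-S: mean closest-point distance between the model placed at the
    ground-truth pose and the same model placed at the estimated pose."""
    gt_points = transform(model, r_gt, t_gt)
    est_points = transform(model, r_est, t_est)
    return sum(nearest_distance(g, est_points) for g in gt_points) / len(gt_points)

def retention(sim_score, real_score):
    """Fraction of the simulation score that is retained on real hardware."""
    return real_score / sim_score

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Toy model (illustrative only): corners of a 10 mm cube, units in mm.
CUBE = [(x, y, z) for x in (0.0, 10.0) for y in (0.0, 10.0) for z in (0.0, 10.0)]

if __name__ == "__main__":
    # A 1 mm translation error along x gives an ADD-S of 1 mm on this toy model.
    print(add_s(CUBE, IDENTITY, (0.0, 0.0, 0.0), IDENTITY, (1.0, 0.0, 0.0)))
    # Retention ratios computed from the F-score@5mm values quoted above.
    print(round(retention(0.946, 0.888), 2), round(retention(0.898, 0.716), 2))
```

The retention ratios follow directly from the quoted scores: 0.888 / 0.946 ≈ 0.94 and 0.716 / 0.898 ≈ 0.80. The brute-force nearest-neighbour search is quadratic in the number of points; a spatial index such as a k-d tree would be the usual choice for dense clouds.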






