Feat2GS

Probing Visual Foundation Models with Gaussian Splatting

¹Westlake University, ²Max Planck Institute for Intelligent Systems

³University of Tübingen, Tübingen AI Center

⁴Max Planck Institute for Informatics, Saarland Informatics Campus

Motivation

We present Feat2GS, a unified framework to probe “texture and geometry awareness” of visual foundation models. Novel view synthesis serves as an effective proxy for 3D evaluation.

How it works

Casually captured photos are fed into visual foundation models (VFMs) to extract per-pixel features, and into a stereo reconstructor to obtain relative camera poses. The pixel-wise features are then transformed into 3D Gaussians (3DGS) by a lightweight readout layer trained with a photometric loss. Grouping the 3DGS parameters into Geometry and Texture lets us analyze the geometry and texture awareness of VFMs separately, evaluated by novel view synthesis (NVS) quality on diverse, unposed, open-world images.

We conduct extensive experiments to probe the 3D awareness of several VFMs and investigate the ingredients that lead to a 3D-aware VFM. Building on these findings, we develop several variants that achieve state-of-the-art results across diverse datasets. This makes Feat2GS useful both for probing VFMs and as a simple-yet-effective baseline for NVS.
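The readout idea can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual architecture: the feature dimension, the use of a single linear head per parameter group, and the specific activations are all assumptions made for the sketch.

```python
import numpy as np

# Illustrative dimensions (assumptions, not the paper's actual sizes).
FEAT_DIM = 64              # per-pixel VFM feature dimension
GEO_DIM = 3 + 3 + 4 + 1    # xyz offset, scale, rotation quaternion, opacity
TEX_DIM = 3                # RGB color

rng = np.random.default_rng(0)


class LinearReadout:
    """A lightweight linear layer mapping per-pixel features to 3DGS parameters."""

    def __init__(self, in_dim, out_dim):
        self.W = rng.normal(0.0, 0.01, size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)

    def __call__(self, feats):          # feats: (N, in_dim)
        return feats @ self.W + self.b  # -> (N, out_dim)


# Separate heads for the two parameter groups probed in Feat2GS.
geometry_head = LinearReadout(FEAT_DIM, GEO_DIM)
texture_head = LinearReadout(FEAT_DIM, TEX_DIM)


def features_to_gaussians(feats):
    """Split readout outputs into Geometry and Texture parameter groups."""
    geo = geometry_head(feats)
    tex = texture_head(feats)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return {
        "xyz_offset": geo[:, 0:3],            # offset from the reconstructed points
        "scale": np.exp(geo[:, 3:6]),         # exp keeps scales positive
        "rotation": geo[:, 6:10]              # normalized quaternion
                    / np.linalg.norm(geo[:, 6:10], axis=1, keepdims=True),
        "opacity": sigmoid(geo[:, 10:11]),    # squashed to (0, 1)
        "rgb": sigmoid(tex),                  # squashed color
    }


# One Gaussian per pixel: here, fake features for 1000 pixels.
feats = rng.normal(size=(1000, FEAT_DIM))
gaussians = features_to_gaussians(feats)
```

In the full pipeline these parameters would be rendered with a 3DGS rasterizer and the readout trained end-to-end with a photometric loss against the input views; only the readout is trained, so NVS quality reflects what the frozen VFM features encode.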

Video

Average for Novel View Synthesis across six datasets

(interactive comparison: 3D Models vs. 2D Models)

Novel View Synthesis aligns well with Pointcloud Error Map

(comparison slider: RADIO vs. SD)

Geometry Probing

(comparison slider: RADIO vs. SD)

Texture Probing

(comparison slider: RADIO vs. IUVRGB)

All (Geometry + Texture) Probing

(comparison slider: RADIO vs. SD)

Application

Acknowledgement

This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans), and the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A. We thank Yuxuan Xue, Vladimir Guzov, and Garvita Tiwari for their valuable feedback, and the members of Endless AI Lab and Real Virtual Humans for their help and discussions. Yuliang Xiu has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 860768 (CLIPE project). Yue Chen and Xingyu Chen are supported by the Research Center for Industries of the Future (RCIF) at Westlake University and the Westlake Education Foundation. Gerard Pons-Moll is a Professor at the University of Tübingen endowed by the Carl Zeiss Foundation, at the Department of Computer Science, and a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 - Project number 390727645.

BibTeX citation

@article{chen2024feat2gs,
  title={Feat2GS: Probing Visual Foundation Models with Gaussian Splatting},
  author={Chen, Yue and Chen, Xingyu and Chen, Anpei and Pons-Moll, Gerard and Xiu, Yuliang},
  journal={arXiv preprint arXiv:2412.09606},
  year={2024}
}