Abstract
Although supervised multi-view 3D reconstruction methods have achieved satisfactory performance recently, they face major limitations, such as the high cost of 3D data collection and poor generalization to unseen scenes. Hence, unsupervised 3D reconstruction approaches based on photometric consistency are being explored. However, variations in lighting conditions among different views and reflective surfaces within a scene can undermine the reliability of these approaches. In this paper, we propose adaptive depth priors as pseudo-labels to guide the optimization process of self-supervised multi-view stereo. First, sparse depth priors are generated with conventional structure-from-motion (SfM) and multi-view stereo (MVS) algorithms, and are then fed into a monocular depth estimation network to learn the adapted depth priors. In addition, a spatial-frequency fusion structure is designed to enhance global perception in the feature matching of MVS by combining local dependency from the spatial domain with global contextual information from the frequency domain. Extensive experiments on the DTU and Tanks & Temples datasets demonstrate that the proposed ADP-MVSNet achieves markedly improved results over existing unsupervised approaches and even outperforms some supervised methods.
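The spatial-frequency fusion idea mentioned in the abstract can be illustrated with a minimal sketch: a local spatial filter captures neighborhood dependency, a low-pass mask applied to the 2D Fourier spectrum captures global context, and the two branches are blended. All function names, the box-filter stand-in for a learned convolution, the `keep_ratio` cutoff, and the weighted-sum fusion below are illustrative assumptions; the paper's actual network layers are not specified in this abstract.

```python
import numpy as np

def spatial_branch(feat, kernel_size=3):
    # Local dependency: simple box filter as a stand-in for a conv layer (assumption).
    pad = kernel_size // 2
    padded = np.pad(feat, pad, mode="edge")
    out = np.zeros_like(feat)
    h, w = feat.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + kernel_size, j:j + kernel_size].mean()
    return out

def frequency_branch(feat, keep_ratio=0.25):
    # Global context: low-pass filtering of the centered 2D spectrum (assumption).
    spec = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    mask = np.zeros((h, w))
    ch, cw = h // 2, w // 2
    rh = max(1, int(h * keep_ratio))
    rw = max(1, int(w * keep_ratio))
    mask[ch - rh:ch + rh, cw - rw:cw + rw] = 1.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * mask)))

def spatial_frequency_fusion(feat, alpha=0.5):
    # Fuse the two branches; a fixed weighted sum is a placeholder for a learned fusion.
    return alpha * spatial_branch(feat) + (1.0 - alpha) * frequency_branch(feat)
```

In a real network the spatial branch would be learned convolutions and the frequency branch a learnable spectral filter, but the sketch shows the division of labor: local detail from the spatial domain, global structure from the frequency domain.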
Original language | English
---|---
Title of host publication | 2024 IEEE International Conference on Image Processing (ICIP)
Publisher | IEEE
Publication status | E-pub ahead of print - 27 Sept 2024