Figure: Visualization of Ego2Exo cross-view object correspondence. The examples show how V²-SAM accurately transfers object masks from first-person (ego) to third-person (exo) viewpoints under large appearance and viewpoint variations.
Figure: Visualization of Exo2Ego cross-view correspondence. V²-SAM effectively handles hand occlusion, object deformation, and strong viewpoint shifts when transferring masks from static third-person views to egocentric frames.
Cross-view object correspondence aims to associate the same object across drastic viewpoint variations (e.g., ego–exo). However, segmentation models such as SAM2 cannot be applied directly, because spatial prompts from the query view do not transfer across views. To address this, V²-SAM adapts SAM2 with two complementary prompt generators: V²-Anchor, which derives geometry-aware anchor prompts, and V²-Visual, which supplies appearance-guided visual prompts.
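To make the failure mode concrete, here is a minimal Python sketch against the public SAM2 image-predictor API (the checkpoint/config paths, image filenames, and click coordinates are placeholders, not the paper's code):

import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Placeholder paths; point these at a local SAM2 config and checkpoint.
predictor = SAM2ImagePredictor(build_sam2("sam2_hiera_l.yaml", "sam2_hiera_large.pt"))

ego_img = np.array(Image.open("ego_frame.jpg").convert("RGB"))
exo_img = np.array(Image.open("exo_frame.jpg").convert("RGB"))
ego_point = np.array([[412, 305]])  # a click on the object in the EGO view

with torch.inference_mode():
    # In-view prompting works: the point lies on the object in the ego frame.
    predictor.set_image(ego_img)
    ego_masks, _, _ = predictor.predict(point_coords=ego_point,
                                        point_labels=np.array([1]))

    # Reusing the same coordinates in the EXO frame fails: after a large
    # viewpoint change, pixel (412, 305) generally no longer covers the object.
    predictor.set_image(exo_img)
    exo_masks, _, _ = predictor.predict(point_coords=ego_point,
                                        point_labels=np.array([1]))

# V²-SAM's generators close this gap by producing prompts in the target
# view (anchors from V²-Anchor, visual references from V²-Visual) before
# SAM2 is ever called.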
With its multi-expert framework and the PCCS selector, V²-SAM reliably picks the best expert for each instance, achieving new state-of-the-art results on Ego-Exo4D, DAVIS-2017, and HANDAL-X.
Teaser: V²-SAM supports coordinate-point and visual-reference prompts for cross-view segmentation.
Cross-view correspondence is crucial for multi-view perception, video understanding, and embodied AI. However, drastic viewpoint and appearance changes, background clutter, motion, and occlusion make it difficult to apply SAM2 directly with simple prompts.
This naturally raises two core questions: can geometric cues anchor where the object lies in the target view, and can appearance cues confirm which object it is? V²-SAM answers both in the affirmative, combining geometry-aware anchor prompts with appearance-guided visual prompts to enable reliable cross-view object correspondence.
The V²-SAM framework integrates V²-Anchor (geometry-driven anchor prompting) and V²-Visual (appearance-guided visual prompting), together with three prompt experts and the PCCS selector, which chooses the most reliable expert output per instance for robust cross-view segmentation.
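As a rough sketch of the selection stage (the function names, signatures, and injected scorer below are illustrative assumptions, not the paper's implementation): each expert proposes a prompt, SAM2 segments the target view once per expert, and the candidate that scores highest under the PCCS criterion is kept.

from typing import Callable, Dict, List
import numpy as np

# A "prompt" is whatever SAM2's predictor accepts, e.g.
# {"point_coords": ..., "point_labels": ...} or {"box": ...}.
Prompt = Dict[str, np.ndarray]
Expert = Callable[[np.ndarray, np.ndarray, np.ndarray], Prompt]

def select_by_pccs(
    experts: List[Expert],
    segment: Callable[[np.ndarray, Prompt], np.ndarray],  # target image + prompt -> binary mask
    pccs_score: Callable[[np.ndarray], float],            # stand-in for the paper's PCCS criterion
    query_img: np.ndarray,
    query_mask: np.ndarray,
    target_img: np.ndarray,
) -> np.ndarray:
    # Run every expert on the query/target pair, segment the target view
    # once per expert, and keep the candidate the selector scores highest.
    candidates = [segment(target_img, e(query_img, query_mask, target_img))
                  for e in experts]
    return max(candidates, key=pccs_score)

# Toy usage with dummy components, just to show the control flow.
h, w = 64, 64
img = np.zeros((h, w, 3), dtype=np.uint8)
qmask = np.zeros((h, w), dtype=bool)
expert = lambda qi, qm, ti: {"point_coords": np.array([[32, 32]]),
                             "point_labels": np.array([1])}
seg = lambda ti, p: np.zeros((h, w), dtype=bool)
score = lambda m: float(m.sum())
best = select_by_pccs([expert] * 3, seg, score, img, qmask, img)
print(best.shape)  # (64, 64)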
We provide a quantitative comparison of V²-SAM with previous state-of-the-art cross-view correspondence methods. The table highlights the strong performance of V²-SAM across the Ego-Exo4D, DAVIS-2017, and HANDAL-X benchmarks.
Table: Quantitative results comparing different methods on multiple benchmarks. V²-SAM achieves new state-of-the-art performance under both single-expert and multi-expert settings.
@article{pan2025v2sam,
  title={V²-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence},
  author={Pan, Jiancheng and Wang, Runze and Qian, Tianwen and Mahdi, Mohammad and Fu, Yanwei and Xue, Xiangyang and Huang, Xiaomeng and Van Gool, Luc and Paudel, Danda Pani and Fu, Yuqian},
  journal={arXiv preprint arXiv:2511.20886},
  year={2025}
}