V²-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence

arXiv 2025


¹INSAIT, Sofia University “St. Kliment Ohridski”, ²Tsinghua University, ³Fudan University, ⁴East China Normal University

Ego2Exo Correspondence Visualization

Ego2Exo Visualization

Visualization of Ego2Exo cross-view object correspondence. The examples demonstrate how V²-SAM accurately transfers object masks from first-person to third-person viewpoints under large appearance and viewpoint variations.

Exo2Ego Correspondence Visualization

Exo2Ego Visualization

Visualization of Exo2Ego cross-view correspondence. V²-SAM effectively handles hand occlusion, object deformation, and strong viewpoint shifts when transferring masks from static third-person views to egocentric frames.

Abstract

Cross-view object correspondence aims to associate the same object across drastic viewpoint variations (e.g., ego–exo). However, segmentation models such as SAM2 cannot be directly applied, because spatial prompts from the query view do not transfer across views. To address this, V²-SAM adapts SAM2 using two complementary prompt generators:

  • V²-Anchor: geometry-aware anchoring based on DINOv3, enabling coordinate-based prompting.
  • V²-Visual: appearance-guided prompting via a Visual Prompt Matcher.

With a multi-expert framework and a cyclic consistency-based selector (PCCS), V²-SAM reliably selects the best expert per instance, and it achieves new state-of-the-art results on Ego-Exo4D, DAVIS-2017, and HANDAL-X.
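To make the selection idea concrete, below is a minimal sketch of cycle-consistency-based expert selection. It is an illustration only, not the authors' PCCS implementation: cycle_back is a hypothetical stand-in for whatever reverse mapping (e.g., re-prompting the segmenter from the target view back to the query view) is used, and the expert whose cycled mask best overlaps the original query mask is kept.

# Illustrative sketch of cycle-consistency expert selection; not the authors' PCCS code.
import numpy as np

def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def select_expert(query_mask, expert_masks, cycle_back):
    """Pick the expert whose target-view prediction is most cycle-consistent.

    query_mask   : bool array, object mask in the query view
    expert_masks : dict name -> bool array, each expert's mask in the target view
    cycle_back   : hypothetical callable mapping a target-view mask back to the
                   query view (e.g., by re-prompting the segmenter in reverse)
    """
    scores = {name: iou(query_mask, cycle_back(m)) for name, m in expert_masks.items()}
    best = max(scores, key=scores.get)
    return best, expert_masks[best], scores

In practice the three candidates would come from the Anchor, Visual, and Fusion experts described below, and the segmenter's own mask confidence could serve as a tie-breaker.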

Teaser Image

Teaser: V²-SAM supports coordinate-point and visual-reference prompts for cross-view segmentation.

Motivation

Cross-view correspondence is crucial for multi-view perception, video understanding, and embodied AI. However, drastic viewpoint changes, appearance differences, background clutter, motion, and occlusion make it difficult to directly leverage SAM2 with simple prompts.

This naturally raises two core questions:

  • Can SAM2’s spatial prompting be restored in cross-view scenarios?
  • Can spatial prompts and visual prompts complement each other to improve robustness?

V²-SAM provides affirmative answers by combining geometry-aware anchor prompts with appearance-guided visual prompts, enabling reliable cross-view object correspondence.

Contributions

  • Unified Cross-View Framework — We present the first framework that adapts SAM2 to cross-view object correspondence through complementary spatial and visual prompts.
  • Cross-view Anchor Prompt (V²-Anchor) — A geometry-aware module based on DINOv3, enabling coordinate-based prompting for SAM2 under drastic viewpoint changes.
  • Cross-view Visual Prompt (V²-Visual) — A novel Visual Prompt Matcher that aligns cross-view representations from both feature and structural perspectives.
  • Multi-Prompt Experts + PCCS — Three complementary experts (Anchor / Visual / Fusion) and a cyclic consistency-based selector that adaptively chooses the most reliable prediction.
  • Extensive Benchmarking — V²-SAM achieves new state-of-the-art results on:
    • Ego-Exo4D
    • DAVIS-2017 (video tracking)
    • HANDAL-X (robotic cross-view transfer)

V²-SAM Framework

V²-SAM Framework

The V²-SAM framework integrates V²-Anchor (geometry-driven anchor prompting) and V²-Visual (appearance-guided visual prompting), together with three prompt experts and the PCCS selector for robust cross-view segmentation.
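To make the anchor idea concrete, here is a minimal, self-contained sketch of how cross-view point prompts could be derived from patch features: an object descriptor is pooled under the known query mask and matched against target-view patches by cosine similarity, and the best-matching patch centers serve as point prompts. The random feature tensors stand in for a frozen DINOv3 backbone, and the simple top-k matching rule is an assumption for illustration, not the exact V²-Anchor procedure.

# Minimal sketch of geometry-aware anchor prompting; an illustration, not V²-Anchor itself.
import torch
import torch.nn.functional as F

H = W = 16        # patch grid size of the ViT feature map
C = 384           # feature dimension
q_feats = torch.randn(H * W, C)   # query-view patch features (stand-in for DINOv3)
t_feats = torch.randn(H * W, C)   # target-view patch features (stand-in for DINOv3)
q_mask = torch.zeros(H, W, dtype=torch.bool)
q_mask[4:8, 4:8] = True           # known object mask in the query view

# Pool an object descriptor over the masked query patches, then score every
# target patch by cosine similarity to that descriptor.
obj_desc = F.normalize(q_feats[q_mask.flatten()].mean(0, keepdim=True), dim=-1)
sim = (obj_desc @ F.normalize(t_feats, dim=-1).T).squeeze(0)   # (H*W,)

# The most similar target patches become positive point prompts (patch-grid
# coordinates here; a real pipeline would scale them to pixel coordinates
# before handing them to SAM2).
topk = sim.topk(3).indices
ys, xs = topk // W, topk % W
anchor_points = torch.stack([xs, ys], dim=-1).float() + 0.5
print(anchor_points)

Mutual nearest-neighbor filtering between the two views is a common way to reject unreliable matches before prompting; whether V²-Anchor applies such a check is not stated on this page.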

Main Results

We provide a quantitative comparison of V²-SAM with previous state-of-the-art cross-view correspondence methods. The table highlights the strong performance of V²-SAM across the Ego-Exo4D, DAVIS-2017, and HANDAL-X benchmarks.

Experimental Table Figure

Figure: Quantitative results comparing different methods on multiple benchmarks. V²-SAM achieves new state-of-the-art performance under both single-expert and multi-expert settings.

Citation


@article{pan2025v2sam,
  title={V²-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence},
  author={Pan, Jiancheng and Wang, Runze and Qian, Tianwen and Mahdi, Mohammad and Fu, Yanwei and Xue, Xiangyang and Huang, Xiaomeng and Van Gool, Luc and Paudel, Danda Pani and Fu, Yuqian},
  journal={arXiv preprint arXiv:2511.20886},
  year={2025}
}