publications

Publications are listed in reverse chronological order. * denotes equal contribution.

2026

  1. arXiv preprint
    Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation
    Jiaze Li*, Hao Yin*, Haoran Xu*, Boshen Xu, Wenhui Tan, and 4 more authors
    In arXiv preprint arXiv:2602.02994, Feb 2026
  2. CVPR 2026
    REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
    Jiaze Li*, Hao Yin*, Wenhui Tan*, Jingyang Chen*, Boshen Xu, and 5 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Feb 2026

2025

  1. NeurIPS 2025
    The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs
    Hao Yin, Guangzong Si, and Zilei Wang
    In Advances in Neural Information Processing Systems, Dec 2025
  2. CVPR 2025
    ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
    Hao Yin, Guangzong Si, and Zilei Wang
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2025
  3. CVPR 2025
    Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
    Hao Yin, Guangzong Si, and Zilei Wang
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2025