Knowledge-Intensive Multimodal Reasoning
ICCV 2025, October 19, 2025, Hawaii

About The Workshop

This workshop aims to advance the frontier of multimodal AI systems that can effectively reason across specialized domains requiring extensive domain knowledge. Recent advancements in multimodal AI—combining information from text, images, audio, and structured data—have unlocked impressive capabilities in general-purpose reasoning. However, significant challenges persist when these systems encounter scenarios demanding deep domain expertise in fields such as medicine, engineering, and scientific research. Such contexts require expert-level perception and reasoning grounded in extensive subject knowledge, highlighting the need for specialized strategies to handle domain-specific complexity. Through invited talks, panel discussions, and interactive poster sessions, researchers and practitioners from diverse backgrounds will share the latest developments, ongoing hurdles, and promising future directions for knowledge-intensive multimodal reasoning. The workshop aims to foster collaboration and stimulate innovation towards the development of next-generation multimodal AI systems capable of reliable, transparent, and contextually grounded reasoning in specialized, high-stakes environments.

Schedule

Time (HST)  | Session                        | Speaker                        | Talk Title
1:00 – 1:10 | Opening Remarks                |                                |
1:10 – 1:40 | Invited Talk 1                 | Yongming Rao (Tencent Hunyuan) | Interactive Multi-Modal Reasoning with Thinking-on-Image Reasoning
1:40 – 2:10 | Invited Talk 2                 | Kang-Fu Mei (Google DeepMind)  | The Power of Context: How Multimodality Reasoning Improves Image Generation
2:10 – 3:10 | Poster Session                 |                                |
3:10 – 3:20 | Contributed Talk 1             | Enxin Song (UCSD & ZJU)        | Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
3:20 – 3:30 | Contributed Talk 2             | Ziqi Huang (NTU)               | VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
3:30 – 4:00 | Invited Talk 3                 | Ziwei Liu (NTU)                | Native Multimodal Models: Architecture, Post-Training, and Evaluation
4:00 – 4:30 | Invited Talk 4                 | David Fan (Meta FAIR)          | Scaling Language-Free Visual Representation Learning
4:30 – 4:40 | Short break                    |                                |
4:40 – 4:50 | Contributed Talk 3             | Shoubin Yu (UNC)               | SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
4:50 – 5:00 | Contributed Talk 4             | Zeyuan Yang (UMass)            | Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
5:00 – 5:30 | Panel Session                  |                                |
5:30 – 5:40 | Paper Award & Closing Remarks  |                                |

Speakers and Panelists

Ziwei Liu
NTU

Kang-Fu Mei
Google DeepMind

David Fan
Meta FAIR

Yongming Rao
Tencent Hunyuan

Ranjay Krishna
UW & AI2

Manling Li
Northwestern



Topics

The workshop will cover a range of topics, including but not limited to:

Knowledge-intensive Multimodal Learning

This topic focuses on methods and architecture designs that integrate domain-specific knowledge with diverse data sources (e.g., text, images, sensor data, and structured data) across specialized fields. We will cover data curation strategies, modality fusion techniques, representation learning frameworks, and explainability methods aimed at ensuring that models capture the domain knowledge crucial for reliable reasoning in high-stakes settings.


Multimodal Foundation Models for Specialized Domains

This topic investigates how to adapt large-scale and general-purpose multimodal foundation models for domains where specialized expertise is essential, such as clinical diagnostics, scientific research, and advanced engineering applications. We will cover strategies for efficient fine-tuning, prompt engineering, domain-centric pre-training, and knowledge distillation to blend foundational capabilities with expert-level insights.


Embodied AI for Knowledge-Intensive Scenarios

This topic explores the integration of multimodal reasoning in physical or interactive domains, ranging from industrial automation to laboratory robotics. Key discussion points include sensor fusion, adaptive learning with minimal supervision, human-robot collaboration, and simulation-to-real transfer in safety-critical scenarios. Emphasis will be placed on how advanced reasoning techniques—grounded in specialized domain knowledge—can help ensure transparency, robustness, and trustworthiness in embodied AI systems.


Evaluation and Benchmarking

Robust evaluation protocols and benchmarks are essential for gauging progress and ensuring the reliability of domain-specific multimodal AI. We will cover the development of standardized benchmarks, performance metrics, and testing methodologies designed to capture the full spectrum of specialized domain challenges for multimodal reasoning.


Broader Topics in Knowledge-Intensive Multimodal Reasoning

In addition to the core themes above, our discussions will expand to emerging areas such as integrating symbolic and neural methods for structured reasoning, ensuring privacy and security with sensitive data, exploring multi-agent collaboration for complex decision-making, and examining societal and ethical considerations when deploying multimodal systems in real-world, high-stakes environments.



Accepted Papers

    Call For Papers

    Key Dates

    • Submission Deadline: August 22, 2025 (AoE), extended from August 5, 2025 to accommodate papers submitted to EMNLP
    • Notification: September 2, 2025 (AoE), moved from August 25, 2025
    Deadlines are strict and will not be extended under any circumstances. All deadlines follow the Anywhere on Earth (AoE) timezone.

    Submission Site

    Submissions are managed via OpenReview. Papers will remain private during the review process. All authors must maintain up-to-date OpenReview profiles to ensure proper conflict-of-interest management and paper matching. Incomplete profiles may result in desk rejection.

    Submission Format

    Papers are limited to eight pages, including figures and tables, in the KnowledgeMR Workshop LaTeX Template (adapted from the ICCV 2025 template). Additional pages containing cited references and the appendix are allowed. Papers that are not properly anonymized, do not use the template, or exceed eight pages (excluding references and appendix) will be rejected without review.

    Anonymity

    Double blind review: Authors do not know the names of the area chairs or reviewers assigned to their papers, and the area chairs and reviewers cannot, beyond a reasonable doubt, infer the names of the authors from the submission and any additional material.

    Dual Submission and Non-Archival Policy

    Submissions under review at other venues are accepted, provided they do not breach the dual-submission or anonymity policies of those venues. Workshop submissions will not be indexed and will have no archival proceedings. We welcome papers concurrently submitted to ICCV 2025, EMNLP 2025, ICLR 2026, and AAAI 2026.



    Student Registration Grant

    We are excited to offer a limited number of free full-conference "student early" registrations for ICCV 2025, exclusively for full-time students attending in person. This initiative aims to support early-career researchers while fostering diversity, equity, and inclusion (DEI) in the academic community.

    Selection Criteria

    Applications will be evaluated based on the strength of the submitted materials (see details below). Priority will be given to students presenting papers at our workshop who lack alternative travel support.

    How to Apply

    Interested students must complete the application form here by 11:59 pm (AoE) on August 26, 2025. The application includes the following:

    • Personal & Academic Details: Name, affiliation, and relevant academic information
    • CV/Resume
    • Paper ID: Accepted or submitted to our workshop
    • Statement of Interest: A brief paragraph explaining how this opportunity will benefit your research and career
    • Attendance Confirmation: A clear statement confirming that you will attend in person

    Important Notes

    • Awardees will be announced on September 3, 2025
    • If you have already registered, please submit your receipt, and we will provide further instructions
    • Travel and accommodations must be arranged independently; this grant covers registration only

    This opportunity is highly competitive, and we encourage all eligible students to apply early.

    Organizers

    This workshop is organized by:

    Xiangliang Zhang
    Notre Dame

    Manling Li
    Northwestern

    Yapeng Tian
    UT Dallas

    Minhao Cheng
    Penn State

    Wenhu Chen
    UWaterloo

    Yilun Zhao
    Yale

    Haowei Zhang
    Fudan University

    Tianyu Yang
    Notre Dame

    Zhenting Qi
    Harvard

    Yuyang Liu
    Peking University

    Simeng Han
    Yale

    Rui Xiao
    TUM

    Fan Nie
    Stanford

    Sponsors

    We welcome sponsorship inquiries. To become a sponsor, please contact Yilun Zhao (yilun.zhao@yale.edu).