International Workshop on Multimodal Generative Search and Recommendation
CIKM 2025
COEX, SEOUL, KOREA

About MMGenSR

The rapid advancements in generative Artificial Intelligence (AI) have ignited a revolutionary wave across information retrieval and recommender systems. The MMGenSR workshop serves as a premier interdisciplinary platform to explore how generative models, particularly Large Language Models (LLMs) and Large Multimodal Models (LMMs), are transforming both multimodal search and recommendation paradigms.

We aim to bring together researchers and practitioners to discuss innovative architectures, methodologies, and evaluation strategies for generative document retrieval, generative image retrieval, grounded answer generation, generative recommendation, and other multimodal tasks.

The workshop will foster discussions on improving algorithms, generating personalized content, evolving user-system interactions, enhancing trustworthiness, and refining evaluation methodologies for these cutting-edge systems. This timely workshop seeks to identify promising future research directions, address key challenges, and catalyze collaborations towards the development of next-generation intelligent systems.

Workshop Programme

  • 09:00am-09:10am: Welcome & Opening remarks
  • 09:10am-09:40am: Keynote 1: Smarter Retrieval for Smarter Generation--When and How to Retrieve for Retrieval-Augmented Generation
  • 09:40am-10:40am: Oral presentations (4 papers, 15 minutes each)
    • 09:40-09:55: Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation
    • 09:55-10:10: Rank-Aware Indigo-DPO: Scalable Preference Optimization for Industrial Talent Search Ranking
    • 10:10-10:25: Turning Adversaries into Allies: Reversing Typographic Attacks for Multimodal Product Retrieval
    • 10:25-10:40: Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
  • 10:40am-11:00am: Coffee break
  • 11:00am-11:30am: Keynote 2: Large-scale Generative and Multimodal Recommendation Systems: An Overview
  • 11:30am-12:20pm: Lightning spotlight presentations (6 papers, 8 minutes each)
    • 11:30-11:38: V-Agent: An Interactive Video Search System Using Vision-Language Models
    • 11:38-11:46: EcomCLIP: Leveraging multimodal models for generating semantic embeddings
    • 11:46-11:54: XX-Qwen-OmniEmbed: A Unified Multimodal Retrieval Model for Text, Image, Audio, and Video
    • 11:54-12:02: TG-S&P: Time-Series Data Generation for Improved Fashion Demand Forecasting
    • 12:02-12:10: Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion
    • 12:10-12:18: SARCH: Multimodal Search for Archaeological Archives

Invited Speakers

Junwei Pan

Tencent

Large-scale Generative and Multimodal Recommendation Systems: An Overview

Abstract
This keynote surveys the rapidly evolving landscape of generative and multimodal recommendation systems in large-scale industrial applications. I will first highlight recent advances that incorporate action tokens into generative models, exemplified by HSTU, PinRec, and GenRank. I will then discuss approaches that integrate LLM/VLM knowledge into recommenders, including representation alignment, distance transfer, and semantic ID learning. Across both directions, I will outline the key challenges in real-world deployment—data sparsity, feature heterogeneity, and latency constraints—and summarize emerging solutions and open problems. The goal of this talk is to shed light on the future evolution of generative and multimodal recommendation systems and their role in shaping the next generation of intelligent, foundation-level recommender architectures.

Bio
Dr. Junwei Pan is a Research Scientist in the Tencent Ads Science team. Before joining Tencent, he was a Principal Research Engineer at Yahoo Labs and Yahoo Research, where he worked on news personalization, search relevance, and demand-side platforms. His research interests lie in computational advertising, recommender systems, and LLMs for recommendation. He has published over 30 papers in top-tier conferences such as SIGKDD, WWW, ICML, ICLR, NeurIPS, SIGIR, AAAI, and CIKM.

Keping Bi

Chinese Academy of Sciences (CAS)

Smarter Retrieval for Smarter Generation--When and How to Retrieve for Retrieval-Augmented Generation

Abstract
Retrieval-Augmented Generation (RAG) has emerged as a core paradigm for integrating external knowledge into large language models (LLMs), helping to mitigate hallucinations and compensate for outdated or missing information. However, retrieval introduces additional computational overhead due to longer input contexts and does not always improve generation quality—particularly when the retrieved content is irrelevant or low quality. In this talk, I will discuss when retrieval should be triggered—only when the LLM lacks sufficient internal knowledge—and how retrieval can be optimized to better complement generation. I will present recent advances on enhancing LLMs’ perception of their own knowledge boundaries, leveraging LLM-based retrievers to improve retrieval quality, and adopting utility-aware retrieval strategies that prioritize information most beneficial for downstream generation. Together, these directions aim to reduce unnecessary retrieval overhead and provide more effective supporting evidence for reliable and efficient knowledge-augmented generation.

Bio
Keping Bi is an Associate Professor at the Institute of Computing Technology, Chinese Academy of Sciences. She received her Ph.D. from the Center for Intelligent Information Retrieval at the University of Massachusetts Amherst. Her research focuses on retrieval-augmented AI, universal text and multimodal representation learning, and large language model alignment guided by the Honest, Helpful, and Harmless (3H) principle. She currently serves as General Co-Chair of SIGIR-AP 2025 and as an Editor of SIGIR Forum, and has previously served as Registration Chair of SIGIR-AP 2023, Tutorial Chair of NLPCC 2025, and Program Committee or Senior Program Committee member for numerous leading IR and NLP conferences. In addition to her academic experience, she also held full-time industry positions at Baidu (China) and Microsoft (U.S.).

Contributions

  • Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation [Paper Link]
    Chanyoung Chung, Kyeongryul Lee, Sunbin Park and Joyce Whang
  • Rank-Aware Indigo-DPO: Scalable Preference Optimization for Industrial Talent Search Ranking [Paper Link]
    Dingxian Wang, Tong Zhang, Xiang Wang, Mahdi Feroze, Ivan Portyanko, Frank Yang, Spyridon Kapnisis and Andrew Rabinovich
  • Turning Adversaries into Allies: Reversing Typographic Attacks for Multimodal Product Retrieval
    Janet Jenq and Hongda Shen
  • Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
    Wongyu Kim, Hochang Lee, Sanghak Lee, Yoonsung Kim and Jaehyun Park
  • V-Agent: An Interactive Video Search System Using Vision-Language Models
    Sunyoung Park, Jong-Hyeon Lee, Youngjune Kim, Daegyu Sung, Younghyun Yu, Young-Rok Cha and Jeongho Ju
  • EcomCLIP: Leveraging multimodal models for generating semantic embeddings [Paper Link]
    Omkar Gurjar, Kin Sum Liu, Praveen Kolli, Utsaw Kumar and Mandar Rahurkar
  • XX-Qwen-OmniEmbed: A Unified Multimodal Retrieval Model for Text, Image, Audio, and Video [Paper Link]
    Mengyao Xu, Wenfei Zhou, Yauhen Babakhin, Gabriel Moreira, Ronay Ak, Radek Osmulski, Benedikt Schifferer and Even Oldridge
  • TG-S&P: Time-Series Data Generation for Improved Fashion Demand Forecasting
    Minsu Na, Sanguk Park, Joopil Lee, Seohyun Lee and Wooju Kim
  • Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion
    Jaewon Ahn, Woosung Jang and Beakcheol Jang
  • SARCH: Multimodal Search for Archaeological Archives [Paper Link]
    Nivedita Sinha, Bharati Khanijo, Sanskar Singh, Priyansh Mahant, Ashutosh Roy, Saubhagya Singh Bhadouria, Arpan Jain and Maya Ramanath

Call for Papers

The main objective of this workshop is to encourage pioneering research at the intersection of generative models with multimodal search and recommendation. The overarching theme is leveraging generative AI to enhance and revolutionize information access and personalized content delivery in multimodal scenarios. This workshop aims to attract a diverse audience, including academic researchers and industry experts working on or interested in generative models, multimodal information retrieval, and recommender systems. It offers a unique forum for these stakeholders to share innovative ideas, methods, and accomplishments, encouraging interdisciplinary collaboration and the exploration of novel applications. Specifically, we invite contributions addressing three key areas: (1) Generative Retrieval and Recommendation utilizing Large Multimodal Models (LMMs) and Multimodal Large Language Models (MLLMs), (2) Advanced Content Generation methodologies within Generative Search and Recommendation systems, and (3) Domain-specific Applications, Benchmarks, and Deployment strategies.

Topics of interest include, but are not limited to:

  • LMM/MLLM for generative retrieval & recommendation
    • Vision-language models for personalized recommendation.
    • Multimodal representation learning for generative search.
    • Cross-modal and modality-enhanced zero-shot recommendation systems.
    • Reasoning and explainability in multimodal generative search systems.
    • Multimodal dense/sparse retrieval methods.
    • Large foundation models and large-scale benchmark datasets for multimodal search/recommendation.
    • Multimodal memory-augmented models and long-term personalization.
    • Trustworthiness and efficiency in multimodal search and recommendation.
    • Privacy-preserving personalization in multimodal generative recommendation.
  • Content generation in generative search & recommendation
    • Generative models (e.g., large language models, diffusion models) for recommendation tasks.
    • Image, video, and text generation for product or content search.
    • Human-AI co-creation in multimodal content search.
    • Privacy-preserving multimodal content generation.
    • Evaluation protocols for hallucination and factuality in multimodal content generation.
  • Vertical applications, benchmarks & deployment
    • Fashion, food, and lifestyle recommendation with multimodal inputs.
    • Generative recommendation in e-commerce and retail scenarios.
    • Domain-specific benchmarks for multimodal search and recommendation.
    • Generating comprehensive reports for specific tasks using multimodal retrieval and generation.
    • Scalable system architectures and deployment frameworks for real-world generative multimodal search and recommendation.

Submission Guidelines: Authors are invited to submit original, full-length research papers that are not previously published, accepted for publication, or under consideration at any other forum. Manuscripts should be submitted to the CIKM 2025 EasyChair site in PDF format, using the 2-column sigconf format from the ACM proceedings template. Submissions may range from 4 to 9 pages, plus unlimited pages for references; authors may decide on the appropriate length, as no distinction is made between long and short papers. The review process will be double-blind, and submissions that are not properly anonymized will be desk-rejected without review. Each submission will be reviewed by at least three program committee members without conflicts of interest. After collecting the reviewers' comments, the organizers will hold a meeting to discuss the reviews and make the final decisions. At least one author of each accepted paper must register and present the work on-site in Seoul, Korea, as scheduled in the official CIKM 2025 conference program.

We also welcome submissions of outstanding papers that have recently been accepted or published in top-tier conferences or journals, to foster broader discussion and engagement on the latest research advances (please contact yi.bin@hotmail.com).

Submission site: https://easychair.org/conferences?conf=mmgensr2025

Important Dates

  • Paper Submission Deadline: September 15, 2025 (extended from August 31, 2025)
  • Paper Acceptance Notification: September 30, 2025
  • Workshop Date: November 14, 2025

Workshop Organizers

 

  • Yi Bin, Tongji University
  • Haoxuan Li, University of Electronic Science and Technology of China
  • Haokai Ma, National University of Singapore
  • Yang Zhang, National University of Singapore
  • Lin Wang, The Hong Kong Polytechnic University
  • Wenjie Wang, University of Science and Technology of China
  • Yunshan Ma, Singapore Management University
  • Yang Yang, University of Electronic Science and Technology of China
  • Tat-Seng Chua, National University of Singapore

Contact