International Workshop on Multimodal Generative Search and Recommendation
CIKM 2025
COEX, SEOUL, KOREA

About MMGenSR

The rapid advancements in generative Artificial Intelligence (AI) have ignited a revolutionary wave across information retrieval and recommender systems. The MMGenSR workshop serves as a premier interdisciplinary platform to explore how generative models, particularly Large Language Models (LLMs) and Large Multimodal Models (LMMs), are transforming both multimodal search and recommendation paradigms.

We aim to bring together researchers and practitioners to discuss innovative architectures, methodologies, and evaluation strategies for generative document retrieval, generative image retrieval, grounded answer generation, generative recommendation, and other multimodal tasks.

The workshop will foster discussions on improving algorithms, generating personalized content, evolving user-system interactions, enhancing trustworthiness, and refining evaluation methodologies for these cutting-edge systems. This timely workshop seeks to identify promising future research directions, address key challenges, and catalyze collaborations towards the development of next-generation intelligent systems.

Workshop Programme

  • 09:00am-09:10am: Welcome & Opening remarks
  • 09:10am-09:40am: Keynote 1: Smarter Retrieval for Smarter Generation--When and How to Retrieve for Retrieval-Augmented Generation
  • 09:40am-10:40am: Oral presentations (4 papers, 15 minutes each)
    • 09:40-09:55: Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation
    • 09:55-10:10: Rank-Aware Indigo-DPO: Scalable Preference Optimization for Industrial Talent Search Ranking
    • 10:10-10:25: Turning Adversaries into Allies: Reversing Typographic Attacks for Multimodal Product Retrieval
    • 10:25-10:40: Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
  • 10:40am-11:00am: Coffee break
  • 11:00am-11:30am: Keynote 2: Large-scale Generative and Multimodal Recommendation Systems: An Overview
  • 11:30am-12:20pm: Lightning spotlight presentations (6 papers, 8 minutes each)
    • 11:30-11:38: V-Agent: An Interactive Video Search System Using Vision-Language Models
    • 11:38-11:46: EcomCLIP: Leveraging multimodal models for generating semantic embeddings
    • 11:46-11:54: XX-Qwen-OmniEmbed: A Unified Multimodal Retrieval Model for Text, Image, Audio, and Video
    • 11:54-12:02: TG-S&P: Time-Series Data Generation for Improved Fashion Demand Forecasting
    • 12:02-12:10: Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion
    • 12:10-12:18: SARCH: Multimodal Search for Archaeological Archives

Invited Speakers

Junwei Pan

Tencent

Large-scale Generative and Multimodal Recommendation Systems: An Overview

Abstract
This keynote surveys the rapidly evolving landscape of generative and multimodal recommendation systems in large-scale industrial applications. I will first highlight recent advances that incorporate action tokens into generative models, exemplified by HSTU, PinRec, and GenRank. I will then discuss approaches that integrate LLM/VLM knowledge into recommenders, including representation alignment, distance transfer, and semantic ID learning. Across both directions, I will outline the key challenges in real-world deployment—data sparsity, feature heterogeneity, and latency constraints—and summarize emerging solutions and open problems. The goal of this talk is to shed light on the future evolution of generative and multimodal recommendation systems and their role in shaping the next generation of intelligent, foundation-level recommender architectures.

Bio
Dr. Junwei Pan is a Research Scientist in the Tencent Ads Science team. Before joining Tencent, he was a Principal Research Engineer at Yahoo Labs and Yahoo Research, where he worked on news personalization, search relevance, and demand-side platforms. His research interests lie in computational advertising, recommender systems, and LLMs for recommendation. He has published over 30 papers in top-tier conferences such as SIGKDD, WWW, ICML, ICLR, NeurIPS, SIGIR, AAAI, and CIKM.

Keping Bi

Chinese Academy of Sciences (CAS)

Smarter Retrieval for Smarter Generation--When and How to Retrieve for Retrieval-Augmented Generation

Abstract
Retrieval-Augmented Generation (RAG) has emerged as a core paradigm for integrating external knowledge into large language models (LLMs), helping to mitigate hallucinations and compensate for outdated or missing information. However, retrieval introduces additional computational overhead due to longer input contexts and does not always improve generation quality—particularly when the retrieved content is irrelevant or low quality. In this talk, I will discuss when retrieval should be triggered—only when the LLM lacks sufficient internal knowledge—and how retrieval can be optimized to better complement generation. I will present recent advances on enhancing LLMs’ perception of their own knowledge boundaries, leveraging LLM-based retrievers to improve retrieval quality, and adopting utility-aware retrieval strategies that prioritize information most beneficial for downstream generation. Together, these directions aim to reduce unnecessary retrieval overhead and provide more effective supporting evidence for reliable and efficient knowledge-augmented generation.

Bio
Keping Bi is an Associate Professor at the Institute of Computing Technology, Chinese Academy of Sciences. She received her Ph.D. from the Center for Intelligent Information Retrieval at the University of Massachusetts Amherst. Her research focuses on retrieval-augmented AI, universal text and multimodal representation learning, and large language model alignment guided by the Honest, Helpful, and Harmless (3H) principle. She currently serves as General Co-Chair of SIGIR-AP 2025 and as an Editor of SIGIR Forum, and has previously served as Registration Chair of SIGIR-AP 2023, Tutorial Chair of NLPCC 2025, and Program Committee or Senior Program Committee member for numerous leading IR and NLP conferences. In addition to her academic experience, she also held full-time industry positions at Baidu (China) and Microsoft (U.S.).

Contributions

  • Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation [Paper Link]
    Chanyoung Chung, Kyeongryul Lee, Sunbin Park and Joyce Whang
  • Rank-Aware Indigo-DPO: Scalable Preference Optimization for Industrial Talent Search Ranking [Paper Link]
    Dingxian Wang, Tong Zhang, Xiang Wang, Mahdi Feroze, Ivan Portyanko, Frank Yang, Spyridon Kapnisis and Andrew Rabinovich
  • Turning Adversaries into Allies: Reversing Typographic Attacks for Multimodal Product Retrieval
    Janet Jenq and Hongda Shen
  • Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation
    Wongyu Kim, Hochang Lee, Sanghak Lee, Yoonsung Kim and Jaehyun Park
  • V-Agent: An Interactive Video Search System Using Vision-Language Models
    Sunyoung Park, Jong-Hyeon Lee, Youngjune Kim, Daegyu Sung, Younghyun Yu, Young-Rok Cha and Jeongho Ju
  • EcomCLIP: Leveraging multimodal models for generating semantic embeddings [Paper Link]
    Omkar Gurjar, Kin Sum Liu, Praveen Kolli, Utsaw Kumar and Mandar Rahurkar
  • XX-Qwen-OmniEmbed: A Unified Multimodal Retrieval Model for Text, Image, Audio, and Video [Paper Link]
    Mengyao Xu, Wenfei Zhou, Yauhen Babakhin, Gabriel Moreira, Ronay Ak, Radek Osmulski, Benedikt Schifferer and Even Oldridge
  • TG-S&P: Time-Series Data Generation for Improved Fashion Demand Forecasting
    Minsu Na, Sanguk Park, Joopil Lee, Seohyun Lee and Wooju Kim
  • Enhancing Medical Cross-Modal Hashing Retrieval using Dropout-Voting Mixture-of-Experts Fusion
    Jaewon Ahn, Woosung Jang and Beakcheol Jang
  • SARCH: Multimodal Search for Archaeological Archives [Paper Link]
    Nivedita Sinha, Bharati Khanijo, Sanskar Singh, Priyansh Mahant, Ashutosh Roy, Saubhagya Singh Bhadouria, Arpan Jain and Maya Ramanath

Call for Papers

The main objective of this workshop is to encourage pioneering research at the intersection of generative models with multimodal search and recommendation. The overarching theme is leveraging generative AI to enhance and revolutionize information access and personalized content delivery in multimodal scenarios. This workshop aims to attract a diverse audience, including academic researchers and industry experts working on or interested in generative models, multimodal information retrieval, and recommender systems. It offers a unique forum for these stakeholders to share innovative ideas, methods, and accomplishments, encouraging interdisciplinary collaboration and the exploration of novel applications. Specifically, we invite contributions addressing three key areas: (1) Generative Retrieval and Recommendation utilizing Large Multimodal Models (LMMs) and Multimodal Large Language Models (MLLMs), (2) Advanced Content Generation methodologies within Generative Search and Recommendation systems, and (3) Domain-specific Applications, Benchmarks, and Deployment strategies.

Topics of interest include, but are not limited to:

  • LMM/MLLM for generative retrieval & recommendation
    • Vision-language models for personalized recommendation.
    • Multimodal representation learning for generative search.
    • Cross-modal and modality-enhanced zero-shot recommendation systems.
    • Reasoning and explainability in multimodal generative search systems.
    • Multimodal dense/sparse retrieval methods.
    • Large foundation models and large-scale benchmark datasets for multimodal search/recommendation.
    • Multimodal memory-augmented models and long-term personalization.
    • Trustworthiness and efficiency in multimodal search and recommendation.
    • Privacy-preserving personalization in multimodal generative recommendation.
  • Content generation in generative search & recommendation
    • Generative models (e.g., large language models, diffusion models) for recommendation tasks.
    • Image, video, and text generation for product or content search.
    • Human-AI co-creation in multimodal content search.
    • Privacy-preserving multimodal content generation.
    • Evaluation protocols for hallucination and factuality in multimodal content generation.
  • Vertical applications, benchmarks & deployment
    • Fashion, food, and lifestyle recommendation with multimodal inputs.
    • Generative recommendation in e-commerce and retail scenarios.
    • Domain-specific benchmarks for multimodal search and recommendation.
    • Generating comprehensive reports for specific tasks using multimodal retrieval and generation.
    • Scalable system architectures and deployment frameworks for real-world generative multimodal search and recommendation.

Submission Guidelines: Authors are invited to submit original, full-length research papers that are not previously published, accepted for publication, or under consideration at any other forum. Manuscripts should be submitted to the CIKM 2025 EasyChair site in PDF format, using the 2-column sigconf format from the ACM proceedings template. Submissions may range from 4 to 9 pages, plus unlimited pages for references; authors may decide on the appropriate length, as no distinction is made between long and short papers. The review process will be double-blind, and submissions that are not properly anonymized will be desk-rejected without review. Each submission will be reviewed by at least three program committee members without conflicts of interest. After collecting the reviewers' comments, the organizers will hold a meeting to discuss the reviews and make the final decisions. At least one author of each accepted paper must register and present the work on-site in Seoul, Korea, as scheduled in the official CIKM 2025 conference program.

We also welcome submissions of outstanding papers that have recently been accepted or published in top-tier conferences or journals, to foster broader discussion and engagement on the latest research advances (please contact yi.bin@hotmail.com).

Submission site: https://easychair.org/conferences?conf=mmgensr2025

Important Dates

  • Paper Submission Deadline: September 15, 2025 (extended from August 31, 2025)
  • Paper Acceptance Notification: September 30, 2025
  • Workshop Date: November 14, 2025

Workshop Organizers

 

  • Yi Bin, Tongji University
  • Haoxuan Li, University of Electronic Science and Technology of China
  • Haokai Ma, National University of Singapore
  • Yang Zhang, National University of Singapore
  • Lin Wang, The Hong Kong Polytechnic University
  • Wenjie Wang, University of Science and Technology of China
  • Yunshan Ma, Singapore Management University
  • Yang Yang, University of Electronic Science and Technology of China
  • Tat-Seng Chua, National University of Singapore

Contact