A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval

Mar 7, 2025 • Yu Zhang, Shutong Qiao, Jiaqi Zhang +3

TL;DR Highlight

The first comprehensive survey systematically covering how LLM Agents are transforming recommendation systems and search engines

Who Should Read

ML engineers and backend developers looking to integrate LLMs into recommendation systems or search engines. Anyone who wants to design or study AI agent-based IR system architectures.

Core Mechanics

LLM Agent roles in recommendation/search are broadly categorized into 4 types: User Interaction (conversational interface), Representation Optimization (improving user/item representations), System Integration (acting as the brain of the recommendation system), and Environment Simulation (user simulator)
In search systems, 5 roles are identified: Task Decomposer (breaking down complex search tasks), Query Rewriter (improving queries), Action Executor (tool calling), Results Synthesizer (summarizing results), and User Simulator (simulating user behavior)
LLM Agents can function as user simulators to test recommendation/search algorithms without real users — enabling evaluation without A/B testing costs or actual UX degradation
Chain-of-Thought (CoT) reasoning and long context windows are leveraged to decompose complex multi-step search tasks like 'travel planning' into subtasks for processing
Embodied Agents (agents that interact in real time with physical or cyber environments) are emerging as the next-generation paradigm for recommendation/search — enabling GUI manipulation, app navigation, and on-device personalization
The survey also outlines key unresolved challenges (hallucination, bias, deployment cost, personalization, and multimodal processing) along with future research directions
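The search roles listed above compose naturally into a pipeline: decompose the task, rewrite each subtask into a query, execute it, then synthesize the results. The following is a minimal sketch of that wiring, not code from the survey. `SearchAgent` and all four `stub_*` functions are hypothetical; in a real system the decomposer, rewriter, and synthesizer would be LLM calls and the executor would call a search API or other tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SearchAgent:
    """Toy pipeline wiring four of the search-agent roles together."""
    decompose: Callable[[str], List[str]]                # Task Decomposer
    rewrite: Callable[[str], str]                        # Query Rewriter
    execute: Callable[[str], List[str]]                  # Action Executor
    synthesize: Callable[[Dict[str, List[str]]], str]    # Results Synthesizer
    trace: List[str] = field(default_factory=list)       # queries actually issued

    def run(self, task: str) -> str:
        results: Dict[str, List[str]] = {}
        for subtask in self.decompose(task):
            query = self.rewrite(subtask)
            self.trace.append(query)
            results[subtask] = self.execute(query)
        return self.synthesize(results)


# Rule-based stand-ins so the sketch runs without a model or network access.
def stub_decompose(task: str) -> List[str]:
    return [f"{task}: {step}" for step in ("destination", "itinerary", "budget")]


def stub_rewrite(subtask: str) -> str:
    return subtask.lower().replace(":", "")


def stub_execute(query: str) -> List[str]:
    return [f"result for '{query}'"]


def stub_synthesize(results: Dict[str, List[str]]) -> str:
    return "\n".join(f"{subtask} -> {hits[0]}" for subtask, hits in results.items())


agent = SearchAgent(stub_decompose, stub_rewrite, stub_execute, stub_synthesize)
answer = agent.run("Europe trip")
print(answer)
```

Keeping each role behind its own callable means any single stage can be swapped for an LLM call or evaluated in isolation without touching the rest of the pipeline.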

Evidence

In CompWoB benchmark experiments, GPT-3.5-turbo and GPT-4 achieved an average success rate of 94.0% on simple web tasks, but dropped sharply to 24.9% on composite tasks
Agent4Rec generates 1,000 LLM agents initialized with the MovieLens-1M dataset to evaluate recommendation algorithms, using agent feedback as iterative training data
USimAgent (a search user behavior simulator) outperforms existing methods on query generation and achieves comparable performance on click/stop behavior prediction
The iEvaLM framework achieves performance improvements over existing evaluation methodologies on 2 public CRS (Conversational Recommendation System) datasets and adds an interpretability evaluation dimension for recommendations
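The Agent4Rec-style evaluation described above replaces live users with simulated ones. The following is a minimal sketch of that loop, assuming a rule-based scorer in place of the LLM and a tiny hand-written catalog. `SimulatedUser`, `genre_rule`, and `evaluate` are hypothetical names, and nothing here reproduces Agent4Rec's actual prompts or modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Item = Dict[str, str]
Profile = Dict[str, List[str]]


@dataclass
class SimulatedUser:
    profile: Profile                                     # static tastes
    rate: Callable[[Profile, List[str], Item], int]      # behavior: returns 1..5
    memory: List[str] = field(default_factory=list)      # interaction log

    def react(self, items: List[Item]) -> Dict[str, int]:
        """Rate every shown item and record the interaction in memory."""
        ratings: Dict[str, int] = {}
        for item in items:
            score = self.rate(self.profile, self.memory, item)
            ratings[item["title"]] = score
            self.memory.append(f"rated {item['title']} as {score}")
        return ratings


def genre_rule(profile: Profile, memory: List[str], item: Item) -> int:
    """Stand-in for an LLM call: 5 for liked genres, 1 for disliked, else 3."""
    if item["genre"] in profile["likes"]:
        return 5
    if item["genre"] in profile["dislikes"]:
        return 1
    return 3


def evaluate(recommender, users: List[SimulatedUser], catalog: List[Item], k: int = 2) -> float:
    """Mean simulated rating over each user's top-k recommendations."""
    scores: List[int] = []
    for user in users:
        shown = recommender(user.profile, catalog)[:k]
        scores.extend(user.react(shown).values())
    return sum(scores) / len(scores)


CATALOG = [
    {"title": "Inception", "genre": "Sci-Fi"},
    {"title": "The Notebook", "genre": "Romance"},
    {"title": "Tenet", "genre": "Sci-Fi"},
]


def new_user() -> SimulatedUser:
    return SimulatedUser({"likes": ["Sci-Fi", "Thriller"], "dislikes": ["Romance"]}, genre_rule)


def catalog_order(profile: Profile, catalog: List[Item]) -> List[Item]:
    return list(catalog)


def genre_match(profile: Profile, catalog: List[Item]) -> List[Item]:
    return [item for item in catalog if item["genre"] in profile["likes"]]


baseline_score = evaluate(catalog_order, [new_user()], CATALOG)  # 3.0
matched_score = evaluate(genre_match, [new_user()], CATALOG)     # 5.0
```

Simulated scores like these are only as trustworthy as the behavior function behind them, so they are better suited to comparing candidate algorithms than to estimating absolute user satisfaction.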

How to Apply

When adding an LLM Agent as a 'System Integration' layer to an existing recommendation system, follow patterns like RecMind or InteRecAgent — keep the existing ID-based recommendation model as a tool and let the LLM handle only natural language understanding and planning, enabling adoption without a full system redesign
To quickly evaluate recommendation/search algorithms without A/B testing, build an LLM-based user simulator with user profile, memory, and behavior modules (like Agent4Rec) and plug it into your algorithm validation pipeline
When handling complex search queries (e.g., 'plan a 4-day, 3-night trip to Europe'), apply the Task Decomposer pattern: design an agent that automatically decomposes the task into subtasks (destination selection → itinerary → flights/hotels → budget calculation) and calls the appropriate external APIs for each
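The first pattern above (existing recommender kept as a tool, LLM limited to language understanding and planning) needs a thin dispatch layer that maps a tool name and arguments chosen by the model onto local functions. The following is a minimal sketch, assuming the model emits a tool name plus a JSON-encoded argument string, as tool-calling chat APIs typically do. The toy catalog, both tool bodies, and `dispatch_tool_call` are illustrative stand-ins, not RecMind or InteRecAgent code.

```python
import json
from typing import Any, Callable, Dict, List


def get_recommendations(user_id: str, top_k: int = 10) -> List[str]:
    """Placeholder for the existing ID-based model; returns a fixed toy list."""
    catalog = ["Arrival", "Blade Runner 2049", "Dune", "Ex Machina"]
    return catalog[:top_k]


def rerank_by_context(items: List[str], user_context: str) -> List[str]:
    """Toy rule: titles mentioned in the conversation context move to the front."""
    context = user_context.lower()
    # sorted() is stable, so unmentioned titles keep their original order.
    return sorted(items, key=lambda title: title.lower() not in context)


TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {
    "get_recommendations": get_recommendations,
    "rerank_by_context": rerank_by_context,
}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run one model-requested tool call and return a JSON string result.

    Errors are returned as JSON rather than raised so they can be fed back
    to the model as the tool's output.
    """
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        return json.dumps(tool(**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        return json.dumps({"error": str(exc)})


candidates = dispatch_tool_call("get_recommendations", '{"user_id": "u1", "top_k": 2}')
print(candidates)
```

Because the registry is the only place where model output touches application code, it is also a natural point to validate arguments and restrict which tools the model may call.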

Code Example

# InteRecAgent style: example pattern using LLM as the interface for a recommendation system

from openai import OpenAI

client = OpenAI()

# Register the traditional recommendation model as a tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_recommendations",
            "description": "Extract candidate items using an ID-based collaborative filtering model",
            "parameters": {
                "type": "object",
                "properties": {
                    "user_id": {"type": "string"},
                    "top_k": {"type": "integer", "default": 10}
                },
                "required": ["user_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "rerank_by_context",
            "description": "Re-rank candidate items reflecting the current conversation context",
            "parameters": {
                "type": "object",
                "properties": {
                    # Array schemas need an "items" entry; string item IDs are assumed here
                    "items": {"type": "array", "items": {"type": "string"}},
                    "user_context": {"type": "string"}
                },
                "required": ["items", "user_context"]
            }
        }
    }
]

def recommendation_agent(user_id: str, user_message: str, history: list):
    """LLM Agent calls recommendation tools to generate personalized recommendations"""
    system_prompt = """You are a personalized recommendation assistant.
    Analyze the user's request and call the appropriate recommendation tools to provide optimal recommendations.
    If needed, call multiple tools sequentially to refine the results."""
    
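    # Include user_id so the model can fill the tools' required user_id argument
    messages = [{"role": "system", "content": f"{system_prompt}\nCurrent user_id: {user_id}"}]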
    messages.extend(history)
    messages.append({"role": "user", "content": user_message})
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    
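    # The response may contain tool calls (response.choices[0].message.tool_calls);
    # the caller must execute them and send the results back as "tool" messages.
    return response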

# User simulator pattern (Agent4Rec style)
user_simulator_prompt = """
You are simulating a user with the following profile:
- Age: 28
- Preferred genres: Sci-Fi, Thriller
- Recently watched: [Inception, Interstellar, Tenet]
- Tendencies: Prefers complex storylines, dislikes Romance

The recommendation system has recommended the following movies: {recommended_items}

Respond as a real user would:
1. Which items you would click and why
2. Star rating (1-5)
3. Additional feedback
"""

Terminology

LLM Agent: An autonomous AI system that uses an LLM (Large Language Model) as its brain, augmented with memory, tool calling, and environment awareness. Unlike a simple chatbot, it can plan and act on its own.

RAG: Short for Retrieval-Augmented Generation. A technique in which relevant documents are first retrieved from an external database and supplied to the LLM before it generates an answer. This lets the model respond with up-to-date information it wasn't trained on.

Chain-of-Thought: A prompting technique that has the LLM lay out its reasoning step by step ('step 1 → step 2 → step 3') rather than jumping straight to an answer. Significantly improves performance on complex problem solving.

Embodied Agent: An agent that directly interacts with its environment through GUI clicks, app manipulation, web browsing, and more, rather than only processing text. Think of it as an AI that physically controls a robotic arm or a smartphone.

RLHF: Short for Reinforcement Learning from Human Feedback. A technique that uses human feedback (good/bad) as a reward signal to further train an LLM to align with human preferences. ChatGPT was trained using this approach.

Cold Start: The problem of being unable to make good recommendations for a new user or a new item because there is no historical data. For example, the system does not know what to show a newly registered user or to whom it should show a newly launched product.

Multi-Agent System: A structure where multiple LLM agents each take on a role and collaborate. Analogous to a manager agent, a search agent, and a summarization agent working together as a team.

Hallucination: The phenomenon where an LLM confidently generates content that is not factually accurate. In recommendation/search contexts, this can surface incorrect information to users.
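The RAG entry above can be made concrete with a toy retrieve-then-prompt sketch. It assumes word-overlap scoring in place of a real vector index and stops at building the prompt rather than calling a model. `tokenize`, `retrieve`, `build_prompt`, and the three sample documents are illustrative.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Place the retrieved documents ahead of the question for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


DOCS = [
    "Agent4Rec simulates users with LLM agents.",
    "RecMind treats recommendation models as tools.",
    "USimAgent simulates search user behavior.",
]

QUESTION = "Which system simulates search user behavior?"
prompt = build_prompt(QUESTION, DOCS, k=1)
print(prompt)
```

Grounding the answer in retrieved text is also one common way to reduce the hallucination risk described in the entry above, since the model is asked to rely on supplied context rather than recall.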

Related Resources

LLM-Agent-for-Recommendation-and-Search (GitHub): https://github.com/tsinghua-fib-lab/LLM-Agent-for-Recommendation-and-Search

Original Abstract

Information technology has profoundly altered the way humans interact with information. The vast amount of content created, shared, and disseminated online has made it increasingly difficult to access relevant information. Over the past two decades, recommender systems and search (collectively referred to as information retrieval systems) have evolved significantly to address these challenges. Recent advances in large language models (LLMs) have demonstrated capabilities that surpass human performance in various language-related tasks and exhibit general understanding, reasoning, and decision-making abilities. This paper explores the transformative potential of LLM agents in enhancing recommender and search systems. We discuss the motivations and roles of LLM agents, and establish a classification framework to elaborate on the existing research. We highlight the immense potential of LLM agents in addressing current challenges in recommendation and search, providing insights into future research directions. This paper is the first to systematically review and classify the research on LLM agents in these domains, offering a novel perspective on leveraging this advanced AI technology for information retrieval. To help understand the existing works, we list the existing papers on LLM agent based recommendation and search at this link: https://github.com/tsinghua-fib-lab/LLM-Agent-for-Recommendation-and-Search.