The Rise and Potential of Large Language Model Based Agents: A Survey
TL;DR Highlight
An 86-page survey of AI agent architectures that use an LLM as the brain, covering applications and agent-society simulation.
Who Should Read
Backend/AI developers who are designing LLM-based agent systems or evaluating whether to adopt them. Also suitable for readers who want the big picture, from single agents to multi-agent collaborative architectures.
Core Mechanics
- Defines LLM Agent structure clearly as 3 modules: Brain (reasoning/memory/knowledge), Perception (multimodal input), Action (text output/tool use/physical action)
- Introduces 3 memory management strategies for long interaction histories: raising the Transformer length limit, summarizing memory, and compressing memories with vectors or data structures (directly applicable to long-running conversational agents)
- Groups reasoning and planning techniques such as Chain-of-Thought (CoT), ReAct, and Reflexion into two stages: Plan Formulation and Plan Reflection
- Divides multi-agent patterns into cooperative (ChatDev, MetaGPT) and competitive/debate (ChatEval, Du et al.) axes, explaining which problems each is effective for
- Classifies real deployments such as AutoGPT, Voyager (Minecraft survival agent), and ChemCrow (chemistry research agent) into 3 types: task-oriented, innovation-oriented, and lifecycle-oriented
- Reports that emergent social phenomena (cooperation, information diffusion, norm formation) appear in agent society simulations, and warns about misuse, job displacement, and safety risks
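The memory-summarization strategy in the list above can be sketched as a rolling buffer: recent turns are kept verbatim and older turns are folded into a running summary. This is only an illustrative sketch, not the survey's implementation. The class name `SummaryMemory`, the `summarize` callable (in practice an LLM call), and the `max_recent` parameter are all assumptions introduced here.

```python
from typing import Callable, List


class SummaryMemory:
    """Keeps the last `max_recent` turns verbatim and summarizes older ones."""

    def __init__(self, summarize: Callable[[str], str], max_recent: int = 4):
        self.summarize = summarize      # hypothetical: text -> shorter text
        self.max_recent = max_recent
        self.summary = ""               # running summary of evicted turns
        self.recent: List[str] = []     # most recent turns, oldest first

    def add(self, turn: str) -> None:
        self.recent.append(turn)
        cut = len(self.recent) - self.max_recent
        if cut > 0:
            # Evict the oldest turns and merge them into the running summary.
            overflow, self.recent = self.recent[:cut], self.recent[cut:]
            parts = [p for p in [self.summary, *overflow] if p]
            self.summary = self.summarize("\n".join(parts))

    def context(self) -> str:
        """Text to place in the prompt: summary first, then recent turns."""
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation:\n{self.summary}")
        parts.extend(self.recent)
        return "\n".join(parts)
```

The prompt size stays roughly bounded by `max_recent` turns plus one summary, at the cost of one extra summarization call each time a turn is evicted.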
Evidence
- Voyager (GPT-4 based) continuously explores and learns in Minecraft without human intervention, significantly ahead of RL-based agents in both items acquired and distance explored
- GPT-4 shows strong zero-shot performance across diverse domains including abstract reasoning, coding, math, medicine, and law, which researchers have described as 'sparks' of AGI
- PaLM-E co-trains robot data and general vision-language data, demonstrating zero-shot/one-shot generalization to novel object combinations
- Multi-agent software development systems like ChatDev, MetaGPT report higher code quality and completeness through role division (PM, developer, tester) vs single agents
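The role division mentioned in the last item (PM, developer, tester) can be illustrated with a small pipeline in which one LLM callable is given a different system prompt per role. This is a rough sketch under stated assumptions, not the ChatDev or MetaGPT implementation: the `ROLES` prompts, the `run_team` function, the `(system_prompt, user_message) -> reply` signature, and the `APPROVED` convention are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical LLM interface: (system_prompt, user_message) -> reply text.
LLM = Callable[[str, str], str]

ROLES: Dict[str, str] = {
    "pm": "You are a product manager. Turn the request into a short spec.",
    "developer": "You are a developer. Write code that satisfies the spec.",
    "tester": "You are a tester. Reply APPROVED if the code meets the spec, "
              "otherwise list what is wrong.",
}


def run_team(request: str, llm: LLM, max_rounds: int = 3) -> dict:
    """PM writes a spec, developer implements it, tester reviews in a loop."""
    spec = llm(ROLES["pm"], request)
    code = llm(ROLES["developer"], f"Spec:\n{spec}")
    for round_no in range(1, max_rounds + 1):
        review = llm(ROLES["tester"], f"Spec:\n{spec}\n\nCode:\n{code}")
        if review.strip().startswith("APPROVED"):
            return {"spec": spec, "code": code, "approved": True, "rounds": round_no}
        # Feed the tester's findings back to the developer for a revision.
        code = llm(
            ROLES["developer"],
            f"Spec:\n{spec}\n\nPrevious code:\n{code}\n\nDefects:\n{review}",
        )
    return {"spec": spec, "code": code, "approved": False, "rounds": max_rounds}
```

Capping the loop with `max_rounds` keeps a disagreeing developer/tester pair from running indefinitely; the caller can inspect `approved` to decide whether a human should step in.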
How to Apply
- When agents need long-term memory: directly apply Generative Agents' approach of prioritizing memories by weighted sum of Recency, Relevance, and Importance scores
- Instead of relying on a single LLM for complex tasks, split into role-based agents (planner, executor, reviewer) using ChatDev/MetaGPT patterns to improve quality and error correction efficiency
- When building tool-using agents, structuring the 'Thought → Action → Observation' loop as a prompt template (as in ReAct) makes debugging much easier, because every step leaves a readable trace
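The memory prioritization in the first item above (a weighted sum of Recency, Relevance, and Importance) can be sketched as follows. This is a minimal illustration, not the Generative Agents code: the `Memory` dataclass, the exponential decay factor of 0.995 per hour, the equal default weights, and the assumption that importance is already scaled to [0, 1] are choices made here for the example. Embeddings are plain lists of floats so the block has no dependencies.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Memory:
    text: str
    importance: float            # assumed pre-scored and scaled to [0, 1]
    hours_since_access: float    # time since the memory was last retrieved
    embedding: List[float]       # assumed to come from any embedding model


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(memory: Memory, query_embedding: Sequence[float],
          weights=(1.0, 1.0, 1.0), decay: float = 0.995) -> float:
    """Weighted sum of recency, relevance, and importance."""
    w_recency, w_relevance, w_importance = weights
    recency = decay ** memory.hours_since_access       # decays toward 0 over time
    relevance = cosine(memory.embedding, query_embedding)
    return (w_recency * recency
            + w_relevance * relevance
            + w_importance * memory.importance)


def retrieve(memories: List[Memory], query_embedding: Sequence[float],
             k: int = 3) -> List[Memory]:
    """Return the k highest-scoring memories for the current query."""
    ranked = sorted(memories, key=lambda m: score(m, query_embedding), reverse=True)
    return ranked[:k]
```

Only the top-`k` memories are placed in the prompt, so the weights act as a tuning knob: raising the recency weight favors the current conversation, while raising the importance weight keeps rare but significant events retrievable.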
Code Example
# ReAct-style agent prompt template example
SYSTEM_PROMPT = """
You are an agent that solves tasks step by step.
For each step, output in this format:
Thought: [reasoning about what to do next]
Action: [tool_name(args)] or [Final Answer: answer]
After each Action, stop and wait. The result is sent back to you as:
Observation: [result of the action]
Available tools:
- search(query): Search the web
- calculator(expr): Evaluate math expression
- read_file(path): Read a file
"""


def execute_tool(action_line: str) -> str:
    # Placeholder (not part of the original example): parse `tool_name(args)`
    # from action_line and dispatch to a real tool implementation here.
    raise NotImplementedError("connect search / calculator / read_file here")


def run_agent(task: str, llm, max_steps: int = 10):
    # `llm` is assumed to be a callable: list of messages -> response string.
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Task: {task}"},
    ]
    for step in range(max_steps):
        response = llm(messages)
        messages.append({"role": "assistant", "content": response})
        # Detect Final Answer
        if "Final Answer:" in response:
            return response.split("Final Answer:")[-1].strip()
        # Parse the first Action line, tolerating leading whitespace
        action_line = next(
            (line.strip() for line in response.split("\n")
             if line.strip().startswith("Action:")),
            None,
        )
        if action_line is not None:
            observation = execute_tool(action_line)  # Execute actual tool
        else:
            observation = "No Action found. Follow the required format."
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Max steps reached"
Original Abstract
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action, and the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository for the related papers at https://github.com/WooooDyy/LLM-Agent-Paper-List.