The Rise and Potential of Large Language Model Based Agents: A Survey

Sep 14, 2023 • Zhiheng Xi, Wenxiang Chen, Xin Guo +27

TL;DR Highlight

An 86-page survey comprehensively covering AI Agent architecture using LLMs as the Brain, including applications and social simulation.

Who Should Read

Backend/AI developers designing or evaluating adoption of LLM-based agent systems. Suitable for those wanting the big picture from single agents to multi-agent collaborative architectures.

Core Mechanics

Defines LLM Agent structure clearly as 3 modules: Brain (reasoning/memory/knowledge), Perception (multimodal input), Action (text output/tool use/physical action)
Introduces 3 memory management strategies for interaction histories that exceed the Transformer context length: raising the length limit itself, summarizing memory, and compressing memories into vectors or data structures. All three apply directly to long-running conversational agents
Groups Chain-of-Thought (CoT), ReAct, Reflexion, and other reasoning/planning techniques into two stages: Plan Formulation and Plan Reflection
Divides multi-agent patterns into cooperative (ChatDev, MetaGPT) and competitive/debate (ChatEval, Du et al.) axes, explaining which problems each is effective for
Classifies real deployments such as AutoGPT, Voyager (Minecraft survival agent), and ChemCrow (chemistry research agent) into 3 types: task-oriented, innovation-oriented, and lifecycle-oriented
Reports that emergent social phenomena (cooperation, information diffusion, norm formation) appear in agent social simulations, and warns about misuse, job displacement, and safety risks
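
The memory summarization strategy listed above can be sketched as a rolling buffer that keeps the latest turns verbatim and folds older turns into a running summary. This is a minimal sketch, not the survey's implementation: the class name `SummaryMemory`, the `summarize` callable (in practice an LLM call), and the character budget standing in for a token limit are all assumptions made for illustration.

```python
from typing import Callable, List


class SummaryMemory:
    """Keep recent turns verbatim and fold older turns into a running summary."""

    def __init__(self, summarize: Callable[[str], str],
                 max_chars: int = 2000, keep_recent: int = 4):
        self.summarize = summarize      # hypothetical: wraps an LLM summarization call
        self.max_chars = max_chars      # illustrative budget standing in for a token limit
        self.keep_recent = keep_recent  # number of latest turns that are never summarized
        self.summary = ""
        self.turns: List[str] = []

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        size = sum(len(t) for t in self.turns)
        cut = len(self.turns) - self.keep_recent
        # Compress only when the verbatim buffer exceeds the budget
        # and there are turns older than the protected recent window.
        if size > self.max_chars and cut > 0:
            old, self.turns = self.turns[:cut], self.turns[cut:]
            merged = (self.summary + "\n" + "\n".join(old)).strip()
            self.summary = self.summarize(merged)

    def context(self) -> str:
        """Text to prepend to the next prompt."""
        parts = []
        if self.summary:
            parts.append("Summary of earlier conversation: " + self.summary)
        parts.extend(self.turns)
        return "\n".join(parts)
```

The agent would call `add()` after every turn and place `context()` at the top of the next prompt. The vector/DB strategy differs in that old turns are stored and retrieved on demand rather than merged into one summary.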

Evidence

Voyager (GPT-4 based) continuously explores and learns in Minecraft without human intervention, and is reported to outperform prior baseline agents in both unique items acquired and distance traveled
GPT-4 demonstrates strong zero-shot performance across diverse domains including abstract reasoning, coding, math, medicine, and law, which some researchers describe as 'sparks' of AGI
PaLM-E co-trains robot data and general vision-language data, demonstrating zero-shot/one-shot generalization to novel object combinations
Multi-agent software development systems such as ChatDev and MetaGPT report higher code quality and completeness than single agents, attributed to role division (PM, developer, tester)

How to Apply

When agents need long-term memory: directly apply Generative Agents' approach of prioritizing memories by weighted sum of Recency, Relevance, and Importance scores
Instead of relying on a single LLM for complex tasks, split into role-based agents (planner, executor, reviewer) using ChatDev/MetaGPT patterns to improve quality and error correction efficiency
When building tool-using agents, structuring the 'Thought → Action → Observation' loop as a prompt template (as in ReAct) makes debugging much easier, because each step's reasoning and tool result appear in the transcript
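
The weighted-sum memory retrieval mentioned in the first item can be sketched as follows. This is a minimal sketch under stated assumptions, not the Generative Agents code: the `Memory` fields, the equal default weights, the decay factor, and the assumption that importance is already scored in [0, 1] and that embeddings come from some external model are all illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Memory:
    text: str
    embedding: Sequence[float]  # assumed to come from some embedding model
    importance: float           # assumed pre-scored in [0, 1], e.g. by an LLM
    last_access_hours: float    # hours since the memory was last accessed


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(memories: List[Memory], query_embedding: Sequence[float], k: int = 3,
             w_recency: float = 1.0, w_relevance: float = 1.0,
             w_importance: float = 1.0, decay: float = 0.995) -> List[Memory]:
    """Return the k memories with the highest weighted score."""

    def score(m: Memory) -> float:
        recency = decay ** m.last_access_hours            # exponential decay over time
        relevance = cosine(m.embedding, query_embedding)  # similarity to the current query
        return (w_recency * recency
                + w_relevance * relevance
                + w_importance * m.importance)

    return sorted(memories, key=score, reverse=True)[:k]
```

The three weights are the main tuning knobs: raising `w_recency` favors what just happened, while raising `w_importance` keeps rare but significant events retrievable long after they occurred.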

Code Example

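# ReAct-style agent prompt template example
import re

SYSTEM_PROMPT = """
You are an agent that solves tasks step by step.
For each step, output in this format:
Thought: [reasoning about what to do next]
Action: tool_name(args)
Stop after the Action line. The result is sent back to you as:
Observation: [result of the action]
When the task is solved, output instead:
Final Answer: [answer]

Available tools:
- search(query): Search the web
- calculator(expr): Evaluate math expression
- read_file(path): Read a file
"""

def _not_configured(name):
    # Placeholder so the loop runs end to end; swap in a real implementation.
    return lambda arg: f"Error: tool '{name}' is not configured"

# Tool registry: every entry here is a placeholder.
TOOLS = {name: _not_configured(name) for name in ("search", "calculator", "read_file")}

def execute_tool(action_line: str) -> str:
    """Parse 'Action: tool_name(args)' and call the matching entry in TOOLS."""
    match = re.match(r"Action:\s*(\w+)\((.*)\)\s*$", action_line)
    if match is None or match.group(1) not in TOOLS:
        return f"Error: unknown or malformed action: {action_line!r}"
    try:
        return str(TOOLS[match.group(1)](match.group(2).strip().strip("\"'")))
    except Exception as exc:  # return tool failures to the model instead of crashing
        return f"Error: {exc}"

def run_agent(task: str, llm, max_steps: int = 10):
    # llm: any callable that takes the message list and returns the reply text
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Task: {task}"}
    ]

    for step in range(max_steps):
        response = llm(messages)
        messages.append({"role": "assistant", "content": response})

        # Detect Final Answer
        if "Final Answer:" in response:
            return response.split("Final Answer:")[-1].strip()

        # Parse the first Action line and execute the tool
        action_lines = [line.strip() for line in response.split('\n')
                        if line.strip().startswith('Action:')]
        if action_lines:
            observation = execute_tool(action_lines[0])
        else:
            observation = "Error: no Action or Final Answer found; follow the format."
        messages.append({"role": "user", "content": f"Observation: {observation}"})

    return "Max steps reached"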
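
The role-based split suggested under How to Apply (planner, executor, reviewer) can be sketched in the same style. This is a minimal sketch and not how ChatDev or MetaGPT are implemented: the `LLM` callable interface, the prompt wording, the `APPROVED` convention, and the `max_rounds` cap are assumptions made for illustration.

```python
from typing import Callable

# Hypothetical interface: each role is a callable taking a prompt and returning text.
LLM = Callable[[str], str]


def run_pipeline(task: str, planner: LLM, executor: LLM, reviewer: LLM,
                 max_rounds: int = 3) -> str:
    """Plan once, then alternate execute/review until approved or out of rounds."""
    plan = planner(f"Break this task into numbered steps:\n{task}")
    draft = executor(f"Task: {task}\nPlan:\n{plan}\nProduce the result.")

    for _ in range(max_rounds):
        review = reviewer(
            f"Task: {task}\nResult:\n{draft}\n"
            "Reply APPROVED if it is correct, otherwise list the problems."
        )
        if review.strip().upper().startswith("APPROVED"):
            break
        # Feed the reviewer's feedback back to the executor for a revision.
        draft = executor(
            f"Task: {task}\nPrevious result:\n{draft}\n"
            f"Reviewer feedback:\n{review}\nRevise the result."
        )
    return draft
```

Each role can be the same underlying model with a different system prompt. Keeping the reviewer separate from the executor is what gives the error-correction loop, and the round cap prevents the two from cycling indefinitely.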

Terminology

LLM-based Agent: An AI system that uses an LLM as its brain to plan and act autonomously. Given only a goal, it breaks the goal into steps and executes them.

Chain-of-Thought (CoT): A prompting technique in which the LLM outputs intermediate reasoning steps before the answer, similar to writing out the working of a math problem.

ReAct: An agent pattern that alternates between Reasoning and Acting in a loop: think, use a tool, observe the result, think again.

In-context Learning (ICL): Getting a model to perform a new task by placing just a few examples in the prompt, without changing model parameters.

Emergent Behavior: Complex behaviors that arise from interactions between multiple simple agents without being explicitly programmed, like flocking behavior in birds.
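
As an illustration of the In-context Learning entry, a few-shot prompt can be assembled from an instruction, a handful of worked examples, and the new input. This is a minimal sketch: the function name `build_few_shot_prompt` and the `Input:`/`Output:` labels are arbitrary choices, not a format prescribed by the survey.

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str, examples: List[Tuple[str, str]],
                          query: str) -> str:
    """Assemble an ICL prompt: instruction, worked examples, then the new input."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    # Leave the final Output empty so the model completes it.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("great movie", "positive"), ("terrible food", "negative")],
    "loved it",
)
```

No parameters are updated here; the examples in the prompt are the only task-specific signal the model receives.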

Related Resources

LLM-Agent-Paper-List (GitHub): https://github.com/WooooDyy/LLM-Agent-Paper-List

Original Abstract

For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action, and the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository for the related papers at https://github.com/WooooDyy/LLM-Agent-Paper-List.