Large Language Model based Multi-Agents: A Survey of Progress and Challenges
TL;DR Highlight
A survey covering Multi-Agent system structure, communication methods, and application cases — LLM collaboration at a glance.
Who Should Read
Backend/AI developers adopting or designing Multi-Agent frameworks like AutoGen, MetaGPT, or CrewAI. Especially useful for teams expanding from single-agent to multi-agent LLM architectures.
Core Mechanics
- Multi-Agent systems can be classified along four axes: agent-environment interface (Sandbox / Physical / None), agent profiling, communication structure, and capability-acquisition method
- Four communication structures: Layered, Decentralized, Centralized, and Shared Message Pool; MetaGPT uses a Shared Message Pool
- Three ways to assign agent roles (Profiles): Pre-defined (specified directly by the designer), Model-Generated (auto-generated by an LLM), and Data-Derived (built from a dataset)
- Three strategies for improving agent capability: Memory (store and retrieve past records), Self-Evolution (agents modify their own goals and strategies), and Dynamic Generation (create new agents at runtime)
- Applications divide into Problem Solving (software development, robotics, science, debate) and World Simulation (society, economics, gaming, policy, disease spread)
- Key challenges: hallucinations propagating between agents, optimizing collective intelligence, and the compute cost and orchestration complexity of running many GPT-4-class LLMs
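The Shared Message Pool structure used by MetaGPT can be sketched framework-agnostically: agents publish messages once to a shared pool and pull only the topics relevant to their role, instead of broadcasting to every agent. This is a minimal illustration; `MessagePool`, `publish`, and `pull` are hypothetical names, not MetaGPT's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    topic: str
    content: str

@dataclass
class MessagePool:
    """Shared message pool: agents publish once, others pull by topic."""
    messages: list = field(default_factory=list)

    def publish(self, msg: Message) -> None:
        self.messages.append(msg)

    def pull(self, topic: str) -> list:
        # Each agent subscribes only to topics relevant to its role,
        # rather than receiving every message in the system.
        return [m for m in self.messages if m.topic == topic]

pool = MessagePool()
pool.publish(Message("PM", "spec", "Implement a login API"))
pool.publish(Message("Developer", "code", "def login(): ..."))
print([m.content for m in pool.pull("spec")])  # ['Implement a login API']
```

The key design point is decoupling: the PM agent never needs a direct reference to the Developer agent, which keeps the communication graph from growing quadratically with the number of agents.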
Evidence
- The S3 system simulates social networks of 8,563 and 17,945 agents, reproducing the propagation of opinions on gender discrimination and nuclear energy
- Agent4Rec uses 1,000 generated agents based on MovieLens-1M data to reproduce real user recommendation behavior and filter bubble effects
- MetaGPT evaluates multi-agent collaboration-based code generation on HumanEval and MBPP benchmarks, structurally reducing hallucination with SOP encoding
- Multi-agent Debate (Du et al., 2023) experimentally demonstrates improved factuality vs single-agent on 6 reasoning/factual tasks including GSM8K and StrategyQA
How to Apply
- For software development automation, use MetaGPT or AutoGen to arrange PM → Developer → Tester roles in a Layered communication structure; this decomposes and verifies complex tasks better than a single agent
- To improve answer accuracy, apply the Debate paradigm: multiple agents challenge each other and converge on a consensus. This is effective for precision-critical tasks such as medical diagnosis, code review, and math problem solving
- To simulate user behavior or replace an A/B test, generate agent profiles from real data (Data-Derived profiling) and simulate their interactions in a Sandbox environment to pre-validate changes before deployment
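The Data-Derived profiling step above can be sketched as a function that turns one row of real interaction data into an agent system prompt. The records here are hypothetical stand-ins for rows of a real dataset such as MovieLens-1M; the field names and `derive_profile` helper are illustrative, not from any framework.

```python
def derive_profile(user_record: dict) -> str:
    """Build an agent system prompt from one row of real user data
    (Data-Derived profiling)."""
    genres = ", ".join(user_record["top_genres"])
    return (
        f"You are simulated user {user_record['user_id']}. "
        f"You mostly watch {genres} and rate on average "
        f"{user_record['avg_rating']:.1f}/5. "
        "React to recommendations the way this user would."
    )

# Hypothetical records standing in for rows of a real interaction dataset
records = [
    {"user_id": 1, "top_genres": ["Sci-Fi", "Thriller"], "avg_rating": 3.8},
    {"user_id": 2, "top_genres": ["Romance"], "avg_rating": 4.2},
]
profiles = [derive_profile(r) for r in records]
print(profiles[0])
```

Each profile string would then be passed as the system message of a simulated-user agent in the Sandbox, so the population's behavior statistically mirrors the source dataset rather than the LLM's defaults.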
Code Example
# Simple Multi-Agent Debate setup using AutoGen (pyautogen)
import autogen

config_list = [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]
llm_config = {"config_list": config_list}

# Define agent roles (Pre-defined Profiling)
proponent = autogen.AssistantAgent(
    name="Proponent",
    system_message="You are a debater who supports the given argument. Defend your position with evidence.",
    llm_config=llm_config,
)
opponent = autogen.AssistantAgent(
    name="Opponent",
    system_message="You are a debater who opposes the given argument. Refute it logically.",
    llm_config=llm_config,
)
judge = autogen.AssistantAgent(
    name="Judge",
    system_message="You are an impartial judge. Listen to both sides and derive a final consensus.",
    llm_config=llm_config,
)
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=6,
    code_execution_config=False,  # debate only; no code execution needed
)

# Run the debate in a group chat: agents speak in round-robin order,
# and the Judge (last in the rotation) synthesizes the consensus
groupchat = autogen.GroupChat(
    agents=[user_proxy, proponent, opponent, judge],
    messages=[],
    max_round=6,
    speaker_selection_method="round_robin",
)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
user_proxy.initiate_chat(
    manager,
    message="Debate topic: Can LLMs replace human doctors?",
)
Original Abstract
Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to their notable capabilities in planning and reasoning, LLMs have been utilized as autonomous agents for the automatic execution of various tasks. Recently, LLM-based agent systems have rapidly evolved from single-agent planning or decision-making to operating as multi-agent systems, enhancing their ability in complex problem-solving and world simulation. To offer an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects and challenges of LLM-based multi-agent (LLM-MA) systems. Our objective is to provide readers with an in-depth understanding of these key points: the domains and settings where LLM-MA systems operate or simulate; the profiling and communication methods of these agents; and the means by which these agents develop their skills. For those interested in delving into this field, we also summarize the commonly used datasets or benchmarks. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository (github.com/taichengguo/LLM_MultiAgents_Survey_Papers), dedicated to outlining the research of LLM-MA research.