Scaling Large-Language-Model-based Multi-Agent Collaboration
TL;DR Highlight
Experiments scaling agents up to thousands revealed a 'collaborative scaling law' that governs multi-agent performance.
Who Should Read
AI engineers designing multi-agent systems or weighing agent count and topology choices. Developers who want to improve LLM task performance at inference time, without retraining.
Core Mechanics
- Introduced MACNET, a framework connecting agents in a DAG structure — actors on nodes, critics on edges for role separation
- Discovered a 'collaborative scaling law': performance follows a logistic (S-shaped) growth curve as agent count increases, with around 16 nodes (2^4) as the sweet spot
- Irregular random topologies outperform structured ones like full mesh — thanks to 'small-world' properties found in social networks
- Memory control solves context explosion: pass only the final artifact (not full history) between agents, reducing context length from O(n^2) to O(n)
- Successfully ran collaboration experiments with 1,000+ agents using GPT-3.5; outperformed CoT, AutoGPT, and AgentVerse on MMLU, HumanEval, software development, and creative writing benchmarks
- Emergence occurs at much smaller scale (hundreds of agents) than neural scaling law predicts — because agents already carry pretrained knowledge
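The memory-control bullet above can be made concrete with a quick back-of-the-envelope sketch. The function names and the 500-token turn size are illustrative assumptions, not values from the paper:

```python
# Illustrative sketch: why passing only the final artifact between agents
# keeps total context linear in the number of agents, while forwarding the
# full conversation history grows quadratically.

def total_context_full_history(n_agents: int, tokens_per_turn: int = 500) -> int:
    """Each agent re-reads the entire accumulated history: 1 + 2 + ... + n turns."""
    return sum(i * tokens_per_turn for i in range(1, n_agents + 1))  # O(n^2)

def total_context_artifact_only(n_agents: int, tokens_per_turn: int = 500) -> int:
    """Each agent reads only the previous agent's artifact: n fixed-size reads."""
    return n_agents * tokens_per_turn  # O(n)

# With 64 agents, full history costs 64*65/2 = 2080 turn-reads versus just 64
```

At 64 agents the gap is already a factor of ~32, which is why the artifact-only pattern is what makes thousand-agent runs feasible at all.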
Evidence
- MACNET-RANDOM achieved the top average quality of 0.6522; MACNET-CHAIN hit MMLU 0.6632 vs CoT 0.3544, AutoGPT 0.4485, AgentVerse 0.2977
- Memory control reduced token complexity from O(n^2) to O(n); random topology cut time by 51.92% vs mesh
- Experiments scaled node counts from 2^0=1 up to 2^6=64; artifact token length grew 7.51x over the 2^0 to 2^4 range
- 93.10% of critic suggestions were adopted by actors; the number of solution aspects considered grew from about a dozen to several dozen as the network scaled
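The logistic growth pattern in the evidence above can be sketched with a toy curve. This is illustrative only: the parameters (q_max, k, midpoint) are hypothetical values chosen to reproduce the reported S-shape and the ~2^4 sweet spot, not fitted to the paper's data:

```python
import math

# Illustrative sketch of the 'collaborative scaling law': quality follows a
# logistic (S-shaped) curve in log2(agent count). All parameters hypothetical.

def logistic_quality(n_agents: int, q_max: float = 0.65, k: float = 1.5,
                     midpoint: float = 3.0) -> float:
    """Quality as a logistic function of log2(agent count)."""
    x = math.log2(n_agents)
    return q_max / (1.0 + math.exp(-k * (x - midpoint)))

# Marginal gain from each doubling of agents: rises, then shrinks,
# so most of the benefit arrives by roughly 2^4 = 16 agents.
gains = [logistic_quality(2 ** (i + 1)) - logistic_quality(2 ** i)
         for i in range(6)]
```

The shrinking marginal gains past 2^4 are the quantitative version of the "sweet spot" claim: doubling agents beyond that point buys little additional quality.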
How to Apply
- For code review or software dev pipelines, use chain or layer topology; for creative writing or brainstorming, switch to star/tree topology.
- If context is exploding as agent count grows, apply the memory control pattern: pass only the final artifact (not full conversation) between agents.
- For a good performance/cost balance, pick random topology over full mesh — 51.92% time savings with even better performance.
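One simple way to build a random-yet-acyclic topology, as the last bullet recommends, is to sample only "forward" edges over a fixed node ordering, which guarantees a DAG by construction. The helpers below are a hypothetical sketch, not the MacNet implementation:

```python
import random

# Hypothetical sketch: a random DAG topology plus the traversal order needed
# to run each actor only after all of its upstream critics have finished.

def random_dag(n_nodes: int, edge_prob: float = 0.3, seed: int = 0) -> dict:
    """Return an adjacency list {node: [successors]} of a random DAG.
    Edges only point from lower to higher indices, so no cycle can form."""
    rng = random.Random(seed)
    edges = {i: [] for i in range(n_nodes)}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < edge_prob:
                edges[i].append(j)
    return edges

def traversal_order(edges: dict) -> list:
    """Topological order via Kahn's algorithm: a node becomes ready once
    every predecessor has been processed."""
    indegree = {v: 0 for v in edges}
    for succs in edges.values():
        for v in succs:
            indegree[v] += 1
    order, ready = [], [v for v in edges if indegree[v] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in edges[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order
```

Wiring the critic-actor interaction from the code example below along these edges, in this order, yields a random-topology MacNet-style run.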
Code Example
# MACNET core pattern: critic-actor dual agent interaction
# Node = actor (generates artifact), Edge = critic (provides instructions and feedback)
from openai import OpenAI

client = OpenAI()

def critic_actor_interaction(task: str, artifact: str, critic_role: str,
                             actor_role: str, rounds: int = 3) -> str:
    """
    MACNET core: iterative pattern where the critic gives instructions
    and the actor refines the artifact.
    Only the artifact is passed to the next node (no full conversation history).
    """
    messages_critic = [
        {"role": "system", "content": f"You are {critic_role}. Review the artifact and give specific improvement instructions."},
    ]
    messages_actor = [
        {"role": "system", "content": f"You are {actor_role}. Refine the artifact based on the critic's instructions."},
    ]
    current_artifact = artifact
    for _ in range(rounds):
        # Critic reviews the artifact and provides instructions
        messages_critic.append({"role": "user", "content": f"Task: {task}\nCurrent artifact:\n{current_artifact}\n\nProvide specific improvement instructions:"})
        critic_response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages_critic)
        instruction = critic_response.choices[0].message.content
        # Actor receives the instructions and refines the artifact
        messages_actor.append({"role": "user", "content": f"Task: {task}\nCurrent artifact:\n{current_artifact}\n\nCritic's instruction: {instruction}\n\nProvide the refined artifact:"})
        actor_response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages_actor)
        current_artifact = actor_response.choices[0].message.content
        # Key: retain only the artifact in memory (prevents context explosion)
        messages_critic = messages_critic[:1]  # keep only the system prompt
        messages_actor = messages_actor[:1]
    return current_artifact  # pass only the artifact to the next node

# Chain topology example (software development)
def macnet_chain(task: str, agents: list) -> str:
    artifact = task
    for i in range(len(agents) - 1):
        critic_role, actor_role = agents[i], agents[i + 1]
        artifact = critic_actor_interaction(task, artifact, critic_role, actor_role)
        print(f"Step {i+1} done: {critic_role} -> {actor_role}")
    return artifact

# Usage example
agent_chain = ["Requirements Analyst", "System Architect", "Senior Developer", "Code Reviewer", "QA Engineer"]
result = macnet_chain("Build a REST API for user authentication", agent_chain)

Terminology
Related Resources
Original Abstract
Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual through collective reasoning. Inspired by the neural scaling law--increasing neurons enhances performance, this study explores whether the continuous addition of collaborative agents can yield similar benefits. Technically, we utilize directed acyclic graphs to organize agents into a multi-agent collaboration network (MacNet), upon which their interactive reasoning is topologically orchestrated for autonomous task solving. Extensive evaluations reveal that it effectively supports collaboration among over a thousand agents, with irregular topologies outperforming regular ones. We also identify a collaborative scaling law--the overall performance follows a logistic growth pattern as agents scale, with collaborative emergence occurring earlier than traditional neural emergence. We speculate this may be because scaling agents catalyzes their multidimensional considerations during interactive reflection and refinement, thereby producing more comprehensive artifacts. The code is available at https://github.com/OpenBMB/ChatDev/tree/macnet.