Scaling Large-Language-Model-based Multi-Agent Collaboration
TL;DR Highlight
Experiments scaling agents up to thousands revealed a 'collaborative scaling law' that governs multi-agent performance.
Who Should Read
AI engineers designing multi-agent systems or weighing agent count and topology choices. Developers who want to improve LLM task performance at inference time, without retraining.
Core Mechanics
- Introduced MACNET, a framework connecting agents in a DAG structure — actors on nodes, critics on edges for role separation
- Discovered a 'collaborative scaling law': performance follows a logistic (S-shaped) growth curve as agent count increases, with around 16 nodes (2^4) as the sweet spot
- Irregular random topologies outperform structured ones like full mesh — thanks to 'small-world' properties found in social networks
- Memory control solves context explosion: pass only the final artifact (not full history) between agents, reducing context length from O(n^2) to O(n)
- Successfully ran collaboration experiments with 1,000+ agents using GPT-3.5; outperformed CoT, AutoGPT, and AgentVerse on MMLU, HumanEval, software development, and creative writing benchmarks
- Emergence occurs at much smaller scale (hundreds of agents) than neural scaling law predicts — because agents already carry pretrained knowledge
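The memory-control bullet above can be made concrete with a quick back-of-the-envelope sketch. The function names and the 500-token turn size are illustrative assumptions, not values from the paper:

```python
# Illustrative sketch: why passing only the final artifact between agents
# keeps total context linear in the number of agents, while forwarding the
# full conversation history grows quadratically.

def total_context_full_history(n_agents: int, tokens_per_turn: int = 500) -> int:
    """Each agent re-reads the entire accumulated history: 1 + 2 + ... + n turns."""
    return sum(i * tokens_per_turn for i in range(1, n_agents + 1))  # O(n^2)

def total_context_artifact_only(n_agents: int, tokens_per_turn: int = 500) -> int:
    """Each agent reads only the previous agent's artifact: n fixed-size reads."""
    return n_agents * tokens_per_turn  # O(n)

# With 64 agents, full history costs 64*65/2 = 2080 turn-reads versus just 64
```

At 64 agents the gap is already a factor of ~32, which is why the artifact-only pattern is what makes thousand-agent runs feasible at all.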
Evidence
- MACNET-RANDOM achieved the top average quality of 0.6522; MACNET-CHAIN hit MMLU 0.6632 vs CoT 0.3544, AutoGPT 0.4485, AgentVerse 0.2977
- Memory control reduced token complexity from O(n^2) to O(n); random topology cut time by 51.92% vs mesh
- Experiments scaled node counts from 2^0=1 up to 2^6=64; artifact token length grew 7.51x over the 2^0 to 2^4 range
- 93.10% of critic suggestions were adopted by actors; the number of solution aspects considered grew from about a dozen to several dozen as the network scaled
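The logistic growth pattern in the evidence above can be sketched with a toy curve. This is illustrative only: the parameters (q_max, k, midpoint) are hypothetical values chosen to reproduce the reported S-shape and the ~2^4 sweet spot, not fitted to the paper's data:

```python
import math

# Illustrative sketch of the 'collaborative scaling law': quality follows a
# logistic (S-shaped) curve in log2(agent count). All parameters hypothetical.

def logistic_quality(n_agents: int, q_max: float = 0.65, k: float = 1.5,
                     midpoint: float = 3.0) -> float:
    """Quality as a logistic function of log2(agent count)."""
    x = math.log2(n_agents)
    return q_max / (1.0 + math.exp(-k * (x - midpoint)))

# Marginal gain from each doubling of agents: rises, then shrinks,
# so most of the benefit arrives by roughly 2^4 = 16 agents.
gains = [logistic_quality(2 ** (i + 1)) - logistic_quality(2 ** i)
         for i in range(6)]
```

The shrinking marginal gains past 2^4 are the quantitative version of the "sweet spot" claim: doubling agents beyond that point buys little additional quality.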
How to Apply
- For code review or software dev pipelines, use chain or layer topology; for creative writing or brainstorming, switch to star/tree topology.
- If context is exploding as agent count grows, apply the memory control pattern: pass only the final artifact (not full conversation) between agents.
- For a good performance/cost balance, pick random topology over full mesh — 51.92% time savings with even better performance.
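One simple way to build a random-yet-acyclic topology, as the last bullet recommends, is to sample only "forward" edges over a fixed node ordering, which guarantees a DAG by construction. The helpers below are a hypothetical sketch, not the MacNet implementation:

```python
import random

# Hypothetical sketch: a random DAG topology plus the traversal order needed
# to run each actor only after all of its upstream critics have finished.

def random_dag(n_nodes: int, edge_prob: float = 0.3, seed: int = 0) -> dict:
    """Return an adjacency list {node: [successors]} of a random DAG.
    Edges only point from lower to higher indices, so no cycle can form."""
    rng = random.Random(seed)
    edges = {i: [] for i in range(n_nodes)}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < edge_prob:
                edges[i].append(j)
    return edges

def traversal_order(edges: dict) -> list:
    """Topological order via Kahn's algorithm: a node becomes ready once
    every predecessor has been processed."""
    indegree = {v: 0 for v in edges}
    for succs in edges.values():
        for v in succs:
            indegree[v] += 1
    order, ready = [], [v for v in edges if indegree[v] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in edges[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order
```

Wiring the critic-actor interaction from the code example below along these edges, in this order, yields a random-topology MacNet-style run.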
Code Example
# MACNET core pattern: critic-actor dual agent interaction
# Node = actor (generates artifact), Edge = critic (provides instructions and feedback)
from openai import OpenAI

client = OpenAI()

def critic_actor_interaction(task: str, artifact: str, critic_role: str,
                             actor_role: str, rounds: int = 3) -> str:
    """
    MACNET core: iterative pattern where the critic gives instructions
    and the actor refines the artifact.
    Only the artifact is passed to the next node (no full conversation history).
    """
    messages_critic = [
        {"role": "system", "content": f"You are {critic_role}. Review the artifact and give specific improvement instructions."},
    ]
    messages_actor = [
        {"role": "system", "content": f"You are {actor_role}. Refine the artifact based on the critic's instructions."},
    ]
    current_artifact = artifact
    for _ in range(rounds):
        # Critic reviews the artifact and provides instructions
        messages_critic.append({"role": "user", "content": f"Task: {task}\nCurrent artifact:\n{current_artifact}\n\nProvide specific improvement instructions:"})
        critic_response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages_critic)
        instruction = critic_response.choices[0].message.content
        # Actor receives the instructions and refines the artifact
        messages_actor.append({"role": "user", "content": f"Task: {task}\nCurrent artifact:\n{current_artifact}\n\nCritic's instruction: {instruction}\n\nProvide the refined artifact:"})
        actor_response = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages_actor)
        current_artifact = actor_response.choices[0].message.content
        # Key: retain only the artifact in memory (prevents context explosion)
        messages_critic = messages_critic[:1]  # keep only the system prompt
        messages_actor = messages_actor[:1]
    return current_artifact  # pass only the artifact to the next node

# Chain topology example (software development)
def macnet_chain(task: str, agents: list) -> str:
    artifact = task
    for i in range(len(agents) - 1):
        critic_role, actor_role = agents[i], agents[i + 1]
        artifact = critic_actor_interaction(task, artifact, critic_role, actor_role)
        print(f"Step {i+1} done: {critic_role} -> {actor_role}")
    return artifact

# Usage example
agent_chain = ["Requirements Analyst", "System Architect", "Senior Developer", "Code Reviewer", "QA Engineer"]
result = macnet_chain("Build a REST API for user authentication", agent_chain)

Terminology
Related Resources
Original Abstract
Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual through collective reasoning. Inspired by the neural scaling law--increasing neurons enhances performance, this study explores whether the continuous addition of collaborative agents can yield similar benefits. Technically, we utilize directed acyclic graphs to organize agents into a multi-agent collaboration network (MacNet), upon which their interactive reasoning is topologically orchestrated for autonomous task solving. Extensive evaluations reveal that it effectively supports collaboration among over a thousand agents, with irregular topologies outperforming regular ones. We also identify a collaborative scaling law--the overall performance follows a logistic growth pattern as agents scale, with collaborative emergence occurring earlier than traditional neural emergence. We speculate this may be because scaling agents catalyzes their multidimensional considerations during interactive reflection and refinement, thereby producing more comprehensive artifacts. The code is available at https://github.com/OpenBMB/ChatDev/tree/macnet.