CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
TL;DR Highlight
A multi-agent framework where two AI agents role-play and converse to autonomously complete complex tasks without human intervention.
Who Should Read
Backend/AI developers designing LLM-based autonomous agent systems or building multi-agent pipelines. Also useful for ML engineers wanting to auto-generate instruction-following datasets for fine-tuning.
Core Mechanics
- Proposes a role-playing framework in which an AI User (task director) and an AI Assistant (task executor) are each assigned a role; given only an initial idea from a human, the two agents complete the task through conversation alone
- Inception Prompting: only the system prompts are designed before the conversation starts; thereafter the agents prompt each other automatically — role flipping, infinite message loops, and flake (non-committal) replies are mitigated purely through prompt engineering
- The framework automatically generates large-scale instruction-following datasets: AI Society (25,000 conversations), Code, Math (50K), Science (60K) — released on Hugging Face
- CAMEL multi-agent solution wins 76.3% of human evaluations against single gpt-3.5-turbo calls on AI Society tasks
- Fine-tuning LLaMA-7B sequentially on the generated datasets shows knowledge emerging progressively as each domain is added
- Critic-in-the-Loop: optional extension to add an AI or human critic agent to the loop for tree-search style decision-making
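The inception-prompting mechanic above can be sketched as two templated system prompts fixed before the conversation starts. The wording below is an illustrative paraphrase of the paper's template, not the exact Figure 2 text, and `make_inception_prompts` is a hypothetical helper:

```python
# Sketch of inception prompting: both system prompts are designed before the
# conversation begins; the agents then drive each other with no human input.
# The prompt wording paraphrases the paper's template and is not verbatim.

def make_inception_prompts(assistant_role: str, user_role: str, task: str):
    assistant_sys = (
        f"Never forget you are a {assistant_role} and I am a {user_role}. "
        "Never flip roles! I will instruct you to complete the task: "
        f"{task}. Give exactly one specific, actionable solution per "
        "instruction, starting with 'Solution:'."
    )
    user_sys = (
        f"Never forget you are a {user_role} and I am a {assistant_role}. "
        "Never flip roles! You will instruct me step by step to complete "
        f"the task: {task}. Give exactly one instruction at a time. "
        "When the task is done, reply only with <CAMEL_TASK_DONE>."
    )
    return assistant_sys, user_sys

# Example role pairing from the paper's AI Society setting
a_sys, u_sys = make_inception_prompts(
    "Python Programmer", "Stock Trader", "develop a trading bot")
```

Note how the anti-role-flipping instruction and the termination token are baked into the prompts themselves — this is the entirety of the "prompt engineering" fix for the failure modes listed above.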
Evidence
- CAMEL agent solutions win 76.3% of human evaluations (453 evaluators) and 73.0% of GPT-4 evaluations vs a single gpt-3.5-turbo call on AI Society tasks
- Code task GPT-4 evaluation: CAMEL 76.0% win rate vs gpt-3.5-turbo 24.0%
- CAMEL-7B (LLaMA-7B fine-tuned): HumanEval pass@1 14.0%, pass@100 57.9% vs LLaMA-7B (10.5%, 36.5%) and Vicuna-7B (11.0%, 42.9%) — significant improvement
- Cumulative fine-tuning LLaMA-7B on AI Society → Code → Math → Science: final model wins all 20/20 tasks across domains vs individual models
How to Apply
- Define two roles for your project (e.g., 'domain expert' + 'developer'), copy the inception-prompt template from Figure 2 of the paper, and apply it as the system prompts to get an autonomous collaborative agent pair running immediately
- When instruction-following fine-tuning data is scarce, set up role combinations for the desired domain with the CAMEL framework and auto-generate conversations as training data — the paper auto-generated 25,000 conversations using two gpt-3.5-turbo instances
- If agent loops run into infinite conversations or role flipping, add explicit guardrails: a 'Never flip roles!' instruction in the system prompt, a '<CAMEL_TASK_DONE>' termination token, and a hard message limit (the paper uses 40 messages)
Code Example
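A minimal, self-contained sketch of the role-playing loop with the two guardrails mentioned above (the '<CAMEL_TASK_DONE>' termination token and a 40-message cap). The `user_llm`/`assistant_llm` stubs are hypothetical placeholders for real chat-model calls (e.g., two gpt-3.5-turbo instances); this is not the camel-ai library API.

```python
TASK_DONE = "<CAMEL_TASK_DONE>"

def user_llm(history: list[str]) -> str:
    """Stub AI User: issues instructions, then signals completion.
    Replace with a real chat-model call carrying the user system prompt."""
    if len(history) >= 3:
        return TASK_DONE
    return f"Instruction: do step {len(history) + 1}"

def assistant_llm(transcript: list) -> str:
    """Stub AI Assistant: answers each instruction with a solution."""
    return f"Solution: completed ({len(transcript)} messages so far)"

def role_play(max_messages: int = 40) -> list[tuple[str, str]]:
    """Alternate user instructions and assistant solutions until the
    termination token appears or the message cap is hit."""
    transcript: list[tuple[str, str]] = []
    user_hist: list[str] = []
    instruction = user_llm(user_hist)
    while len(transcript) < max_messages:
        transcript.append(("user", instruction))
        if TASK_DONE in instruction:   # termination guardrail
            break
        solution = assistant_llm(transcript)
        transcript.append(("assistant", solution))
        user_hist.append(solution)
        instruction = user_llm(user_hist)
    return transcript

log = role_play()
```

Collecting the returned transcripts across many role/task combinations is exactly how the paper builds its instruction-following datasets; the cap and token check are the loop-level guardrails from "How to Apply".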
Terminology
Related Resources
Original Abstract (Expand)
The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond: https://github.com/camel-ai/camel.