Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization

Mar 13, 2026 • Xudong Wang, Chaoning Zhang, Jiaquan Zhang +8

TL;DR Highlight

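A routing framework that uses ant colony optimization to distribute queries intelligently across multiple LLM agents, cutting cost while running 4.7× faster under high concurrency.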

Who Should Read

AI engineers and backend developers who design Multi-Agent systems that combine several LLMs for complex tasks, or who want to cut expensive LLM calls without giving up performance.

Core Mechanics

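AMRO-S models routing in a Multi-Agent system as a path-search problem on a layered directed graph, where each node is one (LLM model × reasoning strategy × role prompt) combination.
An ultra-light SLM (small language model) such as Llama-3.2-1B is tuned with SFT (supervised fine-tuning) and used as the router; it reaches 97.93% intent classification accuracy without relying on GPT-4o-mini.
A separate pheromone matrix is kept for each task type, so math, code, and general queries do not interfere with one another.
Pheromone updates run asynchronously in the background and use only results that pass a quality gate, where an LLM-Judge filters for high-quality paths, so serving latency does not increase.
Accuracy holds at 96.4% even with 1000 concurrent processes, while the existing WRR approach drops to 88.2%.
Pheromone heatmaps make it possible to trace visually why a query was routed to a given agent, moving away from black-box routing.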
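
To make the search space concrete, here is a minimal sketch of a layered graph whose nodes are (model, strategy, role) combinations, with one independent pheromone table per task. The model, strategy, and role names, the two-layer depth, and the helper functions are all placeholders chosen for illustration; they are not the paper's configuration.

```python
from itertools import product

# Hypothetical pool: these names are placeholders, not the paper's setup.
MODELS = ["model-a", "model-b"]
STRATEGIES = ["direct", "chain-of-thought"]
ROLES = ["solver", "reviewer"]
TASKS = ["math", "code", "general"]
NUM_LAYERS = 2  # assumed depth of the layered graph


def build_layer():
    """Every node in a layer is one (model, strategy, role) combination."""
    return list(product(MODELS, STRATEGIES, ROLES))


def init_pheromones(layers, tasks, tau0=1.0):
    """One independent edge table per task, so tasks never share pheromone."""
    tables = {}
    for task in tasks:
        edges = {}
        for depth in range(len(layers) - 1):
            for src in layers[depth]:
                for dst in layers[depth + 1]:
                    edges[(depth, src, dst)] = tau0
        tables[task] = edges
    return tables


def top_edges(table, k=3):
    """Rank edges by pheromone: a text stand-in for reading a heatmap."""
    return sorted(table.items(), key=lambda kv: kv[1], reverse=True)[:k]


layers = [build_layer() for _ in range(NUM_LAYERS)]
tau = init_pheromones(layers, TASKS)

# Reinforce one edge for math only; the code and general tables stay untouched.
edge = (0, layers[0][0], layers[1][3])
tau["math"][edge] += 0.5

print(len(layers[0]), "nodes per layer")
print(top_edges(tau["math"], k=1))
```

Because each task owns its table, reinforcing a math path leaves the code and general tables unchanged, and sorting a table by value gives a simple textual answer to "why this agent?".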

Evidence

Average score of 87.83 across five benchmarks (MMLU, GSM8K, MATH, HumanEval, MBPP), 1.90 points above the strongest baseline, MasRouter (85.93).
4.7× speedup at 1000 concurrent processes (3849 s → 823 s), with accuracy holding steady at 96.1% to 96.4%.
SFT raises the intent recognition accuracy of Llama-3.2-1B from 82.00% to 97.93%.
Combining MacNet with AMRO-S lowers GSM8K cost from $2.14 to $2.00 while raising accuracy from 94.69% to 95.00%.
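
The headline numbers can be rechecked from the raw figures quoted above. The short script below only redoes that arithmetic; the relative cost saving is a derived figure that the summary does not state directly.

```python
# Figures quoted in the Evidence list above.
baseline_seconds, amro_seconds = 3849, 823
amro_avg, masrouter_avg = 87.83, 85.93
cost_before, cost_after = 2.14, 2.00

speedup = baseline_seconds / amro_seconds
score_gain = amro_avg - masrouter_avg
cost_saving_pct = (cost_before - cost_after) / cost_before * 100

print(f"speedup: {speedup:.2f}x")
print(f"average score gain: {score_gain:.2f} points")
print(f"GSM8K cost saving: {cost_saving_pct:.1f}%")
```

The time ratio comes out at about 4.68, which rounds to the reported 4.7×, and the $0.14 drop on GSM8K is a saving of roughly 6.5%.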

How to Apply

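Swap only the path selection policy of an existing MAS framework (GPTSwarm, MacNet, etc.) for AMRO-S to apply it plug-and-play; no changes to agent configuration or the execution workflow are needed.
Create around 3000 routing training examples organized by task type (math, code, general, etc.) and run about 30 minutes of SFT on a 1B to 1.5B small model to obtain a low-cost, high-accuracy router.
Instead of a high-cost LLM (such as GPT-4o), assemble a heterogeneous pool of gpt-4o-mini, gemini-1.5-flash, claude-3.5-haiku, and llama-3.1-70b, and distribute queries automatically by difficulty and type to maximize performance per cost.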
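
The plug-and-play idea can be pictured as a host workflow that takes its path selection policy as a parameter. The sketch below is an assumption-laden illustration: `PathPolicy`, `static_policy`, `make_pheromone_policy`, and `run_workflow` are hypothetical names, not APIs of GPTSwarm or MacNet, and pheromone is stored per agent rather than per edge to keep the example short.

```python
import random
from typing import Callable, Dict, List, Sequence

Path = List[str]
# A path policy maps (query, candidate agents per layer) to one agent per layer.
PathPolicy = Callable[[str, Sequence[Sequence[str]]], Path]


def static_policy(query: str, layers: Sequence[Sequence[str]]) -> Path:
    """Baseline: always take the first agent in every layer."""
    return [layer[0] for layer in layers]


def make_pheromone_policy(tau: Dict[str, float], seed: int = 0) -> PathPolicy:
    """Sample each layer's agent in proportion to its pheromone value."""
    rng = random.Random(seed)

    def policy(query: str, layers: Sequence[Sequence[str]]) -> Path:
        path = []
        for layer in layers:
            weights = [tau.get(agent, 1.0) for agent in layer]
            path.append(rng.choices(list(layer), weights=weights, k=1)[0])
        return path

    return policy


def run_workflow(query: str, layers, select_path: PathPolicy) -> str:
    """Stand-in for the host framework: only `select_path` is swappable."""
    path = select_path(query, layers)
    return " -> ".join(path)


# Pool taken from the list above; the layer layout and weights are made up.
layers = [
    ["gpt-4o-mini", "gemini-1.5-flash"],
    ["claude-3.5-haiku", "llama-3.1-70b"],
]
tau = {"gemini-1.5-flash": 5.0, "llama-3.1-70b": 5.0}

print(run_workflow("2 + 2 = ?", layers, static_policy))
print(run_workflow("2 + 2 = ?", layers, make_pheromone_policy(tau)))
```

The workflow function never changes between the two calls; only the policy object passed in differs, which is the property the first item above relies on.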

Code Example

# AMRO-S core logic (pseudocode; helper names are illustrative and not defined here)

# 1. Classify query intent with the SFT-tuned small router
router = SFTSmallLanguageModel('Llama-3.2-1B-Instruct')
w = router.get_task_weights(query)  # e.g. {'math': 0.8, 'code': 0.1, 'general': 0.1}

# 2. Fuse the task-specific pheromone matrices
# tau['math'], tau['code'], tau['general']: per-task path preference matrices
task_types = ['math', 'code', 'general']
tau_fused = sum(w[t] * tau[t] for t in task_types)

# 3. ACO transition probabilities out of node i (exploitation vs. exploration)
def get_transition_prob(tau_fused, eta, i, allowed_nodes, alpha=1.0, beta=2.0, gamma=0.1):
    scores = {}
    for j in allowed_nodes:
        scores[j] = (tau_fused[i][j] ** alpha) * (eta[j] ** beta)
    total = sum(scores.values())
    probs = {j: s / total for j, s in scores.items()}
    # Guarantee minimum exploration by mixing with a uniform distribution
    # (similar in spirit to epsilon-greedy)
    final_probs = {
        j: gamma * (1 / len(allowed_nodes)) + (1 - gamma) * probs[j]
        for j in allowed_nodes
    }
    return final_probs

# 4. Sample a path layer by layer with get_transition_prob, then run the agents
path = sample_path(tau_fused, eta, get_transition_prob)
output = execute_agents(path, query)

# 5. Asynchronous quality-gated update (off the serving path, so no added latency)
# rho: evaporation rate, Q: deposit constant, eps: small constant to avoid division by zero
if llm_judge(query, path, output) == 1:  # only high-quality paths pass
    for t in task_types:
        for (i, j) in path:
            tau[t][i][j] = (1 - rho) * tau[t][i][j] + w[t] * Q / (cost(path) + eps)
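
For readers who want to run the loop end to end, the following is a self-contained toy version of the five steps using only the standard library. Everything concrete in it is an assumption made for illustration: the three-layer graph with agents `a0` to `c1`, the flat heuristic `eta`, the fixed task weights, the unit path cost, and above all the stand-in judge that accepts a path only when every hop lands on a node ending in `0`. As in the pseudocode, evaporation is applied only to the edges of an accepted path.

```python
import random

TASKS = ["math", "code", "general"]
LAYERS = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]  # toy agent names


def fuse(tau, w):
    """Weighted sum of the per-task pheromone tables, edge by edge."""
    edges = tau[TASKS[0]].keys()
    return {e: sum(w[t] * tau[t][e] for t in TASKS) for e in edges}


def transition_probs(tau_fused, eta, i, allowed, alpha=1.0, beta=2.0, gamma=0.1):
    """ACO scores normalized, then mixed with a uniform distribution."""
    scores = {j: tau_fused[(i, j)] ** alpha * eta[j] ** beta for j in allowed}
    total = sum(scores.values())
    uniform = 1.0 / len(allowed)
    return {j: gamma * uniform + (1 - gamma) * s / total for j, s in scores.items()}


def sample_path(tau_fused, eta, rng):
    """Pick a start node uniformly, then walk one layer at a time."""
    node = rng.choice(LAYERS[0])
    path = []
    for layer in LAYERS[1:]:
        probs = transition_probs(tau_fused, eta, node, layer)
        nxt = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
        path.append((node, nxt))
        node = nxt
    return path


def update(tau, path, w, path_cost, rho=0.1, Q=1.0, eps=1e-8):
    """Evaporate and deposit on the edges of an accepted path."""
    for t in TASKS:
        for e in path:
            tau[t][e] = (1 - rho) * tau[t][e] + w[t] * Q / (path_cost + eps)


edges = [(i, j) for a, b in zip(LAYERS, LAYERS[1:]) for i in a for j in b]
tau = {t: {e: 1.0 for e in edges} for t in TASKS}
eta = {n: 1.0 for layer in LAYERS for n in layer}  # flat heuristic for the toy
w = {"math": 0.8, "code": 0.1, "general": 0.1}

rng = random.Random(0)
for _ in range(200):
    path = sample_path(fuse(tau, w), eta, rng)
    # Stand-in for the LLM judge: accept only paths that end every hop on a "*0" node.
    judged_good = all(dst.endswith("0") for _, dst in path)
    if judged_good:
        update(tau, path, w, path_cost=1.0)

print(max(tau["math"], key=tau["math"].get))
```

Edges on rejected paths are never touched, so their pheromone stays at the initial value, while edges on accepted paths drift toward the fixed point of the update rule for each task weight.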

Terminology

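ACO (Ant Colony Optimization): An optimization algorithm inspired by the way ants leave pheromone trails while foraging and other ants follow them. Better paths accumulate more pheromone and get chosen more often, forming a positive feedback loop.

pheromone: In ACO, a number recording how good a path has been. Here it is a matrix, kept per task, that stores the past performance of each agent-to-agent transition.

SFT (Supervised Fine-Tuning): Further training of a model on data with ground-truth labels. Like a student learning from worked examples, the model is shown model answers and specialized for a particular task.

MAS (Multi-Agent System): A system in which several AI agents cooperate to solve a complex task, dividing the work by specialized role as in a team project.

SLM (Small Language Model): A lightweight language model with roughly a billion parameters or fewer (the router here is in the 1B to 1.5B range), far smaller than large models such as GPT-4. It is much faster and cheaper, yet performs well enough on specialized tasks.

quality gate: A filter that keeps low-quality data out of learning. Here an LLM acts as the judge, so only good results feed into pheromone updates.

pass@1: In code generation evaluation, the probability of producing a correct solution in a single attempt. A higher value means the model more often gets working code on the first try.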
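
As a small worked example of the pass@1 definition, the sketch below uses the commonly used unbiased pass@k estimator, which for k = 1 reduces to the fraction of correct samples per problem. The per-problem counts are invented for illustration and are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-problem results: (samples drawn, samples that passed the tests)
results = [(10, 10), (10, 5), (10, 0), (10, 2)]
scores = [pass_at_k(n, c, k=1) for n, c in results]
print(sum(scores) / len(scores))
```

The benchmark-level score is the mean of the per-problem estimates.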

Original Abstract

Large Language Model (LLM)-driven Multi-Agent Systems (MAS) have demonstrated strong capability in complex reasoning and tool use, and heterogeneous agent pools further broaden the quality-cost trade-off space. Despite these advances, real-world deployment is often constrained by high inference cost, latency, and limited transparency, which hinders scalable and efficient routing. Existing routing strategies typically rely on expensive LLM-based selectors or static policies, and offer limited controllability for semantic-aware routing under dynamic loads and mixed intents, often resulting in unstable performance and inefficient resource utilization. To address these limitations, we propose AMRO-S, an efficient and interpretable routing framework for Multi-Agent Systems (MAS). AMRO-S models MAS routing as a semantic-conditioned path selection problem, enhancing routing performance through three key mechanisms: First, it leverages a supervised fine-tuned (SFT) small language model for intent inference, providing a low-overhead semantic interface for each query; second, it decomposes routing memory into task-specific pheromone specialists, reducing cross-task interference and optimizing path selection under mixed workloads; finally, it employs a quality-gated asynchronous update mechanism to decouple inference from learning, optimizing routing without increasing latency. Extensive experiments on five public benchmarks and high-concurrency stress tests demonstrate that AMRO-S consistently improves the quality-cost trade-off over strong routing baselines, while providing traceable routing evidence through structured pheromone patterns.