LLMPC: An LLM-Based Model Predictive Control Planning Framework

LLMPC: Large Language Model Predictive Control

Jan 5, 2025 • Gabriel Maher

TL;DR Highlight

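A framework that reinterprets LLM planning as MPC (Model Predictive Control) from control engineering in order to substantially improve planning performance.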

Who Should Read

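Developers who want to improve the planning accuracy of LLM agents, especially AI engineers using LLMs for tasks with complex constraints such as trip itineraries and meeting scheduling.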

Core Mechanics

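Shows mathematically that an LLM given a planning prompt implicitly behaves like an optimization algorithm that minimizes a cost function.
Applies the MPC approach: the LLM generates several candidate plans at once, an objective function scores them, and the best one is selected.
Replaces few-shot prompting, which produces a single answer in one pass, with a loop that feeds the reasons the previous plan failed back into the prompt and improves the plan step by step (iterative refinement).
In the trip planning experiment with GPT-4o-mini, the success rate rose from 14.5% (single-round GPT-4o) to 44.6%.
In meeting scheduling, the T=9, K=3 setting (9 iterations with 3 candidates each) raised the success rate from 52.5% to 67%.
In a physics simulation (spring-mass system), increasing the number of candidate samples from 1 to 15 shrank the cost relative to MPC from 8.21x to 1.30x.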
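
The sampling effect in the last point above can be illustrated without an LLM. The following is a minimal sketch, assuming a discretized spring-mass system and a uniform random proposer standing in for the LLM. The parameter values (`K_SPRING`, `MASS`, `DT`, the horizon, the control range, the cost weights) and all function names are illustrative assumptions, not the paper's setup.

```python
import random

# Illustrative spring-mass parameters (assumptions, not the paper's values).
K_SPRING, MASS, DT = 1.0, 1.0, 0.1

def step(state, u):
    """One explicit-Euler step of a spring-mass system under control force u."""
    x, v = state
    a = (-K_SPRING * x + u) / MASS
    return (x + DT * v, v + DT * a)

def stage_cost(state, u):
    """Quadratic penalty on position, velocity, and control effort."""
    x, v = state
    return x * x + v * v + 0.01 * u * u

def rollout_cost(state, controls):
    """Total cost of applying a whole control sequence from a given state."""
    cost = 0.0
    for u in controls:
        state = step(state, u)
        cost += stage_cost(state, u)
    return cost

def propose(rng, horizon):
    """Stand-in for the LLM: propose a random control sequence."""
    return [rng.uniform(-2.0, 2.0) for _ in range(horizon)]

def sampled_mpc(num_candidates, steps=50, horizon=5, seed=0):
    """Receding horizon: sample K plans, apply the first action of the cheapest."""
    rng = random.Random(seed)
    state, total = (1.0, 0.0), 0.0
    for _ in range(steps):
        candidates = [propose(rng, horizon) for _ in range(num_candidates)]
        best = min(candidates, key=lambda c: rollout_cost(state, c))
        state = step(state, best[0])
        total += stage_cost(state, best[0])
    return total
```

Comparing `sampled_mpc(1)` with `sampled_mpc(15)` over a few seeds should show the accumulated cost dropping as more candidates are sampled per step, which is the qualitative trend the spring-mass experiment reports. The absolute numbers from this toy setup are not comparable to the paper's.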

Evidence

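Trip planning success rate: 14.5% for single-round GPT-4o → 44.6% for LLMPC at T=7 (roughly a 3x improvement)
Meeting scheduling success rate: 52.5% for single-round GPT-4o → 67% for LLMPC at T=9, K=3 (a 14.5 percentage point improvement)
Spring-mass experiment: the cost ratio relative to MPC is 8.21 at K=1 samples and falls to 1.30 at K=15 (about an 84% reduction in the ratio)
The performance advantage of LLMPC grows with the number of cities, and the more complex the problem, the more it pays to increase both the number of iterations (T) and the number of samples (K)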
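
The derived figures in parentheses above follow directly from the quoted numbers. A short check, using only values stated in this list (the variable names are illustrative):

```python
# Figures quoted in the Evidence list above.
trip_before, trip_after = 14.5, 44.6        # success rate, %
meeting_before, meeting_after = 52.5, 67.0  # success rate, %
ratio_k1, ratio_k15 = 8.21, 1.30            # LLMPC cost / MPC cost

trip_gain = trip_after / trip_before                 # relative improvement
meeting_gain_pp = meeting_after - meeting_before     # percentage points
ratio_reduction = (ratio_k1 - ratio_k15) / ratio_k1  # fractional drop in the ratio

print(f"{trip_gain:.2f}x, {meeting_gain_pp:.1f} pp, {ratio_reduction:.1%}")
# prints: 3.08x, 14.5 pp, 84.2%
```

Note that the 84% figure is the drop in the ratio itself. Measured as excess cost over MPC (the ratio minus 1), the change is from 7.21 to 0.30.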

How to Apply

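For planning tasks with many constraints (schedule coordination, route optimization, and so on), stop asking the LLM for a single answer. Instead, request K candidates at once and pick the best one with a separate evaluation function.
Build an evaluation function that automatically extracts why the previous plan failed (which constraints were violated), and implement a loop that passes the result into the next prompt as feedback_string.
When applying this to a scheduling bot or trip planner app, put few-shot examples in the system prompt, put the current plan plus the list of violated constraints in the instruction prompt, and repeat for T iterations.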
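
The evaluation step described above could look like the following. This is a minimal sketch, assuming the LLM's answer has already been parsed into a list of `(city, days)` pairs. The plan format, the constraint keys (`days`, `cities`, `flights`), and the function names `evaluate_trip` and `build_feedback` are hypothetical, chosen only for illustration.

```python
def evaluate_trip(plan, constraints):
    """Return (cost, violations) for a parsed trip plan.

    plan: list of (city, days) tuples in visiting order (assumed format).
    constraints: dict with 'days', 'cities', and 'flights', where 'flights'
    is a set of direct (origin, destination) pairs. These keys are illustrative.
    """
    violations = []
    total_days = sum(days for _, days in plan)
    if total_days != constraints["days"]:
        violations.append(
            f"Trip lasts {total_days} days, expected {constraints['days']}.")
    visited = [city for city, _ in plan]
    for city in constraints["cities"]:
        if city not in visited:
            violations.append(f"{city} is never visited.")
    # Every consecutive pair of cities needs a direct flight.
    for origin, dest in zip(visited, visited[1:]):
        if (origin, dest) not in constraints["flights"]:
            violations.append(f"No direct flight from {origin} to {dest}.")
    return len(violations), violations

def build_feedback(violations):
    """Format violations as the feedback block for the next prompt."""
    if not violations:
        return ""
    return "Unmet constraints:\n" + "\n".join(f"- {v}" for v in violations)

constraints = {
    "days": 10,
    "cities": ["Paris", "London", "Rome"],
    "flights": {("Paris", "London"), ("London", "Rome")},
}
cost, violations = evaluate_trip([("Paris", 4), ("Rome", 5)], constraints)
feedback_string = build_feedback(violations)
```

Using the number of violations as the cost keeps the selection rule and the feedback text consistent: a plan with cost 0 produces an empty feedback string and can end the loop.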

Code Example

import openai

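def evaluate_plan(plan: str, constraints: dict) -> tuple[float, list[str]]:
    """Score how well a plan satisfies the constraints; return the cost and the list of violations."""
    violations = []
    # Implement domain-specific constraint checks here and append to violations.
    # As written, this stub finds no violations, so every plan gets cost 0.
    cost = len(violations)  # use the number of violations as the cost
    return cost, violations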

def llmpc_plan(
    task: str,
    constraints: dict,
    system_prompt: str,
    max_iterations: int = 7,
    plans_per_iter: int = 3,
    model: str = "gpt-4o"
) -> str:
    client = openai.OpenAI()
    best_plan = ""
    best_cost = float("inf")
    feedback_string = ""

    for step in range(1, max_iterations + 1):
        # Request K candidates, given the current best plan and the feedback
        instruction = f"""
STEP {step}/{max_iterations}
TASK: {task}

Your current best plan is:
{best_plan if best_plan else 'No plan yet.'}

{feedback_string}

Propose {plans_per_iter} different plans separated by '---'.
"""
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": instruction}
            ]
        )
        
        # message.content can be None, so fall back to an empty string
        raw = response.choices[0].message.content or ""
        candidates = raw.split("---")
        
        # Score each candidate and keep the cheapest (the cost-based selection step of MPC)
        for candidate in candidates:
            candidate = candidate.strip()
            if not candidate:
                continue
            cost, violations = evaluate_plan(candidate, constraints)
            if cost < best_cost:
                best_cost = cost
                best_plan = candidate
                feedback_string = (
                    "Unmet constraints:\n" + "\n".join(f"- {v}" for v in violations)
                    if violations else ""
                )
        
        if best_cost == 0:
            print(f"Found a plan with no violations at iteration {step}.")
            break
    
    return best_plan

# Usage example
task = "Visit 5 cities in 10 days with given flight constraints..."
constraints = {"days": 10, "cities": ["Paris", "London", "Rome", "Berlin", "Madrid"]}
system_prompt = "You are an expert travel planner. Propose valid trip plans."
result = llmpc_plan(task, constraints, system_prompt, max_iterations=7, plans_per_iter=3)

Terminology

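MPC: Short for Model Predictive Control. A technique used in robot and vehicle control that simulates several future steps in advance and picks the best sequence of actions. Similar to looking a few moves ahead in chess.

Cost function: A function that expresses how good a plan is as a number. The lower the score, the better the plan. It quantifies things like the number of constraint violations or the distance to the goal.

Few-shot prompting: A technique that shows the LLM a few examples and has it answer following the same pattern. Like showing worked textbook examples and then asking for a new problem to be solved.

Iterative refinement: A loop in which, when the first answer is wrong, feedback on why it was wrong is given and the model tries again. Similar to revising code after a code review.

Planning horizon (H): How many steps ahead to plan. With H=3, the planner plans only 3 steps ahead from now, executes, and then plans again.

CVXPY: A Python library for solving mathematical optimization problems. It computes exact optimal solutions and is used in this paper as the baseline against which the LLM's performance is compared.

Natural Plan benchmark: A public benchmark dataset for evaluating the planning ability of LLMs. It consists of realistic planning problems such as building trip itineraries and scheduling meetings.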

Original Abstract

Recent advancements in planning prompting techniques for Large Language Models have improved their reasoning, planning, and action abilities. This paper develops a planning framework for Large Language Models using model predictive control that enables them to iteratively solve complex problems with long horizons. We show that in the model predictive control formulation, LLM planners act as approximate cost function optimizers and solve complex problems by breaking them down into smaller iterative steps. With our proposed planning framework, we demonstrate improved performance over few-shot prompting and improved efficiency over Monte Carlo Tree Search on several planning benchmarks.