LLM을 위한 Y-Combinator: λ-Calculus로 Long-Context Rot 해결하기

The $\mathbf{Y}$-Combinator for LLMs: Solving Long-Context Rot with $λ$-Calculus

Mar 20, 2026•Amartya Roy, Rasul Tutunov, Xiaotong Ji +2•View PDF

TL;DR Highlight

LLM이 직접 재귀 코드를 짜는 대신, λ-calculus 기반의 결정론적 combinator(SPLIT/MAP/FILTER/REDUCE)로 긴 문서를 처리하면 정확도 +21.9%, 속도 4.1배 빠른 프레임워크.

Who Should Read

128K 토큰을 넘는 긴 문서 처리나 대규모 코드베이스 분석 파이프라인을 구축하는 백엔드/ML 엔지니어. LLM 에이전트의 비결정적 실행 흐름 문제로 골치를 썩고 있는 개발자.

Core Mechanics

기존 RLM(Recursive Language Model)은 LLM이 직접 Python 코드를 생성하는 오픈엔드 REPL 루프를 써서 무한루프, 코드 파싱 오류, 예측 불가한 비용이 발생하는 문제가 있음
λ-RLM은 LLM을 '작은 청크에만 쓰는 oracle'로 제한하고, SPLIT/MAP/FILTER/REDUCE 같은 사전 검증된 combinator가 모든 제어 흐름을 담당 — LLM이 코드를 쓰지 않아도 됨
수학적으로 종료 보장, 비용 상한(closed-form), 정확도 스케일링 법칙을 증명 — 기존 RLM에는 없던 공식 보증
8B 모델 + λ-RLM이 70B 모델 + 일반 RLM과 동등한 정확도(35.7% vs 36.1%)를 달성하면서 3.1배 빠름 — 모델 크기를 구조로 대체 가능
O(n²) 쌍 비교 작업(OOL-Pairs)에서 +28.6pp 정확도 향상, 6.2배 속도 향상 — 심볼릭 CROSS 연산으로 신경망 호출 없이 쌍을 계산하기 때문
강력한 코드 생성 모델(Llama-405B, Codestral-22B)이나 자유로운 코드 탐색이 필요한 CodeQA 태스크에서는 일반 RLM이 여전히 우세 — combinator 라이브러리의 한계

Evidence

9개 모델 × 4개 태스크 = 36개 비교에서 λ-RLM이 29/36(81%) 승리, 약한 모델(8B/7B) 티어에서는 12/12(100%) 완승
평균 정확도 향상: Weak 모델 +21.9pp(13.8% → 35.7%), Medium 모델 +18.6pp(31.3% → 50.1%), Strong 모델 +8.8pp(50.1% → 58.9%)
레이턴시 감소: Weak 모델 4.1배(229s → 57s), Medium 4.1배(193s → 47s), Strong 3.3배(129s → 39s) — 일반 RLM 대비
Ablation: combinator 라이브러리를 자유형 코드 생성으로 교체 시 정확도 -24.2pp, 레이턴시 3.9배 증가 (Qwen3-8B × OOLONG 기준)

How to Apply

긴 문서 처리 파이프라인에서 LLM이 '어떻게 쪼갤지'를 스스로 결정하게 하는 대신, SPLIT → MAP(M) → REDUCE 패턴의 사전 정의된 실행 계획을 태스크 유형별로 만들어 두면 됨 (검색: FILTER 추가, 집계: MERGE, 쌍비교: CROSS 활용)
청크 크기 k는 k*=2가 수학적 최적이므로, 불필요하게 많은 청크로 분할하는 대신 이진 분할(binary split)을 기본 전략으로 채택하고 필요 시 늘리는 방식으로 비용을 최소화할 수 있음
에이전트 시스템에서 LLM에게 제어 흐름을 맡기는 구조라면, 'LLM은 leaf 노드에서만 의미 추론, 나머지 오케스트레이션은 결정론적 코드'로 역할을 분리하면 종료 보장과 비용 예측 가능성을 얻을 수 있음

Code Example

snippet

# λ-RLM 핵심 패턴 - 직접 구현 예시
# github.com/lambda-calculus-LLM/lambda-RLM 참고

from typing import List, Callable

def split(text: str, k: int) -> List[str]:
    """SPLIT combinator: 텍스트를 k개 청크로 균등 분할"""
    chunk_size = len(text) // k
    return [text[i*chunk_size:(i+1)*chunk_size] for i in range(k)]

def lambda_rlm(
    prompt: str,
    model_fn: Callable[[str], str],  # leaf에서만 호출되는 LLM
    context_window: int = 32000,
    k_star: int = 2,  # Theorem 4: 최적 partition size = 2
    task_type: str = "aggregate"
) -> str:
    """λ-RLM 재귀 실행기 (Y-combinator 패턴)"""
    tau_star = context_window  # leaf threshold
    
    def phi(P: str) -> str:  # 재귀 executor
        if len(P) <= tau_star:
            # Base case: LLM 직접 호출 (유일한 신경망 연산)
            return model_fn(P)
        else:
            # Recursive case: 순수 심볼릭 연산
            chunks = split(P, k_star)           # SPLIT - 결정론적
            results = [phi(chunk) for chunk in chunks]  # MAP - 재귀
            return reduce_by_task(results, task_type)   # REDUCE - 결정론적
    
    return phi(prompt)

def reduce_by_task(results: List[str], task_type: str) -> str:
    """태스크별 composition operator"""
    if task_type == "aggregate":
        return "\n".join(results)  # MERGE
    elif task_type == "search":
        return max(results, key=lambda x: len(x))  # BEST
    elif task_type == "summarize":
        return "\n".join(results)  # CONCAT (이후 상위 M 호출)
    return "\n".join(results)

# 사용 예시
import openai
def call_llm(text: str) -> str:
    resp = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}]
    )
    return resp.choices[0].message.content

# 128K 토큰 문서를 2K context 모델로 처리
long_doc = "..." * 50000  # 매우 긴 문서
result = lambda_rlm(
    prompt=long_doc,
    model_fn=call_llm,
    context_window=2000,
    k_star=2,  # 최적값
    task_type="aggregate"
)

Terminology

λ-calculus (람다 대수)함수와 함수 적용만으로 모든 계산을 표현하는 수학적 언어. 프로그래밍 언어의 이론적 기초로, 여기서는 LLM의 재귀 실행 구조를 형식화하는 데 사용됨.

Y-combinator이름 없는 함수가 스스로를 재귀 호출할 수 있게 해주는 수학적 트릭. 파이썬처럼 함수 이름으로 자기 자신을 부를 수 없을 때, Y를 적용하면 재귀가 가능해짐.

RLM (Recursive Language Model)LLM이 REPL 환경에서 직접 Python 코드를 써서 긴 프롬프트를 재귀적으로 분해하는 방식. 강력하지만 코드 오류나 무한루프 위험이 있음.

REPL (Read-Eval-Print Loop)코드를 입력하면 즉시 실행해주는 인터랙티브 환경. Python 콘솔이나 Jupyter 노트북이 대표적 예시.

combinator입력을 받아 출력을 내놓는 순수 함수. λ-RLM에서 SPLIT, MAP, FILTER, REDUCE 등이 이에 해당하며, 모두 결정론적이고 사전 검증됨.

context rot (컨텍스트 부패)입력이 길어질수록 LLM 정확도가 지수적으로 떨어지는 현상. 책 뒷부분을 읽을수록 앞 내용을 잊어버리는 것과 비슷.

fixed-point combinator함수 f에 대해 fix(f) = f(fix(f))를 만족하는 고차 함수. 이를 통해 익명 함수도 재귀 호출이 가능해짐. Y-combinator가 대표적 구현체.

context windowLLM이 한 번에 처리할 수 있는 최대 토큰 수. GPT-4의 경우 128K, 작은 모델은 4K~32K 정도로 제한됨.

Related Resources

Original Abstract (Expand)

LLMs are increasingly used as general-purpose reasoners, but long inputs remain bottlenecked by a fixed context window. Recursive Language Models (RLMs) address this by externalising the prompt and recursively solving subproblems. Yet existing RLMs depend on an open-ended read-eval-print loop (REPL) in which the model generates arbitrary control code, making execution difficult to verify, predict, and analyse. We introduce $λ$-RLM, a framework for long-context reasoning that replaces free-form recursive code generation with a typed functional runtime grounded in $λ$-calculus. It executes a compact library of pre-verified combinators and uses neural inference only on bounded leaf subproblems, turning recursive reasoning into a structured functional program with explicit control flow. We show that $λ$-RLM admits formal guarantees absent from standard RLMs, including termination, closed-form cost bounds, controlled accuracy scaling with recursion depth, and an optimal partition rule under a simple cost model. Empirically, across four long-context reasoning tasks and nine base models, $λ$-RLM outperforms standard RLM in 29 of 36 model-task comparisons, improves average accuracy by up to +21.9 points across model tiers, and reduces latency by up to 4.1x. These results show that typed symbolic control yields a more reliable and efficient foundation for long-context reasoning than open-ended recursive code generation. The complete implementation of $λ$-RLM, is open-sourced for the community at: https://github.com/lambda-calculus-LLM/lambda-RLM.