Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory

Mar 17, 2026 • Sahil Sen, Elias Lumer, Anmol Gulati +1

TL;DR Highlight

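A memory framework that extracts time-grounded events from conversation history and stores them in structured form, so it can answer questions like 'What did I do last month?'. It reaches 95.6% accuracy on the LongMemEvalS long-term memory benchmark.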

Who Should Read

Backend developers adding long-term conversational memory to a chatbot or AI assistant, especially those building personalization features that track when the user did what.

Core Mechanics

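Instead of storing conversations as-is, Chronos maintains two stores in parallel: an 'event calendar' of <subject, verb, object> tuples parsed together with date ranges, and a 'turn calendar' of the original dialogue turns.
Time expressions are normalized to ISO 8601 datetime ranges, turning vague phrases such as 'recently' or 'last month' into concrete date ranges that can be filtered.
'Dynamic prompting' generates retrieval guidance for each query: it analyzes a question such as 'Which camera lens did I buy most recently?' and tells the agent what to search for and how.
A ReAct-style agent iteratively calls two kinds of tools, vector search and grep, and runs additional searches when the evidence is insufficient.
On the LongMemEvalS benchmark it reaches 92.60% with GPT-4o (Chronos Low) and 95.60% with Claude Opus 4.6 (Chronos High).
Removing the event calendar lowers accuracy by 34.5 points; this ablation confirms that temporal structuring is the main driver of performance.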
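A minimal sketch of the two-calendar layout, for illustration only: the class and field names (`EventRecord`, `TurnRecord`, `ChronosStore`, `session_id`) are hypothetical, datetimes are assumed to be already parsed from the ISO 8601 strings the extractor emits, and plain lists stand in for the indexes a real system would use. It shows why resolved ranges matter: once every event carries a start and an end, a time-bounded question reduces to an interval-overlap filter, and each event can still be traced back to its source turns.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class EventRecord:
    # One <subject, verb, object> tuple with a resolved datetime range (hypothetical schema).
    subject: str
    verb: str
    obj: str  # named "obj" to avoid shadowing the builtin "object"
    start: datetime
    end: datetime
    session_id: str
    aliases: list[str] = field(default_factory=list)


@dataclass
class TurnRecord:
    # One raw conversation turn, kept to preserve full context.
    session_id: str
    timestamp: datetime
    text: str


@dataclass
class ChronosStore:
    events: list[EventRecord] = field(default_factory=list)  # the "event calendar"
    turns: list[TurnRecord] = field(default_factory=list)    # the "turn calendar"

    def events_in_range(self, start: datetime, end: datetime) -> list[EventRecord]:
        # Keep events whose [start, end] interval overlaps the query window.
        return [e for e in self.events if e.start <= end and e.end >= start]

    def turns_for(self, event: EventRecord) -> list[TurnRecord]:
        # Follow a structured event back to the turns of the session it came from.
        return [t for t in self.turns if t.session_id == event.session_id]
```

An event spanning several days is returned for any query window that touches it, which is the behavior a question like 'What was I doing in March?' needs.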

Evidence

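Chronos Low (GPT-4o): 92.60% vs. 86.00% for the previous best, EmergenceMem Internal, a 7.67% relative improvement (6.6 percentage points absolute)
Chronos High (Claude Opus 4.6): 95.60%, 3.02% above the previous best, setting a new state of the art on LongMemEvalS
Ablation: removing the events calendar drops accuracy from 93.1% to 58.6%, a 34.5-point fall and the largest contribution of any single component
Multi-session aggregation category: 91.73%, a 7.97% relative improvement over second-place Honcho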
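The figures above mix relative percentages and absolute percentage points. Recomputing them from the reported accuracies makes the distinction explicit: the 7.67% headline gain is relative (6.6 points absolute), and the 34.5-point ablation drop is the same effect the abstract reports as a 58.9% relative gain.

```python
def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1) * 100


def point_gain(new: float, old: float) -> float:
    """Absolute improvement in percentage points."""
    return new - old


# Chronos Low (GPT-4o) vs. EmergenceMem Internal
print(f"{relative_gain(92.60, 86.00):.2f}% relative, {point_gain(92.60, 86.00):.1f} points")
# → 7.67% relative, 6.6 points

# Full system vs. the variant without the events calendar (ablation)
print(f"{point_gain(93.1, 58.6):.1f} points, {relative_gain(93.1, 58.6):.1f}% relative")
# → 34.5 points, 58.9% relative
```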

How to Apply

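When storing a conversation, use an LLM to extract <subject, verb, object, start_datetime, end_datetime> tuples and save them in a separate index. For expressions like 'last week' or 'recently', compute the date range relative to the conversation timestamp and fill it in.
Add a 'dynamic prompting' step before query processing: pass the question to a lightweight model such as Gemini Flash to generate guidance on which information to search for and over which time range, then inject that guidance into the agent's system prompt.
Give the agent both search tools, vector search (semantic) and grep (exact keyword matching), and let it choose between them as the situation requires; this substantially improves recall on exact matches such as specific product names or dates.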
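The date-range step from the first item can be made concrete with the standard library. This is a minimal sketch, not the paper's method: in the pipeline described above the LLM resolves these expressions inside the extraction prompt, whereas `resolve_range` is a hypothetical deterministic helper that handles only three phrases, and the 14-day window for 'recently' is an assumed value.

```python
from datetime import datetime, timedelta


def resolve_range(expression: str, tconv: datetime) -> tuple[str, str]:
    """Map a relative time expression to an ISO 8601 (start, end) pair anchored at tconv.

    Hypothetical helper; only three expressions are handled.
    """
    day_start = tconv.replace(hour=0, minute=0, second=0, microsecond=0)
    if expression == "last month":
        # First through last day of the calendar month before tconv.
        first_of_this_month = day_start.replace(day=1)
        start = (first_of_this_month - timedelta(days=1)).replace(day=1)
        end = first_of_this_month - timedelta(seconds=1)
    elif expression == "last week":
        # Previous Monday-to-Sunday calendar week.
        this_monday = day_start - timedelta(days=day_start.weekday())
        start = this_monday - timedelta(days=7)
        end = this_monday - timedelta(seconds=1)
    elif expression == "recently":
        # Assumed look-back window: 14 days ending at the conversation timestamp.
        start = day_start - timedelta(days=14)
        end = tconv
    else:
        raise ValueError(f"unsupported expression: {expression!r}")
    return start.isoformat(), end.isoformat()
```

For example, `resolve_range("last month", datetime(2024, 3, 15, 9, 0))` returns `("2024-02-01T00:00:00", "2024-02-29T23:59:59")`, landing correctly on the leap day.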

Code Example

snippet

# Example event-extraction prompt; fill it with EVENT_EXTRACTION_PROMPT.format(tconv=..., turn_text=...)
EVENT_EXTRACTION_PROMPT = """
Given the conversation turn below (timestamp: {tconv}), extract all temporally-grounded events.
For each event, output JSON with:
- subject: who/what
- verb: action
- object: what was acted upon
- start_datetime: ISO 8601 (earliest possible)
- end_datetime: ISO 8601 (latest possible)
- aliases: 2-4 paraphrases using different vocabulary

Rules:
- 'recently' → compute window relative to {tconv}
- 'last month' → first to last day of previous month from {tconv}
- Only extract events with clear subject+verb+object

Conversation turn:
{turn_text}
"""

# Example dynamic-prompting meta-prompt, sent to a lightweight model before retrieval
DYNAMIC_PROMPT_META = """
Analyze this memory query and output 1-5 bullet points describing:
- What specific information to retrieve
- What time ranges to filter by
- How to approach multi-hop reasoning if needed

Query: {user_query}
Current date: {current_date}

Output format:
Pay close attention to the following information (current and past):
• [bullet 1]
• [bullet 2]
...
"""

# Agent tool definitions (names and descriptions only; a real tool-calling API also needs a parameter schema for each tool)
tools = [
    {"name": "search_events", "description": "Semantic search over event calendar. Use for time-grounded queries."},
    {"name": "search_turns", "description": "Semantic search over raw conversation turns."},
    {"name": "grep_events", "description": "Exact keyword search on event calendar."},
    {"name": "grep_turns", "description": "Exact keyword search on conversation turns."},
]
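The tool list above only declares names and descriptions. A minimal ReAct-style loop that dispatches two of those tools might look like the sketch below. The handlers are toy stand-ins (word overlap instead of vector search, a regex over an in-memory list instead of grep over an index), `react_loop` and the `policy` callable are hypothetical, and the 5-step budget is an assumption; in a real agent the policy is an LLM call that sees the question, the dynamic-prompting guidance, and the history of observations.

```python
import re
from typing import Callable

# Toy event calendar; a real system would query an index instead of a list.
EVENTS = [
    "user bought Sony FE 85mm lens on 2024-03-02",
    "user bought a carbon tripod on 2024-01-20",
    "user returned a camera bag on 2024-02-11",
]


def search_events(query: str, k: int = 2) -> list[str]:
    """Stand-in for semantic search: rank events by words shared with the query."""
    q = set(query.lower().split())
    return sorted(EVENTS, key=lambda e: len(q & set(e.lower().split())), reverse=True)[:k]


def grep_events(pattern: str) -> list[str]:
    """Exact, case-insensitive regex match, useful for product names and dates."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [e for e in EVENTS if rx.search(e)]


TOOLS: dict[str, Callable[[str], list[str]]] = {
    "search_events": search_events,
    "grep_events": grep_events,
}


def react_loop(question: str, policy: Callable[[str, list], dict], max_steps: int = 5) -> str:
    """Think -> act (tool call) -> observe, repeated until the policy answers or steps run out."""
    history: list = []
    for _ in range(max_steps):
        action = policy(question, history)  # in practice: an LLM call over the history
        if action["type"] == "answer":
            return action["text"]
        observation = TOOLS[action["tool"]](action["query"])
        history.append((action["tool"], action["query"], observation))
    return "Insufficient evidence."
```

Keeping both tools lets the policy start broad with `search_events` and then confirm an exact product name or date with `grep_events` before answering.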

Terminology

ReAct: A pattern in which an LLM solves a problem by repeating 'think → call a tool → check the result → think again', much as a person looks up, reads, and weighs what they don't know.

event calendar: A database that stores, in structured form, when what happened in the conversations. Each record links an action to a date, e.g., 'John bought running shoes on 2024-03-15'.

turn calendar: A database that stores the raw conversation text per session. It is paired with the event calendar and used for semantic search.

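dynamic prompting: A technique that automatically generates different retrieval guidance for each question, producing tailored instructions such as 'this question requires filtering to March dates' and injecting them into the agent.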
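A minimal sketch of how the generated guidance could be injected, assuming `light_llm` is any callable that sends a prompt to a lightweight model and returns its text. `build_system_prompt`, `META_PROMPT`, and `BASE_AGENT_PROMPT` are hypothetical names, and the meta-prompt is an abbreviated version of `DYNAMIC_PROMPT_META` from the Code Example section.

```python
from typing import Callable

# Abbreviated meta-prompt (hypothetical wording based on DYNAMIC_PROMPT_META).
META_PROMPT = (
    "Analyze this memory query and output 1-5 bullet points describing what to "
    "retrieve, which time ranges to filter by, and how to handle multi-hop reasoning.\n"
    "Query: {user_query}\nCurrent date: {current_date}\n"
)

# Hypothetical base instructions for the retrieval agent.
BASE_AGENT_PROMPT = "You answer questions about the user's past conversations using the provided tools."


def build_system_prompt(user_query: str, current_date: str, light_llm: Callable[[str], str]) -> str:
    """Generate per-query guidance with a cheap model and append it to the agent's system prompt."""
    guidance = light_llm(META_PROMPT.format(user_query=user_query, current_date=current_date))
    return (
        f"{BASE_AGENT_PROMPT}\n\n"
        "Pay close attention to the following information (current and past):\n"
        f"{guidance.strip()}"
    )
```

Because `light_llm` is just a parameter, the function can be exercised with a stub that returns fixed bullets, and swapping the guidance model does not touch the agent code.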

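multi-hop reasoning: Answering a single question requires several rounds of retrieval and reasoning. 'What did I do the week after my vacation?' means first finding the vacation dates, then searching for events in the following week.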
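The vacation example can be traced with two dependent lookups. In this sketch the record format and the helpers `find_first` and `events_between` are hypothetical, and 'the week after' is assumed to mean the seven days following the vacation's last day; the key point is that the second hop's time window is computed from the first hop's result.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical event records: (description, start_date, end_date)
EVENTS = [
    ("went on vacation", date(2024, 5, 1), date(2024, 5, 7)),
    ("started a pottery class", date(2024, 5, 10), date(2024, 5, 10)),
    ("bought a new laptop", date(2024, 6, 2), date(2024, 6, 2)),
]


def find_first(keyword: str) -> Optional[tuple]:
    """Hop 1: locate the anchor event by keyword."""
    return next((e for e in EVENTS if keyword in e[0]), None)


def events_between(start: date, end: date) -> list[tuple]:
    """Hop 2: events whose date range overlaps [start, end]."""
    return [e for e in EVENTS if e[1] <= end and e[2] >= start]


vacation = find_first("vacation")
if vacation is not None:
    week_start = vacation[2] + timedelta(days=1)  # day after the vacation ends
    week_end = week_start + timedelta(days=6)
    answer = [desc for desc, _, _ in events_between(week_start, week_end)]
    print(answer)  # ['started a pottery class']
```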

ISO 8601: The international standard format for dates and times, e.g., '2024-03-15T09:00:00Z', which lets every system interpret a date the same way.
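Two properties of ISO 8601 make it convenient for time filtering: timestamps written with different UTC offsets still compare correctly once parsed, and strings sharing one format and offset sort chronologically as plain text. A small check with Python's `datetime` module (`fromisoformat` accepts the trailing `Z` only from Python 3.11, so the offsets are written explicitly here):

```python
from datetime import datetime

# The same instant written with two different UTC offsets.
a = datetime.fromisoformat("2024-03-15T09:00:00+00:00")
b = datetime.fromisoformat("2024-03-15T18:00:00+09:00")
print(a == b)  # True: offset-aware datetimes are compared as UTC instants

# With a fixed format and offset, text order equals chronological order.
stamps = ["2024-03-15T09:00:00+00:00", "2023-12-31T23:59:59+00:00"]
print(sorted(stamps)[0])  # 2023-12-31T23:59:59+00:00
```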

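cross-encoder reranking: A second pass in which a model re-scores retrieved candidates more precisely. It is a two-stage approach: a fast first-pass search retrieves 100 candidates, and a slower but more accurate model narrows them to 15.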
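The two-stage pattern does not depend on particular models. In this sketch both scorers are stand-ins: a shared-word count plays the fast first-stage retriever, and `precise_score` is any hypothetical callable that scores a (query, document) pair jointly, the way a cross-encoder does. The defaults of 100 candidates and 15 results follow the description above.

```python
import heapq
from typing import Callable


def two_stage_retrieve(
    query: str,
    docs: list[str],
    precise_score: Callable[[str, str], float],
    first_k: int = 100,
    final_k: int = 15,
) -> list[str]:
    """Cheap recall-oriented pass, then an expensive precision-oriented rerank."""
    q_words = set(query.lower().split())

    def cheap_score(doc: str) -> int:
        # Fast first stage: count shared words (stand-in for a vector search).
        return len(q_words & set(doc.lower().split()))

    candidates = heapq.nlargest(first_k, docs, key=cheap_score)
    # Slow second stage: score each (query, doc) pair jointly, as a cross-encoder would.
    return sorted(candidates, key=lambda d: precise_score(query, d), reverse=True)[:final_k]
```

Only the `first_k` survivors reach the expensive scorer, which is what keeps a cross-encoder affordable over a large memory.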

LongMemEvalS: A 500-question benchmark that evaluates long-term memory over months of conversation in six categories, including temporal reasoning and multi-session aggregation.

Related Resources

Original Abstract

Recent advances in Large Language Models (LLMs) have enabled conversational AI agents to engage in extended multi-turn interactions spanning weeks or months. However, existing memory systems struggle to reason over temporally grounded facts and preferences that evolve across months of interaction and lack effective retrieval strategies for multi-hop, time-sensitive queries over long dialogue histories. We introduce Chronos, a novel temporal-aware memory framework that decomposes raw dialogue into subject-verb-object event tuples with resolved datetime ranges and entity aliases, indexing them in a structured event calendar alongside a turn calendar that preserves full conversational context. At query time, Chronos applies dynamic prompting to generate tailored retrieval guidance for each question, directing the agent on what to retrieve, how to filter across time ranges, and how to approach multi-hop reasoning through an iterative tool-calling loop over both calendars. We evaluate Chronos with 8 LLMs, both open-source and closed-source, on the LongMemEvalS benchmark comprising 500 questions spanning six categories of dialogue history tasks. Chronos Low achieves 92.60% and Chronos High scores 95.60% accuracy, setting a new state of the art with an improvement of 7.67% over the best prior system. Ablation results reveal the events calendar accounts for a 58.9% gain on the baseline while all other components yield improvements between 15.5% and 22.3%. Notably, Chronos Low alone surpasses prior approaches evaluated under their strongest model configurations.