Chronos: Temporal-Aware Conversational Agents with Structured Event Retrieval for Long-Term Memory
TL;DR Highlight
A memory framework that extracts time-stamped events from conversation history so an agent can answer questions like 'what did I do last month?', reaching 95.60% accuracy on the LongMemEvalS benchmark
Who Should Read
Backend developers adding long-term conversation memory to chatbots or AI assistants, especially those building personalization features that track what users did and when.
Core Mechanics
- Instead of just storing conversations, maintains two structures simultaneously: an 'event calendar' with <subject, verb, object> + date ranges, and a 'turn calendar' with original conversation text
- Normalizes time expressions to ISO 8601 datetime ranges — converts vague expressions like 'recently' and 'last month' into actual date ranges for filtering
- Dynamic prompting generates retrieval guides per query — analyzes questions like 'what's the most recent camera lens I bought?' and instructs the agent on what and how to search
- ReAct-pattern agent iteratively calls two tools (vector search + grep), performing additional searches when evidence is insufficient
- Achieved 92.60% on LongMemEvalS benchmark with GPT-4o (Chronos Low), 95.60% with Claude Opus 4.6 (Chronos High)
- Removing event calendar drops accuracy by 34.5 points — ablation confirms time structuring is the key performance driver
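The event calendar and time normalization described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Event` record layout, the `last_month_range` helper, and the `overlapping` filter are all hypothetical names introduced here, and only the 'last month' case is resolved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    # Hypothetical record layout; field names are illustrative, not from the paper.
    subject: str
    verb: str
    obj: str
    start: datetime  # earliest possible time of the event
    end: datetime    # latest possible time of the event


def last_month_range(t_conv: datetime) -> tuple[datetime, datetime]:
    """Resolve 'last month' to the full previous calendar month, relative to t_conv."""
    first_of_this_month = t_conv.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    end = first_of_this_month - timedelta(seconds=1)
    start = end.replace(day=1, hour=0, minute=0, second=0)
    return start, end


def overlapping(events: list[Event], lo: datetime, hi: datetime) -> list[Event]:
    """Keep events whose [start, end] range intersects the query window [lo, hi]."""
    return [e for e in events if e.start <= hi and e.end >= lo]


# A turn said "I bought a camera lens last month" at this (assumed) timestamp.
t_conv = datetime(2024, 3, 15, 10, 30)
start, end = last_month_range(t_conv)

calendar = [
    Event("user", "bought", "camera lens", start, end),
    Event("user", "visited", "Lisbon", datetime(2023, 12, 1), datetime(2023, 12, 10)),
]

# Query window: February 2024.
hits = overlapping(calendar, datetime(2024, 2, 1), datetime(2024, 2, 29, 23, 59, 59))
print(start.isoformat(), end.isoformat())
print([e.obj for e in hits])
```

Storing a range rather than a single timestamp keeps the uncertainty of a vague expression explicit, so a date filter can match any event that could plausibly fall inside the query window.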
Evidence
- Chronos Low (GPT-4o) 92.60% vs previous best EmergenceMem Internal 86.00%: a 6.60-point absolute gain, which is the 7.67% relative improvement reported in the abstract
- Chronos High (Claude Opus 4.6): 95.60%, a 3.02% improvement over the previous best and a new state of the art on LongMemEvalS
- Ablation: removing the event calendar drops accuracy from 93.1% to 58.6% (a 34.5 pp drop), the single largest component contribution
- Multi-session aggregation category: 91.73% — 7.97% relative improvement over 2nd place Honcho
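The figures above mix percentage-point differences with relative improvements, so it helps to recompute both from the reported accuracies. The sketch below uses only numbers quoted in this summary; the helper names `absolute_gain` and `relative_gain` are illustrative.

```python
def absolute_gain(new: float, old: float) -> float:
    """Difference in percentage points."""
    return new - old


def relative_gain(new: float, old: float) -> float:
    """Improvement as a percentage of the old score."""
    return (new - old) / old * 100


# Chronos Low (92.60%) vs EmergenceMem Internal (86.00%)
print(round(absolute_gain(92.60, 86.00), 2))  # 6.6 percentage points
print(round(relative_gain(92.60, 86.00), 2))  # 7.67 percent relative

# Ablation: full system (93.1%) vs no event calendar (58.6%)
print(round(absolute_gain(93.1, 58.6), 1))    # 34.5 percentage points
print(round(relative_gain(93.1, 58.6), 1))    # 58.9 percent relative
```

Read this way, the 34.5-point ablation drop and the abstract's 58.9% gain over the baseline describe the same pair of numbers.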
How to Apply
- When storing conversations, use an LLM to extract <subject, verb, object, start_datetime, end_datetime> tuples into a separate index. Fill in date ranges for expressions like 'last week' or 'recently' by calculating from the conversation timestamp.
- Add a 'dynamic prompting' step before query processing — feed the question to a lightweight model like Gemini Flash to generate a guide on 'what info to search for in what time range,' then inject into the agent's system prompt.
- Provide agents with both vector search (semantic matching) and grep (exact keyword matching) tools. Letting the agent choose between them based on context helps recall on exact matches such as specific product names or dates.
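To make the difference between the two tool types concrete, here is a small sketch. It is an assumption-laden toy: `grep_turns` and `search_turns` are hypothetical implementations, and the "semantic" search is a bag-of-words cosine similarity standing in for a real embedding model and vector index.

```python
import math
import re
from collections import Counter

TURNS = [
    "I finally bought the Sigma 35mm f/1.4 lens on Saturday.",
    "My new camera glass arrived and the photos look sharp.",
    "We talked about hiking plans for next month.",
]


def grep_turns(pattern: str, turns: list[str]) -> list[str]:
    """Exact, case-insensitive substring match (similar in spirit to grep -i -F)."""
    rx = re.compile(re.escape(pattern), re.IGNORECASE)
    return [t for t in turns if rx.search(t)]


def _bow(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_turns(query: str, turns: list[str], k: int = 2) -> list[str]:
    """Toy similarity search: rank turns by cosine similarity to the query."""
    q = _bow(query)
    ranked = sorted(turns, key=lambda t: _cosine(q, _bow(t)), reverse=True)
    return ranked[:k]


# Exact product name: grep pins down the single matching turn.
print(grep_turns("Sigma 35mm", TURNS))
# Looser wording: similarity search surfaces a related turn.
print(search_turns("new camera lens", TURNS, k=1))
```

In this toy data the similarity search ranks the "camera glass" turn first and misses the turn that names the product, which is the kind of gap an exact-match tool is meant to cover.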
Code Example
# Event extraction prompt example
EVENT_EXTRACTION_PROMPT = """
Given the conversation turn below (timestamp: {tconv}), extract all temporally-grounded events.
For each event, output JSON with:
- subject: who/what
- verb: action
- object: what was acted upon
- start_datetime: ISO 8601 (earliest possible)
- end_datetime: ISO 8601 (latest possible)
- aliases: 2-4 paraphrases using different vocabulary
Rules:
- 'recently' → compute window relative to {tconv}
- 'last month' → first to last day of previous month from {tconv}
- Only extract events with clear subject+verb+object
Conversation turn:
{turn_text}
"""
# Dynamic prompting example
DYNAMIC_PROMPT_META = """
Analyze this memory query and output 1-5 bullet points describing:
- What specific information to retrieve
- What time ranges to filter by
- How to approach multi-hop reasoning if needed
Query: {user_query}
Current date: {current_date}
Output format:
Pay close attention to the following information (current and past):
• [bullet 1]
• [bullet 2]
...
"""
# Agent tool definitions
tools = [
{"name": "search_events", "description": "Semantic search over event calendar. Use for time-grounded queries."},
{"name": "search_turns", "description": "Semantic search over raw conversation turns."},
{"name": "grep_events", "description": "Exact keyword search on event calendar."},
{"name": "grep_turns", "description": "Exact keyword search on conversation turns."},
]
Original Abstract
Recent advances in Large Language Models (LLMs) have enabled conversational AI agents to engage in extended multi-turn interactions spanning weeks or months. However, existing memory systems struggle to reason over temporally grounded facts and preferences that evolve across months of interaction and lack effective retrieval strategies for multi-hop, time-sensitive queries over long dialogue histories. We introduce Chronos, a novel temporal-aware memory framework that decomposes raw dialogue into subject-verb-object event tuples with resolved datetime ranges and entity aliases, indexing them in a structured event calendar alongside a turn calendar that preserves full conversational context. At query time, Chronos applies dynamic prompting to generate tailored retrieval guidance for each question, directing the agent on what to retrieve, how to filter across time ranges, and how to approach multi-hop reasoning through an iterative tool-calling loop over both calendars. We evaluate Chronos with 8 LLMs, both open-source and closed-source, on the LongMemEvalS benchmark comprising 500 questions spanning six categories of dialogue history tasks. Chronos Low achieves 92.60% and Chronos High scores 95.60% accuracy, setting a new state of the art with an improvement of 7.67% over the best prior system. Ablation results reveal the events calendar accounts for a 58.9% gain on the baseline while all other components yield improvements between 15.5% and 22.3%. Notably, Chronos Low alone surpasses prior approaches evaluated under their strongest model configurations.