Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents
TL;DR Highlight
A framework that stores and retrieves LLM agent memory based on 'actual event occurrence time' rather than 'conversation date', improving personalization accuracy by up to 12.2 percentage points
Who Should Read
Backend/AI developers integrating long-term memory (Mem0, Zep, etc.) into LLM agents — especially those struggling with time-based queries like 'What did I do last week?' or 'Where was the hotel I booked back then?'
Core Mechanics
- Two core problems with existing memory systems: storing memories with incorrect timestamps when conversation date differs from actual event date (Temporal inaccuracy), and storing continuous experiences only as fragmented points (Temporal fragmentation)
- TSM first builds a Temporal Knowledge Graph (TKG) based on event occurrence time, then slices it monthly and applies GMM clustering to generate 'topic' and 'persona' summaries
- At query time, spaCy parses relative time expressions like 'last weekend' into actual date ranges, then prioritizes retrieving only memories that fall within that time range
- Memory updates follow two stages: lightweight TKG graph-only updates on every new conversation turn (online), and costly summary regeneration handled monthly during sleep-time (offline)
- Achieved 74.8% on LONGMEMEVAL_S with GPT-4o-mini, a +12.2%p improvement over the previous best A-MEM (62.6%), with a particularly notable +22.56%p gain on temporally-grounded questions
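The online/offline split in the update bullet above can be sketched with a minimal memory store that carries both timestamps. This is an illustrative sketch, not the paper's implementation: the `MemoryNode` / `TemporalMemoryStore` names and fields are invented, and the offline step just concatenates per-month content where the paper applies GMM clustering plus LLM summarization.

```python
from dataclasses import dataclass
from datetime import datetime
from collections import defaultdict

@dataclass
class MemoryNode:
    content: str
    dialogue_time: datetime   # when the user mentioned it
    semantic_time: datetime   # when the event actually happened

class TemporalMemoryStore:
    def __init__(self) -> None:
        self.nodes: list[MemoryNode] = []
        self.summaries: dict[str, str] = {}  # "YYYY-MM" -> durative summary

    def add_turn(self, content: str, dialogue_time: datetime,
                 semantic_time: datetime) -> None:
        """Online path: cheap graph-only insert on every conversation turn."""
        self.nodes.append(MemoryNode(content, dialogue_time, semantic_time))

    def consolidate(self) -> None:
        """Offline 'sleep-time' path: regenerate monthly durative summaries.
        Memories are bucketed by semantic_time, not dialogue_time."""
        buckets = defaultdict(list)
        for node in self.nodes:
            buckets[node.semantic_time.strftime("%Y-%m")].append(node.content)
        self.summaries = {month: "; ".join(items)
                          for month, items in buckets.items()}
```

Note that a trip discussed in March but taken in February ends up in the "2026-02" bucket, which is exactly the distinction the paper's semantic timeline is meant to preserve.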
Evidence
- LONGMEMEVAL_S: TSM 74.80% vs A-MEM 62.60% vs Zep 60.20% (GPT-4o-mini baseline)
- Temporal reasoning category: TSM 69.92% vs A-MEM 47.36% — a remarkable +22.56%p gap
- Multi-Session category: TSM 69.17% vs A-MEM 48.87% — +20.30%p gap
- Best among memory-based methods on LOCOMO dataset: TSM 76.69% vs Mem0g 68.44% vs Naive RAG 63.64%
How to Apply
- If you're storing conversation content with a simple timestamp (like Mem0 or Zep), consider switching to a method where you first use an LLM to extract 'the actual time the event occurred' before storing, and record it as a separate semantic_time field
- If a search query contains temporal expressions ('last week', 'last summer'), use spaCy's temporal expression parser to extract an absolute date range, then apply time-range matching as the primary filter — ahead of vector similarity scores
- If the same topic (travel, hobbies, etc.) recurs across consecutive conversations, apply monthly GMM clustering followed by LLM-based summarization — this creates 'durative memory' that improves personalized response quality beyond storing fragmented facts
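As a concrete illustration of the first two points, here is a minimal stdlib-only resolver that maps a relative expression to an absolute date range anchored at the conversation date. It handles only a few hard-coded expressions; a production system would delegate this to a temporal parser like dateparser or duckling, and the function name is an assumption, not an API from the paper.

```python
from datetime import date, timedelta

def resolve_relative_range(expr: str, dialogue_date: date) -> tuple[date, date]:
    """Map a relative temporal expression to an absolute date range,
    anchored at the conversation date (not 'now' at retrieval time)."""
    expr = expr.lower().strip()
    if expr == "last week":
        # Monday..Sunday of the previous ISO week
        start = dialogue_date - timedelta(days=dialogue_date.weekday() + 7)
        return start, start + timedelta(days=6)
    if expr == "last weekend":
        # Saturday..Sunday preceding the most recent Monday
        monday = dialogue_date - timedelta(days=dialogue_date.weekday())
        return monday - timedelta(days=2), monday - timedelta(days=1)
    if expr == "yesterday":
        d = dialogue_date - timedelta(days=1)
        return d, d
    raise ValueError(f"unsupported expression: {expr}")
```

For a conversation dated Wednesday 2026-03-11, "last weekend" resolves to the range 2026-03-07 through 2026-03-08, which can then be used as the primary retrieval filter.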
Code Example
# Example of parsing temporal constraints from a query using spaCy
import spacy
import numpy as np
from datetime import datetime, timedelta

nlp = spacy.load("en_core_web_sm")

def parse_semantic_time(query: str, now: datetime) -> tuple[datetime, datetime]:
    """
    Extracts temporal expressions from a query and converts them to an
    actual date range,
    e.g., 'what did I eat last weekend' -> (2026-03-07, 2026-03-08)
    """
    doc = nlp(query)
    # Extract DATE/TIME entities using spaCy's NER
    for ent in doc.ents:
        if ent.label_ in ("DATE", "TIME"):
            print(f"Detected temporal expression: {ent.text}")
    # In practice, resolve the expression with dateparser, duckling, etc.
    # https://github.com/scrapinghub/dateparser
    return (now - timedelta(days=7), now)

def cosine_similarity(a, b) -> float:
    """Plain cosine similarity between two embedding vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Apply temporal constraints as the primary filter during memory retrieval
def temporal_rerank(candidates: list, time_range: tuple, query_embedding) -> list:
    """
    Re-ranks results with time-range matching as the primary criterion
    and semantic similarity as the secondary one
    """
    def score(mem):
        in_time = mem["semantic_time"] is not None and \
            time_range[0] <= mem["semantic_time"] <= time_range[1]
        sem_score = cosine_similarity(query_embedding, mem["embedding"])
        # Sort descending by the (time-match flag, semantic similarity) tuple
        return (int(in_time), sem_score)
    return sorted(candidates, key=score, reverse=True)
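To see why the (time match, similarity) tuple ordering matters, here is a self-contained toy run: the candidate memories, their ids, and the similarity scores are invented for illustration. A memory inside the queried time range outranks a more similar memory outside it.

```python
from datetime import datetime

# Toy candidates: an off-range memory with high similarity,
# and an in-range memory with lower similarity
candidates = [
    {"id": "hotel_booking", "semantic_time": datetime(2026, 2, 20), "sim": 0.95},
    {"id": "weekend_hike",  "semantic_time": datetime(2026, 3, 8),  "sim": 0.60},
]
time_range = (datetime(2026, 3, 7), datetime(2026, 3, 8))  # 'last weekend'

def score(mem):
    in_time = time_range[0] <= mem["semantic_time"] <= time_range[1]
    # Python compares tuples element-wise, so the time-match flag dominates
    return (int(in_time), mem["sim"])

ranked = sorted(candidates, key=score, reverse=True)
print([m["id"] for m in ranked])  # the in-range memory comes first
```

With pure vector similarity the hotel booking would win; with the time-first tuple key the weekend hike does, which is the behavior the retrieval stage above is designed to produce.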
Original Abstract
Memory enables Large Language Model (LLM) agents to perceive, store, and use information from past dialogues, which is essential for personalization. However, existing methods fail to properly model the temporal dimension of memory in two aspects: 1) Temporal inaccuracy: memories are organized by dialogue time rather than their actual occurrence time; 2) Temporal fragmentation: existing methods focus on point-wise memory, losing durative information that captures persistent states and evolving patterns. To address these limitations, we propose Temporal Semantic Memory (TSM), a memory framework that models semantic time for point-wise memory and supports the construction and utilization of durative memory. During memory construction, it first builds a semantic timeline rather than a dialogue one. Then, it consolidates temporally continuous and semantically related information into a durative memory. During memory utilization, it incorporates the query's temporal intent on the semantic timeline, enabling the retrieval of temporally appropriate durative memories and providing time-valid, duration-consistent context to support response generation. Experiments on LongMemEval and LoCoMo show that TSM consistently outperforms existing methods and achieves up to 12.2% absolute improvement in accuracy, demonstrating the effectiveness of the proposed method.