Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents
TL;DR Highlight
A framework that stores and retrieves LLM agent memory based on 'actual event occurrence time' rather than 'conversation date', improving personalization accuracy by up to 12.2 percentage points
Who Should Read
Backend/AI developers integrating long-term memory (Mem0, Zep, etc.) into LLM agents — especially those struggling with time-based queries like 'What did I do last week?' or 'Where was the hotel I booked back then?'
Core Mechanics
- Two core problems with existing memory systems: storing memories with incorrect timestamps when conversation date differs from actual event date (Temporal inaccuracy), and storing continuous experiences only as fragmented points (Temporal fragmentation)
- TSM first builds a Temporal Knowledge Graph (TKG) based on event occurrence time, then slices it monthly and applies GMM clustering to generate 'topic' and 'persona' summaries
- At query time, spaCy parses relative time expressions like 'last weekend' into actual date ranges, then prioritizes retrieving only memories that fall within that time range
- Memory updates follow two stages: lightweight TKG graph-only updates on every new conversation turn (online), and costly summary regeneration handled monthly during sleep-time (offline)
- Achieved 74.8% on LONGMEMEVAL_S with GPT-4o-mini, a +12.2%p improvement over the previous best A-MEM (62.6%), with a particularly notable +22.56%p gain on temporally-grounded questions
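The online/offline split in the update bullet above can be sketched with a minimal memory store that carries both timestamps. This is an illustrative sketch, not the paper's implementation: the `MemoryNode` / `TemporalMemoryStore` names and fields are invented, and the offline step just concatenates per-month content where the paper applies GMM clustering plus LLM summarization.

```python
from dataclasses import dataclass
from datetime import datetime
from collections import defaultdict

@dataclass
class MemoryNode:
    content: str
    dialogue_time: datetime   # when the user mentioned it
    semantic_time: datetime   # when the event actually happened

class TemporalMemoryStore:
    def __init__(self) -> None:
        self.nodes: list[MemoryNode] = []
        self.summaries: dict[str, str] = {}  # "YYYY-MM" -> durative summary

    def add_turn(self, content: str, dialogue_time: datetime,
                 semantic_time: datetime) -> None:
        """Online path: cheap graph-only insert on every conversation turn."""
        self.nodes.append(MemoryNode(content, dialogue_time, semantic_time))

    def consolidate(self) -> None:
        """Offline 'sleep-time' path: regenerate monthly durative summaries.
        Memories are bucketed by semantic_time, not dialogue_time."""
        buckets = defaultdict(list)
        for node in self.nodes:
            buckets[node.semantic_time.strftime("%Y-%m")].append(node.content)
        self.summaries = {month: "; ".join(items)
                          for month, items in buckets.items()}
```

Note that a trip discussed in March but taken in February ends up in the "2026-02" bucket, which is exactly the distinction the paper's semantic timeline is meant to preserve.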
Evidence
- LONGMEMEVAL_S: TSM 74.80% vs A-MEM 62.60% vs Zep 60.20% (GPT-4o-mini baseline)
- Temporal reasoning category: TSM 69.92% vs A-MEM 47.36% — a remarkable +22.56%p gap
- Multi-Session category: TSM 69.17% vs A-MEM 48.87% — +20.30%p gap
- Best among memory-based methods on LOCOMO dataset: TSM 76.69% vs Mem0g 68.44% vs Naive RAG 63.64%
How to Apply
- If you're storing conversation content with a simple timestamp (like Mem0 or Zep), consider switching to a method where you first use an LLM to extract 'the actual time the event occurred' before storing, and record it as a separate semantic_time field
- If a search query contains temporal expressions ('last week', 'last summer'), use spaCy's temporal expression parser to extract an absolute date range, then apply time-range matching as the primary filter — ahead of vector similarity scores
- If the same topic (travel, hobbies, etc.) recurs across consecutive conversations, apply monthly GMM clustering followed by LLM-based summarization — this creates 'durative memory' that improves personalized response quality beyond storing fragmented facts
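As a concrete illustration of the first two points, here is a minimal stdlib-only resolver that maps a relative expression to an absolute date range anchored at the conversation date. It handles only a few hard-coded expressions; a production system would delegate this to a temporal parser like dateparser or duckling, and the function name is an assumption, not an API from the paper.

```python
from datetime import date, timedelta

def resolve_relative_range(expr: str, dialogue_date: date) -> tuple[date, date]:
    """Map a relative temporal expression to an absolute date range,
    anchored at the conversation date (not 'now' at retrieval time)."""
    expr = expr.lower().strip()
    if expr == "last week":
        # Monday..Sunday of the previous ISO week
        start = dialogue_date - timedelta(days=dialogue_date.weekday() + 7)
        return start, start + timedelta(days=6)
    if expr == "last weekend":
        # Saturday..Sunday preceding the most recent Monday
        monday = dialogue_date - timedelta(days=dialogue_date.weekday())
        return monday - timedelta(days=2), monday - timedelta(days=1)
    if expr == "yesterday":
        d = dialogue_date - timedelta(days=1)
        return d, d
    raise ValueError(f"unsupported expression: {expr}")
```

For a conversation dated Wednesday 2026-03-11, "last weekend" resolves to the range 2026-03-07 through 2026-03-08, which can then be used as the primary retrieval filter.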
Code Example
# Example of parsing temporal constraints from a query using spaCy
import spacy
import numpy as np
from datetime import datetime, timedelta

nlp = spacy.load("en_core_web_sm")

def parse_semantic_time(query: str, now: datetime) -> tuple[datetime, datetime]:
    """
    Extracts temporal expressions from a query and converts them to an
    actual date range,
    e.g., 'what did I eat last weekend' -> (2026-03-07, 2026-03-08)
    """
    doc = nlp(query)
    # Extract DATE/TIME entities using spaCy's NER
    for ent in doc.ents:
        if ent.label_ in ("DATE", "TIME"):
            print(f"Detected temporal expression: {ent.text}")
    # In practice, resolve the expression with dateparser, duckling, etc.
    # https://github.com/scrapinghub/dateparser
    return (now - timedelta(days=7), now)

def cosine_similarity(a, b) -> float:
    """Plain cosine similarity between two embedding vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Apply temporal constraints as the primary filter during memory retrieval
def temporal_rerank(candidates: list, time_range: tuple, query_embedding) -> list:
    """
    Re-ranks results with time-range matching as the primary criterion
    and semantic similarity as the secondary one
    """
    def score(mem):
        in_time = mem["semantic_time"] is not None and \
            time_range[0] <= mem["semantic_time"] <= time_range[1]
        sem_score = cosine_similarity(query_embedding, mem["embedding"])
        # Sort descending by the (time-match flag, semantic similarity) tuple
        return (int(in_time), sem_score)
    return sorted(candidates, key=score, reverse=True)
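To see why the (time match, similarity) tuple ordering matters, here is a self-contained toy run: the candidate memories, their ids, and the similarity scores are invented for illustration. A memory inside the queried time range outranks a more similar memory outside it.

```python
from datetime import datetime

# Toy candidates: an off-range memory with high similarity,
# and an in-range memory with lower similarity
candidates = [
    {"id": "hotel_booking", "semantic_time": datetime(2026, 2, 20), "sim": 0.95},
    {"id": "weekend_hike",  "semantic_time": datetime(2026, 3, 8),  "sim": 0.60},
]
time_range = (datetime(2026, 3, 7), datetime(2026, 3, 8))  # 'last weekend'

def score(mem):
    in_time = time_range[0] <= mem["semantic_time"] <= time_range[1]
    # Python compares tuples element-wise, so the time-match flag dominates
    return (int(in_time), mem["sim"])

ranked = sorted(candidates, key=score, reverse=True)
print([m["id"] for m in ranked])  # the in-range memory comes first
```

With pure vector similarity the hotel booking would win; with the time-first tuple key the weekend hike does, which is the behavior the retrieval stage above is designed to produce.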
Original Abstract
Memory enables Large Language Model (LLM) agents to perceive, store, and use information from past dialogues, which is essential for personalization. However, existing methods fail to properly model the temporal dimension of memory in two aspects: 1) Temporal inaccuracy: memories are organized by dialogue time rather than their actual occurrence time; 2) Temporal fragmentation: existing methods focus on point-wise memory, losing durative information that captures persistent states and evolving patterns. To address these limitations, we propose Temporal Semantic Memory (TSM), a memory framework that models semantic time for point-wise memory and supports the construction and utilization of durative memory. During memory construction, it first builds a semantic timeline rather than a dialogue one. Then, it consolidates temporally continuous and semantically related information into a durative memory. During memory utilization, it incorporates the query's temporal intent on the semantic timeline, enabling the retrieval of temporally appropriate durative memories and providing time-valid, duration-consistent context to support response generation. Experiments on LongMemEval and LoCoMo show that TSM consistently outperforms existing methods and achieves up to 12.2% absolute improvement in accuracy, demonstrating the effectiveness of the proposed method.