PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time
TL;DR Highlight
A framework that optimizes per-user 'Personas' (system prompts) in real-time to personalize AI agent responses.
Who Should Read
Backend/AI developers adding per-user customized response capabilities to LLM chatbots or agents. Especially useful for those building recommendation, search, or Q&A services that leverage user history.
Core Mechanics
- Auto-generates a unique system prompt (Persona) per user, then updates it in real-time at test time using the gap between expected and actual responses as a 'text gradient'
- Combines Episodic Memory (raw interaction logs) with Semantic Memory (summarized user profiles) for user context management
- Persona serves as a bridge between memory and action selection, enabling coherent personalization across different tasks
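The episodic-to-semantic distillation in the second point can be sketched as below. This is a minimal illustration, not the paper's implementation: `summarize_to_semantic_memory` is a hypothetical name, and `llm` is assumed to be any callable that maps a prompt string to a completion string (e.g. a wrapped chat model).

```python
def summarize_to_semantic_memory(llm, episodic_logs, max_logs=20):
    """Condense raw (query, response) logs into a short user profile.

    episodic_logs: list of (query, response) tuples, oldest first.
    llm: any prompt -> completion callable (assumption, not a fixed API).
    """
    # Only the most recent interactions feed the summary
    recent = episodic_logs[-max_logs:]
    joined = "\n".join(f"- Q: {q} | A: {r}" for q, r in recent)
    prompt = (
        "Summarize this user's stable preferences from the interactions below.\n"
        f"{joined}\n"
        "Profile summary:"
    )
    return llm(prompt)
```

The resulting summary string can then seed `init_persona` in the code example below, closing the loop between memory and the persona prompt.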
Evidence
- LaMP-1 (paper citation prediction) Accuracy: PersonaAgent 0.919 vs 2nd place MemBank 0.862 (~5.7pp improvement)
- LaMP-3 (product rating prediction) MAE: PersonaAgent 0.241 vs 2nd place ICL 0.277 (~13% improvement); RMSE: 0.509 vs 0.543
- Ablation shows removing the Action module causes the biggest performance drop
How to Apply
- For existing RAG chatbots: maintain a separate system prompt file per user, and add a loop that feeds the expected-vs-actual response gap from the last N interactions (paper uses 3) to an LLM to update the system prompt.
- If you have a user profile DB: use existing profiles as Semantic Memory initialization, and add Episodic Memory from session interactions for finer-grained personalization.
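The per-user system prompt file pattern from the first point might look like the sketch below. The file layout, `refresh_user_persona` name, and JSON schema are illustrative assumptions; `update_fn` stands in for an LLM-backed updater such as the `update_persona` function in the code example.

```python
import json
from pathlib import Path

def refresh_user_persona(persona_dir, user_id, recent_interactions,
                         update_fn, n_last=3):
    """Load a user's persona file, update it from the last N
    (query, expected, actual) gaps, and write it back."""
    path = Path(persona_dir) / f"{user_id}.json"
    if path.exists():
        persona = json.loads(path.read_text())["persona"]
    else:
        persona = "You are a helpful personalized assistant."
    # Describe the expected-vs-actual gap for the last N interactions
    gaps = [
        f"Q: {q}\nExpected: {expected}\nActual: {actual}"
        for q, expected, actual in recent_interactions[-n_last:]
    ]
    persona = update_fn(persona, gaps)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"persona": persona}))
    return persona
```

Running this after each session keeps the on-disk persona aligned with the most recent interactions while remaining stateless between requests.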
Code Example
# PersonaAgent core loop implementation sketch (LangChain-based)
from langchain_community.chat_models import ChatAnthropic
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import BedrockEmbeddings
# 1. Initialize Persona (system prompt) per user
def init_persona(user_profile_summary: str) -> str:
    return f"""You are a helpful personalized assistant.
User summary: {user_profile_summary}
STRICT RULES:
1. Think step-by-step about what information you need.
2. MUST use at least TWO tools to answer the question.
3. Provide clear, concise responses."""
# 2. Episodic Memory storage and retrieval
class EpisodicMemory:
    def __init__(self, embeddings):
        self.store = {}  # user_id -> FAISS index
        self.embeddings = embeddings

    def add(self, user_id, query, response, metadata=None):
        # Store each interaction in a per-user FAISS index
        text = f"Query: {query}\nResponse: {response}"
        metadatas = [metadata or {}]
        if user_id in self.store:
            self.store[user_id].add_texts([text], metadatas=metadatas)
        else:
            self.store[user_id] = FAISS.from_texts(
                [text], self.embeddings, metadatas=metadatas)

    def retrieve(self, user_id, query, top_k=4):
        # Return the Top-K most similar past interactions
        if user_id not in self.store:
            return []
        return self.store[user_id].similarity_search(query, k=top_k)
# 3. Core logic for Test-Time Persona optimization
def compute_textual_gradient(llm, question, agent_response, ground_truth):
    """Have the LLM analyze response differences to generate feedback"""
    prompt = f"""You are a meticulous evaluator of personalized AI agent responses.
Analyze the following and give feedback on how to improve the system prompt.
Question: {question}
Expected Answer: {ground_truth}
Agent Response: {agent_response}
Feedback should focus on:
1. How to improve search keywords for this user.
2. User's prior interactions and preferences.
3. Explicit user profile descriptions not specific to this task.
Feedback:"""
    return llm.predict(prompt)
def update_persona(llm, current_persona, feedbacks):
    """Aggregate feedback and update Persona"""
    aggregated = "\n".join(feedbacks)
    prompt = f"""You are a prompt engineering assistant.
Current system prompt: {current_persona}
Provided Feedback: {aggregated}
Generate an updated system prompt that highlights the user's unique preferences.
New system prompt:"""
    return llm.predict(prompt)
# 4. Test-time alignment loop
def test_time_alignment(llm, agent, persona, recent_interactions, n_iter=1):
    for _ in range(n_iter):
        feedbacks = []
        for q, gt in recent_interactions:
            agent_resp = agent.run(q, system_prompt=persona)
            feedback = compute_textual_gradient(llm, q, agent_resp, gt)
            feedbacks.append(feedback)
        persona = update_persona(llm, persona, feedbacks)
    return persona
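The alignment loop can be dry-run end to end without API keys using test doubles. `StubLLM` and `StubAgent` below are stand-ins invented for this sketch (not paper components or LangChain classes), and `align_once` inlines one iteration of the loop so the example is self-contained.

```python
class StubLLM:
    """Stand-in for a chat model: always returns a fixed completion."""
    def predict(self, prompt: str) -> str:
        return "Emphasize the user's preference for terse, citation-backed answers."

class StubAgent:
    """Stand-in for a tool-using agent."""
    def run(self, question, system_prompt):
        return f"[answer to: {question}]"

def align_once(llm, agent, persona, interactions):
    # One iteration: a textual gradient per interaction, then a single
    # persona update aggregated over all feedback.
    feedbacks = []
    for q, gt in interactions:
        resp = agent.run(q, system_prompt=persona)
        feedbacks.append(llm.predict(
            f"Q: {q}\nExpected: {gt}\nGot: {resp}\nFeedback:"))
    return llm.predict(
        f"Current prompt: {persona}\nFeedback:\n" + "\n".join(feedbacks)
        + "\nNew prompt:")

persona = align_once(
    StubLLM(), StubAgent(),
    "You are a helpful personalized assistant.",
    [("Rate this review", "4"), ("Predict the venue", "NeurIPS")])
```

Swapping in a real `ChatAnthropic` instance and a tool-using agent recovers the full loop from the sketch above.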
Original Abstract
Large Language Model (LLM) empowered agents have recently emerged as advanced paradigms that exhibit impressive capabilities in a wide range of domains and tasks. Despite their potential, current LLM agents often adopt a one-size-fits-all approach, lacking the flexibility to respond to users' varying needs and preferences. This limitation motivates us to develop PersonaAgent, the first personalized LLM agent framework designed to address versatile personalization tasks. Specifically, PersonaAgent integrates two complementary components - a personalized memory module that includes episodic and semantic memory mechanisms; a personalized action module that enables the agent to perform tool actions tailored to the user. At the core, the persona (defined as unique system prompt for each user) functions as an intermediary: it leverages insights from personalized memory to control agent actions, while the outcomes of these actions in turn refine the memory. Based on the framework, we propose a test-time user-preference alignment strategy that simulate the latest n interactions to optimize the persona prompt, ensuring real-time user preference alignment through textual loss feedback between simulated and ground-truth responses. Experimental evaluations demonstrate that PersonaAgent significantly outperforms other baseline methods by not only personalizing the action space effectively but also scaling during test-time real-world applications. These results underscore the feasibility and potential of our approach in delivering tailored, dynamic user experiences.