Comprehensive Survey of Memory Mechanisms in LLM-based Agents

A Survey on the Memory Mechanism of Large Language Model-based Agents

Apr 21, 2024 • Zeyu Zhang, Quanyu Dai, Xiaohe Bo +6

TL;DR Highlight

The first survey dedicated specifically to memory, giving a complete overview of how LLM agents store and use their memories.

Who Should Read

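Backend and AI developers who build or design LLM agents, especially those working out how to maintain conversational context or implement long-term memory. Also useful for teams evaluating multi-agent systems or personal assistants.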

Core Mechanics

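Memory sources fall into three categories: information from the current task (inside-trial), experience from past tasks (cross-trial), and external knowledge (Wikipedia, APIs, etc.).
Memory is stored in two forms: textual (natural language) and parametric (embedded in the model's parameters). Textual memory is easy to interpret and edit but is bounded by the context length; parametric memory is denser but expensive to update.
Memory operations have three stages: writing (storing information) → management (summarizing, merging, forgetting) → reading (retrieval and use). Generative Agents create higher-level memories through reflection, and Reflexion turns failure experiences into verbal feedback that serves as a reinforcement signal.
Most current systems rely on textual memory and use a range of retrieval strategies, including FAISS vector search, SQL-based lookup, and LSH (locality-sensitive hashing).
Systems such as Reflexion, ExpeL, and Synapse improve performance by drawing on cross-trial information (past failures and successes), which reduces repeated mistakes.
Memory evaluation follows two tracks: direct evaluation (accuracy, F1 score, response coherence) and indirect evaluation (conversation tasks, multi-source QA, success rates on long-context tasks).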
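The writing → management → reading cycle can be sketched with a small plain-text store. This is a minimal illustration, not the survey's or any cited system's implementation: the names (`TextualMemory`, `MemoryRecord`, `capacity`), the exact-duplicate merge, oldest-first forgetting, and word-overlap-plus-recency ranking are all hypothetical stand-ins for the summarization, embedding retrieval, and reflection that real systems use.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    text: str
    source: str  # "inside-trial", "cross-trial", or "external"
    step: int    # logical time at which the record was written


@dataclass
class TextualMemory:
    capacity: int = 5  # hypothetical size limit before forgetting kicks in
    records: list = field(default_factory=list)
    clock: int = 0

    def write(self, text: str, source: str) -> None:
        """Writing: append a new natural-language record, then run management."""
        self.clock += 1
        self.records.append(MemoryRecord(text, source, self.clock))
        self.manage()

    def manage(self) -> None:
        """Management: merge exact duplicates (keeping the newest), then forget the oldest."""
        latest = {}
        for r in self.records:
            latest[r.text] = r  # a later duplicate overwrites the earlier one
        merged = sorted(latest.values(), key=lambda r: r.step)
        self.records = merged[-self.capacity:]

    def read(self, query: str, top_k: int = 2) -> list:
        """Reading: rank by word overlap with the query, breaking ties by recency."""
        q = set(query.lower().split())

        def score(r: MemoryRecord):
            return (len(q & set(r.text.lower().split())), r.step)

        ranked = sorted(self.records, key=score, reverse=True)
        return [r.text for r in ranked[:top_k]]


mem = TextualMemory(capacity=3)
mem.write("the key is under the mat", "inside-trial")
mem.write("opening the door without the key failed", "cross-trial")
mem.write("the kitchen is north of the hall", "external")
mem.write("the key is under the mat", "inside-trial")  # duplicate: merged
mem.write("the garden gate is locked", "inside-trial")  # over capacity: oldest forgotten
print(mem.read("where is the key", top_k=1))  # -> ['the key is under the mat']
```

Note that the cross-trial lesson is the record that gets forgotten here, which is exactly the kind of loss that summarization or importance scoring in real management stages tries to avoid.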

Evidence

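Voyager (a Minecraft agent) uses a cross-trial skill memory (its skill library) to substantially increase the number of distinct items discovered compared with baselines (per the ablation comparison in the paper).
Reflexion substantially raises success rates on AlfWorld tasks over a memory-free baseline (a reported improvement of more than 20% over ReAct).
ExpeL compares failed and successful trajectories to extract patterns, then applies them to new tasks; performance gains were confirmed on complex interactive tasks such as AlfWorld.
MemGPT borrows the operating-system concept of virtual memory to maintain context even across long conversations, and reports improved user engagement as measured by CSIM scores.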

How to Apply

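When building a conversational agent, avoid putting the entire history into the prompt. Instead, store conversation segments as embeddings in a vector index such as FAISS and retrieve only the top-K segments most similar to the current question (the Retrieved Memory approach); this keeps the prompt from growing without bound as the conversation lengthens.
When an agent fails, don't simply retry. As in Reflexion, summarize the cause of the failure in natural language, store it in cross-trial memory, and include it in the prompt on the next attempt to reduce repeated mistakes.
For domain-specific agents (medicine, finance, etc.), combine the External Knowledge approach, which queries external sources (Wikipedia, specialized databases) dynamically through APIs, with LoRA fine-tuning that embeds core domain knowledge in the parameters. This provides both up-to-date information and domain expertise.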
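A minimal sketch of how the three tips can meet in one prompt: retrieve only the top-K relevant history segments, append the most recent cross-trial lessons, and prepend externally fetched knowledge. Everything here is an assumption for illustration: `embed` is a toy bag-of-words counter standing in for a real embedding model and vector index such as FAISS, `external_facts` is assumed to come from whatever API the agent calls, and `build_prompt` is a hypothetical helper. The LoRA (parametric) half of the third tip is not shown.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k_segments(history: list, query: str, k: int = 2) -> list:
    """Retrieved memory: keep only the k past segments most similar to the query."""
    q = embed(query)
    ranked = sorted(history, key=lambda seg: cosine(embed(seg), q), reverse=True)
    return ranked[:k]


def build_prompt(query, history, lessons, external_facts, k=2, max_lessons=3):
    """Assemble external knowledge, recent lessons, and retrieved history into one prompt."""
    parts = []
    if external_facts:
        parts.append("Reference knowledge:\n" + "\n".join(f"- {f}" for f in external_facts))
    if lessons:
        recent = lessons[-max_lessons:]  # cap cross-trial memory like the example code does
        parts.append("Lessons from past attempts:\n" + "\n".join(f"- {l}" for l in recent))
    retrieved = top_k_segments(history, query, k)
    if retrieved:
        parts.append("Relevant conversation history:\n" + "\n".join(f"- {s}" for s in retrieved))
    parts.append(f"User: {query}")
    return "\n\n".join(parts)


history = [
    "user said they are allergic to penicillin",
    "user asked about the weather in Seoul",
    "user prefers short answers",
]
prompt = build_prompt(
    "is penicillin safe for me",
    history,
    lessons=["Check stored allergies before recommending drugs."],
    external_facts=["(hypothetical result of an external drug-database lookup)"],
    k=1,
)
print(prompt)
```

Only the allergy segment is retrieved, so the unrelated weather turn never consumes context. Swapping `embed` for a real embedding model does not change the overall shape.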

Code Example


```python
# Example of a Reflexion-style cross-trial memory pattern
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Memory store that keeps failure experiences as natural-language lessons
cross_trial_memory = []

def reflect_on_failure(task, failed_trajectory, error_feedback):
    """Analyze a failed attempt and extract a lesson for the next trial."""
    prompt = f"""
    Task: {task}
    What I tried: {failed_trajectory}
    What went wrong: {error_feedback}

    In 2-3 sentences, what should I do differently next time?
    """
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def run_agent_with_memory(task, evaluate_result, max_trials=3):
    """Retry a task, feeding lessons from earlier failures into each new prompt.

    evaluate_result: caller-supplied function (result -> bool) that holds the
    task-specific success check.
    """
    for trial in range(max_trials):
        # Include past failure lessons in the prompt (only the 3 most recent)
        memory_context = ""
        if cross_trial_memory:
            memory_context = "Past experiences:\n" + "\n".join(
                f"- {m}" for m in cross_trial_memory[-3:]
            )

        prompt = f"""
        {memory_context}

        Now solve this task: {task}
        """

        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        result = response.choices[0].message.content

        if evaluate_result(result):
            return result

        # On failure, reflect and store the lesson in cross-trial memory
        lesson = reflect_on_failure(task, result, "Task failed")
        cross_trial_memory.append(lesson)
        print(f"Trial {trial + 1} failed. Learned: {lesson}")

    return "Max trials reached"

# Retrieved Memory pattern (using FAISS)
class AgentMemory:
    def __init__(self, embedding_dim=1536):
        # Exact (brute-force) L2 index; embedding_dim must match the embedding model
        self.index = faiss.IndexFlatL2(embedding_dim)
        self.memories = []

    def write(self, text, embedding):
        """Store a memory; FAISS expects a float32 matrix of shape (n, d)."""
        vec = np.asarray(embedding, dtype="float32").reshape(1, -1)
        self.index.add(vec)
        self.memories.append(text)

    def read(self, query_embedding, top_k=3):
        """Retrieve the texts of the top_k nearest stored memories."""
        query = np.asarray(query_embedding, dtype="float32").reshape(1, -1)
        distances, indices = self.index.search(query, top_k)
        # FAISS pads results with -1 when fewer than top_k vectors are stored
        return [self.memories[i] for i in indices[0] if i >= 0]
```

Terminology

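cross-trial memory: The successes and failures an agent accumulates while attempting the same task multiple times. Similar to how a person learns, over repeated experiments, that "this approach doesn't work."

Reflexion: A technique in which, after a failure, the agent writes a natural-language reflection on why it failed and consults it on the next attempt. Similar to writing up the problems you got wrong in a notebook and reviewing them later.

parametric memory: Storing knowledge inside the model's parameters (weights) rather than as separate text. Much as experience becomes ingrained in the brain, the model itself holds the knowledge without any separate storage.

FAISS: A fast vector similarity search library developed by Facebook AI Research (now Meta). An indexing system that quickly finds similar items among millions of embeddings.

Generative Agents: A simulation agent framework built by a Stanford research team. Twenty-five AI characters live in a town and behave like people through memory, planning, and reflection.

knowledge editing: A technique for precisely modifying specific facts without retraining the entire model. Like replacing only certain pages of a book, it updates just what is needed and leaves the rest of the knowledge untouched.

MemGPT: A system that applies the operating-system concept of virtual memory to LLMs. Frequently used information stays in "main memory" (the context window), the rest lives in "secondary storage" (an external database), and data is swapped in as needed.

inside-trial information: The record of actions and environment feedback produced while the agent performs the current task. Like the scratch notes for the problem you are solving right now.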
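The MemGPT paging idea in the glossary can be illustrated with a toy bounded context. This is a minimal sketch of the concept only, not MemGPT's implementation: in MemGPT the LLM itself issues function calls to move data, whereas here eviction is automatic and `recall` is a plain keyword search. The class `PagedContext` and the message-count budget `max_messages` (a stand-in for a token budget) are hypothetical.

```python
from collections import deque


class PagedContext:
    """Toy MemGPT-style memory: a bounded main context backed by unbounded archival storage."""

    def __init__(self, max_messages: int = 3):
        self.max_messages = max_messages  # hypothetical stand-in for a token budget
        self.main = deque()               # what would be placed in the prompt
        self.archive = []                 # external storage (a database in a real system)

    def append(self, message: str) -> None:
        self.main.append(message)
        while len(self.main) > self.max_messages:
            # Evict the oldest message to archival storage instead of discarding it
            self.archive.append(self.main.popleft())

    def recall(self, keyword: str) -> list:
        """Search archival storage and page matching messages back into the main context."""
        hits = [m for m in self.archive if keyword.lower() in m.lower()]
        for m in hits:
            self.archive.remove(m)
            self.append(m)  # may evict other messages in turn, like a page swap
        return hits

    def prompt(self) -> str:
        return "\n".join(self.main)


ctx = PagedContext(max_messages=2)
for msg in ["My name is Mina.", "I live in Busan.", "I like hiking."]:
    ctx.append(msg)
print(ctx.recall("name"))  # -> ['My name is Mina.']
print(ctx.prompt())        # "I live in Busan." has been swapped out to the archive
```

The design choice mirrors virtual memory: nothing is lost when the context fills up, it only moves to slower storage until a lookup brings it back.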

Related Resources

https://github.com/nuster1128/LLM_Agent_Memory_Survey

Original Abstract

Large language model (LLM)-based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are featured in their self-evolving capability, which is the basis for solving real-world problems that need long-term and complex agent-environment interactions. The key component to support agent-environment interactions is the memory of the agents. While previous studies have proposed many promising memory mechanisms, they are scattered in different papers, and there lacks a systematical review to summarize and compare these works from a holistic perspective, failing to abstract common and effective designing patterns for inspiring future studies. To bridge this gap, in this article, we propose a comprehensive survey on the memory mechanism of LLM-based agents. In specific, we first discuss “what is” and “why do we need” the memory in LLM-based agents. Then, we systematically review previous studies on how to design and evaluate the memory module. In addition, we also present many agent applications, where the memory module plays an important role. At last, we analyze the limitations of existing work and show important future directions. To keep up with the latest advances in this field, we create a repository at https://github.com/nuster1128/LLM_Agent_Memory_Survey.