Lifelong Learning of Large Language Model based Agents: A Roadmap

Jan 13, 2025 • Junhao Zheng, Chengming Shi, Xidi Cai +5

TL;DR Highlight

The first survey to comprehensively cover methods that let LLM agents keep learning in new environments without forgetting what they already know.

Who Should Read

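AI/ML engineers who deploy LLM agents in production and keep hitting the problem that training the model on new data breaks existing capabilities. Also backend developers designing agent system architectures that have to run over the long term.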

Core Mechanics

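Breaks an LLM agent into three modules, Perception → Memory → Action, and systematically organizes the lifelong learning techniques that apply to each.
The two core challenges are catastrophic forgetting (learning new things erases old ones) and loss of plasticity (the model can no longer learn new things); the survey frames balancing the two as a gateway to AGI.
The Memory module is subdivided into four types, Working (short-term), Episodic (experiences), Semantic (external knowledge), and Parametric (model parameters), and each needs its own learning strategy.
Knowledge updates in RAG systems fall into two strategies: document-level (re-vectorize the whole document) and chunk-level (compare fingerprints and update only the chunks that changed). LangChain and LlamaIndex are both described as supporting the latter.
Continual knowledge editing methods such as WISE (dual memory), GRACE (codebook adapter), and ELDER (Mixture-of-LoRA) update knowledge surgically, without retraining the whole model.
Even strong models like GPT-4 are static after deployment and cannot adapt to new environments on their own. Since 2023, lifelong learning for LLM agents has been growing rapidly as a research area in its own right.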
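
The chunk-level update strategy mentioned above can be sketched without any RAG library. This is a minimal illustration, not how LangChain or LlamaIndex implement it: the helper names (`fingerprint`, `split_into_chunks`, `plan_update`), the SHA-256 hash, and the paragraph-based chunking are all assumptions made for the example.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash of a chunk; identical text always yields the same digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def split_into_chunks(document: str) -> list:
    """Hypothetical chunker: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def plan_update(old_fingerprints: set, new_chunks: list):
    """Compare fingerprints to decide what to embed, delete, and keep."""
    new_by_fp = {fingerprint(c): c for c in new_chunks}
    new_fps = set(new_by_fp)
    # Chunks whose fingerprint is unknown must be embedded and inserted.
    to_embed = [c for fp, c in new_by_fp.items() if fp not in old_fingerprints]
    # Fingerprints that vanished belong to stale vectors that should be deleted.
    to_delete = old_fingerprints - new_fps
    # Everything else is untouched, so its existing vectors can be reused.
    unchanged = old_fingerprints & new_fps
    return to_embed, to_delete, unchanged


if __name__ == "__main__":
    old = {fingerprint(c) for c in split_into_chunks("alpha\n\nbeta\n\ngamma")}
    embed, delete, keep = plan_update(old, split_into_chunks("alpha\n\nbeta v2\n\ngamma"))
    print(f"embed={embed} delete={len(delete)} keep={len(keep)}")
```

Only the edited paragraph is scheduled for embedding here, which is where the cost saving over document-level re-vectorization comes from. Chunk boundaries matter: a chunker that shifts every boundary after an insertion would invalidate most fingerprints.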

Evidence

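The number of papers on lifelong learning and LLM agents indexed by Google Scholar has risen sharply over the past three years (see Figure 1 for exact figures).
A comparison of 14 existing surveys shows this is the first to cover all four of LLMs, lifelong learning, NLP, and agents (Table 1).
More than 20 multimodal perception papers (Perceiver, VATT, Omnivore, PathWeave, and others) are categorized, with strategies laid out for modality-complete and modality-incomplete learning scenarios (Table 2).
For continual alignment, eight loss functions (DPO, IPO, EPO, KTO, DMPO, ORPO, ROPO, CDPO) are compared side by side with their formulas (Table 4).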
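
To make the first of those loss functions concrete, here is a small sketch of the standard DPO objective for a single preference pair, -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]). It follows the usual formulation of DPO rather than the notation of the survey's Table 4, which is not reproduced here. The function name, argument names, and the default `beta=0.1` are assumptions for illustration.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Inputs are summed sequence log-probabilities under the trainable policy
    and under the frozen reference model.
    """
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy has shifted probability toward the chosen answer: lower loss.
    print(dpo_loss(-10.0, -17.0, -12.0, -15.0))
```

The reference-model terms are what anchor the policy to its starting point, which is why this family of losses comes up in a continual-learning context.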

How to Apply

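If the documents in a RAG pipeline change often, use the chunk-level fingerprint update strategy in LangChain/LlamaIndex to re-embed only the changed chunks, which keeps cost down while keeping knowledge current.
When an agent gains a new tool or domain, dynamically route between per-task LoRA adapters in the style of ELDER (Mixture-of-LoRA) instead of fine-tuning the whole model, so the agent can be extended without disturbing existing capabilities.
For long-running conversational agents, build a dynamic memory bank as in MemoChat: store past conversations as episodic memory and retrieve them when needed, so the dialogue stays consistent beyond the context window limit.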
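
The episodic memory idea in the last item can be sketched as a small store-and-retrieve structure. This is not MemoChat's actual mechanism: the class names `Episode` and `EpisodicMemoryBank` are hypothetical, and the word-overlap scoring stands in for whatever retriever (for example, embedding similarity) a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    turn: int
    speaker: str
    text: str


@dataclass
class EpisodicMemoryBank:
    """Hypothetical store of past dialogue turns kept outside the context window."""
    episodes: list = field(default_factory=list)

    def add(self, speaker, text):
        self.episodes.append(Episode(len(self.episodes), speaker, text))

    @staticmethod
    def _tokens(text):
        return set(text.lower().split())

    def retrieve(self, query, k=2):
        """Return up to k past turns sharing the most words with the query."""
        query_tokens = self._tokens(query)
        scored = []
        for episode in self.episodes:
            overlap = len(query_tokens & self._tokens(episode.text))
            if overlap:
                scored.append((overlap, episode.turn, episode))
        # Highest overlap first; ties go to the more recent turn.
        scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
        return [episode for _, _, episode in scored[:k]]

    def build_prompt(self, query, k=2):
        """Prepend only the recalled turns, not the full history, to the query."""
        recalled = [f"[turn {e.turn}] {e.speaker}: {e.text}" for e in self.retrieve(query, k)]
        return "\n".join(["Relevant past turns:"] + recalled + ["Current message: " + query])


if __name__ == "__main__":
    bank = EpisodicMemoryBank()
    bank.add("user", "my dog is named Biscuit")
    bank.add("user", "I work in Seoul")
    print(bank.build_prompt("what is my dog called", k=1))
```

Because only the top-k recalled turns enter the prompt, the prompt size stays bounded however long the conversation history grows.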

Code Example

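# Incremental RAG update sketch (LlamaIndex; exact API may differ by version)
# Assumes an embedding model is configured (the default one needs OPENAI_API_KEY).
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.storage.docstore import SimpleDocumentStore

DOCS_DIR = './docs'  # placeholder path

# The docstore records a content hash (fingerprint) for each loaded document,
# so a later refresh can skip inputs that have not changed.
docstore = SimpleDocumentStore()
storage_context = StorageContext.from_defaults(docstore=docstore)

# Initial build. filename_as_id=True gives documents stable IDs across reloads.
docs = SimpleDirectoryReader(DOCS_DIR, filename_as_id=True).load_data()
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

# After files change: reload, then re-embed only new or modified documents.
updated_docs = SimpleDirectoryReader(DOCS_DIR, filename_as_id=True).load_data()
refreshed = index.refresh_ref_docs(updated_docs)  # one bool per document
print(f"re-indexed {sum(refreshed)} of {len(updated_docs)} documents")

# Deleted sources are not removed by the refresh; drop them explicitly, e.g.
# index.delete_ref_doc(doc_id, delete_from_docstore=True)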

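# Conceptual sketch of continual knowledge editing, loosely modeled on WISE's dual memory.
# Simplified: WISE keeps edits in model parameters and learns its router; a dict stands in here.
class DualMemoryAgent:
    def __init__(self, base_model, side_memory=None):
        self.main_memory = base_model  # pretrained knowledge (left untouched)
        # Avoid a shared mutable default: each agent gets its own dict.
        self.side_memory = {} if side_memory is None else side_memory  # edited knowledge

    def _is_edited_knowledge(self, question):
        # Toy router: exact-match lookup against the edited keys.
        return question in self.side_memory

    def query(self, question):
        # Route to side memory when the question touches edited knowledge.
        if self._is_edited_knowledge(question):
            return self.side_memory[question]
        return self.main_memory.generate(question)

    def edit_knowledge(self, key, new_value):
        # Update only the side memory; the main model is never modified.
        self.side_memory[key] = new_value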

Terminology

Catastrophic Forgetting: Learning something new wipes out what was learned before. Similar to a new employee who forgets how to do their existing tasks while picking up new ones.

Lifelong Learning: Learning that does not stop after a single training run but keeps accumulating new knowledge while retaining the old. Like a person who grows through new experiences every day.

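POMDP (Partially Observable Markov Decision Process): A mathematical decision-making model in which the agent cannot see the whole environment and must act on partial observations. Similar to finding a destination on a fog-covered battlefield without a map.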

Episodic Memory: Memory that stores the agent's concrete past experiences (what it did in which situation, and how it turned out). Comparable to a person's diary.

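Parametric Memory: Knowledge stored implicitly in the model's parameters (weights). It cannot be read out directly but is used automatically during inference. Similar to the muscle memory your body keeps after learning to swim.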

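Knowledge Distillation: A technique in which a small student model learns to imitate the outputs of a large, high-performing teacher model. Like a teacher who passes on the reasoning along with the answers.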

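LoRA (Low-Rank Adaptation): A fine-tuning technique that adapts a model to a specific task by adding just two small matrices instead of retraining the whole model. Similar to adding a feature to a smartphone by installing an app.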

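Stability-Plasticity Dilemma: The trade-off in which retaining existing knowledge well (stability) makes new knowledge hard to learn, while learning new knowledge well (plasticity) causes existing knowledge to be forgotten. Similar to how older people have a lot of experience but find new things hard to learn, while children learn quickly but forget quickly.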
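
The LoRA entry above, and the Mixture-of-LoRA idea it supports, can be illustrated with plain Python lists. The sketch assumes the standard LoRA form, where a frozen weight W is augmented by a low-rank product B·A. The class names and the hand-supplied `gates` dictionary are hypothetical. ELDER's actual routing is learned and is not reproduced here.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRAAdapter:
    """Low-rank update delta_W = B @ A, with A of shape (r, d_in) and B of shape (d_out, r)."""

    def __init__(self, A, B, scale=1.0):
        self.A, self.B, self.scale = A, B, scale

    def delta(self, x):
        # Project down to rank r, then back up to d_out; W itself is never touched.
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


class MixtureOfLoRALayer:
    """Frozen base weight plus a gated sum of per-task adapters."""

    def __init__(self, W, adapters):
        self.W = W                # frozen base weight, shape (d_out, d_in)
        self.adapters = adapters  # dict: task name -> LoRAAdapter

    def forward(self, x, gates):
        """gates maps task name -> mixing weight (supplied by hand in this sketch)."""
        out = matvec(self.W, x)
        for name, gate in gates.items():
            update = self.adapters[name].delta(x)
            out = [o + gate * u for o, u in zip(out, update)]
        return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    billing = LoRAAdapter(A=[[1.0, 0.0]], B=[[0.0], [2.0]])  # rank 1
    layer = MixtureOfLoRALayer(identity, {"billing": billing})
    print(layer.forward([3.0, 4.0], gates={}))
    print(layer.forward([3.0, 4.0], gates={"billing": 1.0}))
```

With every gate at zero the layer reproduces the base model's output exactly, which is the stability half of the dilemma. New tasks add adapters rather than overwrite W, which is the plasticity half.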

Related Resources

awesome-lifelong-llm-agent (GitHub): https://github.com/qianlimalab/awesome-lifelong-llm-agent

Original Abstract

Lifelong learning, also known as continual or incremental learning, is a crucial component for advancing Artificial General Intelligence (AGI) by enabling systems to continuously adapt in dynamic environments. While large language models (LLMs) have demonstrated impressive capabilities in natural language processing, existing LLM agents are typically designed for static systems and lack the ability to adapt over time in response to new challenges. This survey is the first to systematically summarize the potential techniques for incorporating lifelong learning into LLM-based agents. We categorize the core components of these agents into three modules: the perception module for multimodal input integration, the memory module for storing and retrieving evolving knowledge, and the action module for grounded interactions with the dynamic environment. We highlight how these pillars collectively enable continuous adaptation, mitigate catastrophic forgetting, and improve long-term performance. This survey provides a roadmap for researchers and practitioners working to develop lifelong learning capabilities in LLM agents, offering insights into emerging trends, evaluation metrics, and application scenarios. Relevant literature and resources are available at https://github.com/qianlimalab/awesome-lifelong-llm-agent.