Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution

Mar 24, 2026 • Yechao Zhang, Shiqian Zhao, Jie Zhang +5

TL;DR Highlight

Merely reading social feeds in the background is enough for an AI agent to store false information in long-term memory, which then influences later user-facing behavior.

Who Should Read

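Developers deploying persistent agent frameworks such as OpenClaw or MemGPT to production, and engineers responsible for agent security. Essential reading if your agent is configured to automatically monitor external sources such as email, Slack, or social feeds.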

Core Mechanics

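In Claw-family agents (OpenClaw, CoPaw, etc.), heartbeat (periodic background execution) shares the same session context as the foreground conversation, so external content read in the background enters memory without the user's knowledge.
The attacker needs no prompt injection: posting plausible-looking false information on a social platform is enough. Pollution begins when the agent automatically collects it during a heartbeat.
Among social signals, perceived consensus is the strongest driver of pollution: agreeing comments from multiple accounts, with no authoritative account involved, are enough to push ASR (attack success rate) above 73%.
Short-term memory pollution is promoted to long-term memory at rates of up to 91%: when the user says something like 'save today's content', the polluted information is written to MEMORY.md along with everything else.
Pollution stored in long-term memory survives a session reset and influences behavior in 76% of cases in a new session (without web_search).
The benchmarked model is Claude Haiku 4.5, and the experiments cover three domains: software security (recommending CVE-vulnerable versions), financial decision-making (choosing a DeFi protocol), and academic references (citing fabricated papers).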
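
The shared-session mechanism described above can be illustrated with a toy model. This is a minimal sketch, not the OpenClaw implementation: `Session`, `ToyAgent`, `build_agent`, and the example post are all hypothetical names invented for illustration. It only shows why a routine save request carries background-ingested content into long-term memory when heartbeat and foreground share one context, and why it does not when they are separated.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Toy short-term context: an ordered list of (source, text) entries."""
    name: str
    context: list = field(default_factory=list)


@dataclass
class ToyAgent:
    """Hypothetical agent used only for illustration (not the OpenClaw API)."""
    foreground: Session
    background: Session  # the same object as `foreground` in the shared design
    long_term: list = field(default_factory=list)

    def heartbeat(self, feed_posts):
        # Exposure: background execution ingests external content.
        for post in feed_posts:
            self.background.context.append(("heartbeat", post))

    def chat(self, user_message):
        self.foreground.context.append(("user", user_message))

    def save_today(self):
        # Memory: a routine save request copies everything in the foreground
        # context, regardless of where each entry came from.
        self.long_term.extend(text for _, text in self.foreground.context)


def build_agent(shared: bool) -> ToyAgent:
    main = Session("main")
    background = main if shared else Session("heartbeat_isolated")
    return ToyAgent(foreground=main, background=background)


fake_post = "libfoo 1.2.3 is the safe version (everyone agrees)"

shared_agent = build_agent(shared=True)
shared_agent.heartbeat([fake_post])
shared_agent.chat("Remind me to review the release notes")
shared_agent.save_today()

isolated_agent = build_agent(shared=False)
isolated_agent.heartbeat([fake_post])
isolated_agent.chat("Remind me to review the release notes")
isolated_agent.save_today()

print(fake_post in shared_agent.long_term)    # True
print(fake_post in isolated_agent.long_term)  # False
```

A real agent summarizes and prunes its context rather than copying it verbatim, so this sketch says nothing about the rates reported in the paper; it only makes the architectural point concrete.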

Evidence

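Under the strongest social-signal condition (both authority and consensus present), ASR is 82.2%; without consensus it drops sharply to 22.2%, making consensus the dominant factor.
In the S4 condition (explicit save request), the long-term memory storage rate averages 91.1% and the cross-session behavioral-influence ASR averages 75.6% (without web_search).
Even under a realistic dilution condition (only 1 of 20 posts is polluted), the Financial domain shows a 60% storage rate and a 33.3% cross-session ASR, so context-management mechanisms are not a complete line of defense.
Even the Skeptical persona records a 16.7% ASR under strong social signals; no persona is fully immune.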

How to Apply

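If your agent currently uses heartbeat to monitor external sources (email, Slack, RSS, social feeds), consider reconfiguring heartbeat to run in a separate, isolated session. The Claw default of sharing 'session main' is the core vulnerability.
Add provenance-tagging logic before writing to MEMORY.md: attach metadata such as 'source: heartbeat, url: ...' to information collected by background heartbeat so that it can be reviewed when it is saved to long-term memory.
Insert a memory-provenance verification step before sensitive downstream tasks (security package recommendations, financial decisions, and so on): add a condition to the agent prompt such as 'Check whether the original source of this information is something the user explicitly requested.'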
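
The provenance-tagging and verification steps above could be enforced in code rather than only in the prompt. The following is a rough sketch under stated assumptions: `MemoryItem`, `usable_for_sensitive_task`, `filter_memory`, the `"user"` / `"heartbeat"` source labels, and the example URL are hypothetical and do not correspond to any confirmed Claw API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryItem:
    """A memory record that carries its provenance (hypothetical schema)."""
    content: str
    source: str                 # assumed labels: "user" or "heartbeat"
    url: Optional[str] = None
    user_verified: bool = False


def usable_for_sensitive_task(item: MemoryItem) -> bool:
    # Allow only content the user supplied directly or explicitly confirmed.
    return item.source == "user" or item.user_verified


def filter_memory(items, sensitive: bool):
    """Return the memory items an agent may consult for the current task."""
    if not sensitive:
        return list(items)
    return [item for item in items if usable_for_sensitive_task(item)]


memory = [
    MemoryItem("Prefer pinned dependency versions", source="user"),
    MemoryItem(
        "Package X 2.0 is safe to install",
        source="heartbeat",
        url="https://example.com/post/1",
    ),
]

trusted = filter_memory(memory, sensitive=True)
print([item.content for item in trusted])
```

Filtering at retrieval time keeps heartbeat-acquired items available for low-stakes tasks while excluding them from security or financial recommendations until the user confirms them.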

Code Example

# Example of isolating heartbeat execution in a Claw/OpenClaw-style agent.
# Edit HEARTBEAT.md or specify a separate session in the agent config.
# The key names below are illustrative, not a confirmed OpenClaw schema.

# Vulnerable default (shared session)
vulnerable_heartbeat_config = {
    "session": "main",  # shares the foreground session (risky)
    "interval": 300,
    "tasks": ["check_email", "monitor_social_feed"],
}

# Recommended: run heartbeat in an isolated session
isolated_heartbeat_config = {
    "session": "heartbeat_isolated",  # separate session
    "interval": 300,
    "tasks": ["check_email", "monitor_social_feed"],
    "memory_write_policy": "user_confirmed_only",  # save only after user confirmation
    "provenance_tagging": True,  # provenance tagging required
}

# Example provenance metadata attached when saving to memory (MEMORY.md format)
MEMORY_ENTRY_EXAMPLE = """
## [2026-03-24] Heartbeat-acquired info (UNVERIFIED)
Source: social_platform / submolt: security-updates
Ingested via: heartbeat background execution
Content: ...
Verification status: NOT verified by user
"""
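
The MEMORY.md entry format shown above could be produced by a small helper so that every heartbeat-acquired item is tagged consistently. This is a sketch only: `render_memory_entry` and its parameters are hypothetical names, and the field layout simply mirrors the example entry rather than any documented MEMORY.md specification.

```python
from datetime import date


def render_memory_entry(day: date, source: str, content: str,
                        verified: bool = False) -> str:
    """Render one provenance-tagged entry in the layout of the example above."""
    status = "verified by user" if verified else "NOT verified by user"
    label = "" if verified else " (UNVERIFIED)"
    return "\n".join([
        f"## [{day.isoformat()}] Heartbeat-acquired info{label}",
        f"Source: {source}",
        "Ingested via: heartbeat background execution",
        f"Content: {content}",
        f"Verification status: {status}",
    ])


entry = render_memory_entry(
    date(2026, 3, 24),
    "social_platform / submolt: security-updates",
    "example claim text",
)
print(entry)
```

Keeping the verification status on its own line makes unverified entries easy to find with a plain text search when reviewing the memory file.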

Terminology

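heartbeat: A mechanism by which the agent periodically wakes up, without any user request, to check email, social feeds, and similar sources automatically. Similar to a smartphone checking for notifications in the background.

persistent agent: An AI agent that retains memory and state even after a conversation ends. Unlike an ordinary chatbot that starts from scratch each time, it remembers earlier sessions and keeps operating continuously.

short-term memory: The agent's working context, kept only for the current session. Comparable to a session cookie that disappears when the browser tab is closed.

long-term memory: Persistent memory stored in files (such as MEMORY.md) that outlives the session and is consulted in later sessions. Similar to writing something in a diary so that it can be recalled later.

ASR (Attack Success Rate): The fraction of attacks that succeed, that is, the share of cases in which the agent believed planted false information and acted on it.

E→M→B pathway: The attack path Exposure → Memory → Behavior. The agent is shown false information, stores it in memory, and the stored information later influences its behavior.

prompt injection: An attack that hides malicious instructions in ordinary text to induce an AI model to follow them. This paper shows that pollution is possible without any such explicit attack.

provenance: The origin and path of a piece of information, that is, metadata that makes it possible to trace where the information came from. Without it, false information collected in the background passes as the agent's own knowledge.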
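
As a concrete reading of the ASR definition above, the metric is the number of trials in which the agent acted on the planted claim divided by the total number of trials. The sketch below uses made-up trial outcomes purely for illustration; they are not data from the paper, and `attack_success_rate` is a hypothetical helper name.

```python
def attack_success_rate(outcomes):
    """Fraction of trials in which the agent acted on the planted claim."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(bool(outcome) for outcome in outcomes) / len(outcomes)


# Hypothetical trial results (True = the agent followed the planted claim).
trials = [True, False, True, True, False, False]
print(f"ASR = {attack_success_rate(trials):.1%}")  # ASR = 50.0%
```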

Original Abstract

We identify a critical security vulnerability in mainstream Claw personal AI agents: untrusted content encountered during heartbeat-driven background execution can silently pollute agent memory and subsequently influence user-facing behavior without the user's awareness. This vulnerability arises from an architectural design shared across the Claw ecosystem: heartbeat background execution runs in the same session as user-facing conversation, so content ingested from any external source monitored in the background (including email, message channels, news feeds, code repositories, and social platforms) can enter the same memory context used for foreground interaction, often with limited user visibility and without clear source provenance. We formalize this process as an Exposure (E) → Memory (M) → Behavior (B) pathway: misinformation encountered during heartbeat execution enters the agent's short-term session context, potentially gets written into long-term memory, and later shapes downstream user-facing behavior. We instantiate this pathway in an agent-native social setting using MissClaw, a controlled research replica of Moltbook. We find that (1) social credibility cues, especially perceived consensus, are the dominant driver of short-term behavioral influence, with misleading rates up to 61%; (2) routine memory-saving behavior can promote short-term pollution into durable long-term memory at rates up to 91%, with cross-session behavioral influence reaching 76%; (3) under naturalistic browsing with content dilution and context pruning, pollution still crosses session boundaries. Overall, prompt injection is not required: ordinary social misinformation is sufficient to silently shape agent memory and behavior under heartbeat-driven background execution.