Sovereign Execution Broker: A Runtime Boundary That Uses Certificates to Enforce Control over AI Agents' Cloud Infrastructure Changes

TL;DR Highlight

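A security architecture in which LLM agents cannot touch AWS or Kubernetes directly: every infrastructure change is bound to a cryptographic certificate, and an intermediary broker enforces verification.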

Who Should Read

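DevOps and platform engineers who want to give LLM agents control over cloud infrastructure but are worried about security. Teams that feel AWS IAM or Kubernetes RBAC alone cannot contain agent malfunctions or prompt injection.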

Core Mechanics

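Traditional IAM only looks at who is making a request; SEB runs a 7-stage pipeline to verify whether this specific action is still safe at this very moment.
Agents hold no standing credentials at all; only SEB mints short-lived tokens via AWS STS or the Kubernetes TokenRequest API and performs the execution.
The cryptographic certificate (Ω) issued by the SAB (Sovereign Assurance Boundary) binds the action, the target resource, the validity window, and the revocation epoch, so if state changes after issuance, SEB refuses to execute.
To prevent TOCTOU (Time-of-Check to Time-of-Use) problems, SEB runs a live-state drift check immediately before execution: if a subnet is deleted or a security policy changes between certification and execution, the request is rejected automatically.
A global revocation epoch counter lets operators respond to a security incident by bumping the epoch, which blocks all previously issued certificates within at most 5.2 seconds.
Nonce reservation is implemented as an atomic PostgreSQL INSERT, which blocks replay attacks (re-executing an already executed action) with the same certificate, and every decision and outcome is written as an Ed25519-signed record.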
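The checks described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Certificate`, `Request`, `DecisionRecord`, and `Broker` types, their field names, and the rejection reasons are all hypothetical, and only three of the pipeline's checks (contract match, validity window, revocation epoch) are shown before the decision is signed with Ed25519 from Go's standard library.

```go
package main

import (
	"crypto/ed25519"
	"encoding/json"
)

// Certificate is a simplified, hypothetical stand-in for the SAB-issued
// certificate; the field names are assumptions, not the paper's schema.
type Certificate struct {
	CID       string
	Action    string
	Resource  string
	NotBefore int64 // Unix seconds
	NotAfter  int64 // Unix seconds
	Epoch     uint64
}

// Request is the mutation an agent asks the broker to perform.
type Request struct {
	CID      string
	Action   string
	Resource string
}

// DecisionRecord is the payload that gets signed for every decision.
type DecisionRecord struct {
	CID     string `json:"cid"`
	Allowed bool   `json:"allowed"`
	Reason  string `json:"reason"`
}

// Broker holds the current revocation epoch and the signing key.
type Broker struct {
	Epoch uint64
	key   ed25519.PrivateKey
}

// NewBroker generates a fresh Ed25519 key pair and returns the public half
// so that an auditor can verify records later.
func NewBroker(epoch uint64) (*Broker, ed25519.PublicKey, error) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		return nil, nil, err
	}
	return &Broker{Epoch: epoch, key: priv}, pub, nil
}

// check returns "" when the request may proceed, otherwise the reason.
func (b *Broker) check(c Certificate, r Request, now int64) string {
	switch {
	case r.CID != c.CID || r.Action != c.Action || r.Resource != c.Resource:
		return "contract mismatch"
	case now < c.NotBefore || now > c.NotAfter:
		return "outside validity window"
	case c.Epoch < b.Epoch:
		return "revoked epoch"
	}
	return ""
}

// Decide evaluates the request and returns the record with its signature.
func (b *Broker) Decide(c Certificate, r Request, now int64) (DecisionRecord, []byte, error) {
	reason := b.check(c, r, now)
	rec := DecisionRecord{CID: r.CID, Allowed: reason == "", Reason: reason}
	payload, err := json.Marshal(rec)
	if err != nil {
		return rec, nil, err
	}
	return rec, ed25519.Sign(b.key, payload), nil
}

// VerifyRecord re-serializes the record and checks the signature.
func VerifyRecord(pub ed25519.PublicKey, rec DecisionRecord, sig []byte) bool {
	payload, err := json.Marshal(rec)
	if err != nil {
		return false
	}
	return ed25519.Verify(pub, payload, sig)
}
```

A caller would create the broker with `NewBroker(currentEpoch)`, pass each certificate and request to `Decide`, and hand the returned record and signature to whatever audit store it uses. Raising `Broker.Epoch` makes every certificate carrying an older epoch fail the third check.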

Evidence

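Across 9 injected attack scenarios (uncertified mutation, replay, policy epoch mismatch, revocation partition, and others) tested with 1,000 cases each, the rejection rate was 100% and the unsafe escape rate was 0.0%.
Broker-only overhead was p50 28.2 ms and p99 47.1 ms for Kubernetes workloads, and p50 136.9 ms and p99 284.1 ms for AWS workloads; 62% of the overhead is attributable to the AWS drift check (DescribeSecurityGroups).
Revocation epoch propagation: with a 5-second polling interval and a 5-second cache TTL, every broker starts rejecting the affected certificates within at most 5.2 seconds (2.6 seconds on average).
The Kubernetes TokenRequest adapter peaks at 820 req/s (80 threads), while the AWS STS adapter is limited to 240 req/s at 40 threads by API throttling.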
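The link between cache TTL and revocation delay can be illustrated with a small Go sketch. It assumes a hypothetical `EpochCache` that each broker keeps in front of an authoritative epoch store (`EpochSource`); neither name comes from the paper, and the sketch does not try to reproduce the measured 5.2-second figure. It only shows why a broker can keep admitting a revoked certificate until its cached epoch is refreshed.

```go
package main

import "sync"

// EpochSource is a hypothetical lookup of the authoritative global epoch.
type EpochSource func() uint64

// EpochCache holds a broker-local copy of the epoch with a fixed TTL.
type EpochCache struct {
	mu        sync.Mutex
	source    EpochSource
	ttlMillis int64
	cached    uint64
	fetchedAt int64
	primed    bool
}

// NewEpochCache builds a cache that refreshes after ttlMillis milliseconds.
func NewEpochCache(src EpochSource, ttlMillis int64) *EpochCache {
	return &EpochCache{source: src, ttlMillis: ttlMillis}
}

// Current returns the cached epoch, refreshing it once the entry is at
// least ttlMillis old. Time is passed in so the behavior is deterministic.
func (c *EpochCache) Current(nowMillis int64) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.primed || nowMillis-c.fetchedAt >= c.ttlMillis {
		c.cached = c.source()
		c.fetchedAt = nowMillis
		c.primed = true
	}
	return c.cached
}

// Admits reports whether a certificate issued under certEpoch is still
// acceptable according to the broker's (possibly stale) view.
func (c *EpochCache) Admits(certEpoch uint64, nowMillis int64) bool {
	return certEpoch >= c.Current(nowMillis)
}
```

With a 5,000 ms TTL, a certificate from epoch 1 is still admitted at 4,999 ms after the last refresh even if the global epoch has already moved to 2, and is rejected from 5,000 ms onward. Shortening the TTL narrows that window at the cost of more lookups against the epoch store.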

How to Apply

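When an agent must perform high-risk operations such as security group changes, auto-scaling, or database access, remove the mutation APIs from the agent service account's IAM permissions entirely and configure an AWS SCP (Service Control Policy) so that only the SEB broker role can call those APIs.
In Kubernetes, remove the create/update/patch/delete verbs from the agent's ServiceAccount and install a Validating Admission Webhook that rejects any change to a protected namespace lacking the `seb.openkedge.io/cid` and `seb.openkedge.io/nonce` annotations.
For workflows with a time gap between certificate issuance and execution (for example, pre-approved nightly batch jobs), setting the drift tolerance threshold ε_C to 0 forces automatic re-review on any state change, while relatively stable resources can use a higher ε_C to reduce unnecessary rejections.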
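The effect of the ε_C threshold can be sketched in Go as below. The drift metric here is an assumption made purely for illustration: it counts resource attributes that were added, removed, or changed between the certified snapshot and the live state. The summary does not say how SEB actually measures drift, and `Snapshot`, `DriftScore`, and `WithinTolerance` are hypothetical names.

```go
package main

// Snapshot is a hypothetical flat view of a resource's attributes.
type Snapshot map[string]string

// DriftScore counts attributes that differ between the certified snapshot
// and the live one: changed or removed keys, plus keys that newly appeared.
// This metric is an illustrative assumption, not the paper's definition.
func DriftScore(certified, live Snapshot) int {
	score := 0
	for k, v := range certified {
		if lv, ok := live[k]; !ok || lv != v {
			score++
		}
	}
	for k := range live {
		if _, ok := certified[k]; !ok {
			score++
		}
	}
	return score
}

// WithinTolerance reports whether execution may proceed: the observed
// drift must not exceed epsilon. With epsilon 0, any change is rejected.
func WithinTolerance(certified, live Snapshot, epsilon int) bool {
	return DriftScore(certified, live) <= epsilon
}
```

For a pre-approved nightly job, calling `WithinTolerance(certified, live, 0)` right before execution sends the request back for review as soon as a single attribute differs, while a stable resource could pass a larger `epsilon` to tolerate minor changes.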

Code Example

-- PostgreSQL: nonce reservation table that SEB uses to block replay attacks
CREATE TABLE reserved_nonces (
  cid         UUID        NOT NULL,
  nonce       VARCHAR(64) NOT NULL,
  reserved_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (cid, nonce)  -- a duplicate INSERT fails immediately, which blocks replay
);
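
The reservation itself might look like the following Go sketch, which assumes the `reserved_nonces` table above and a PostgreSQL driver registered with `database/sql`. The `ReserveNonce` function, the `execer` interface, and the `ON CONFLICT ... DO NOTHING` form are illustrative choices, not taken from the paper; the in-memory `fakeDB` exists only so the sketch can be exercised without a database.

```go
package main

import "database/sql"

// execer is the subset of *sql.DB that the sketch needs.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// reserveSQL inserts the pair once; a second attempt affects zero rows
// because the primary key (cid, nonce) already exists.
const reserveSQL = `
INSERT INTO reserved_nonces (cid, nonce, reserved_at)
VALUES ($1, $2, now())
ON CONFLICT (cid, nonce) DO NOTHING`

// ReserveNonce returns true only for the first caller that claims the
// (cid, nonce) pair; every later caller gets false and must not execute.
func ReserveNonce(db execer, cid, nonce string) (bool, error) {
	res, err := db.Exec(reserveSQL, cid, nonce)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	return n == 1, nil
}

// fakeResult and fakeDB are an in-memory stand-in for PostgreSQL that
// mimics the primary-key behavior, for demonstration only.
type fakeResult int64

func (r fakeResult) LastInsertId() (int64, error) { return 0, nil }
func (r fakeResult) RowsAffected() (int64, error) { return int64(r), nil }

type fakeDB struct{ seen map[string]bool }

func newFakeDB() *fakeDB { return &fakeDB{seen: map[string]bool{}} }

func (f *fakeDB) Exec(_ string, args ...any) (sql.Result, error) {
	key := args[0].(string) + "/" + args[1].(string)
	if f.seen[key] {
		return fakeResult(0), nil
	}
	f.seen[key] = true
	return fakeResult(1), nil
}
```

In production the `db` argument would be a `*sql.DB`, which already satisfies `execer`. Because the uniqueness check and the insert happen in one statement, two brokers racing on the same pair cannot both see `true`.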

// Core logic of the Kubernetes validating webhook (pseudo-Go)
func (h *WebhookHandler) Handle(req AdmissionRequest) AdmissionResponse {
    // Both annotations must be present on the object being changed.
    cid   := req.Object.Annotations["seb.openkedge.io/cid"]
    nonce := req.Object.Annotations["seb.openkedge.io/nonce"]

    if cid == "" || nonce == "" {
        return Deny("missing SEB broker attestation")
    }
    // The (cid, nonce) pair must already be reserved in the broker's ledger.
    if !h.ledger.IsReserved(cid, nonce) {
        return Deny("nonce not reserved by broker")
    }
    return Allow()
}

AWS SCP: block these mutations for every principal except the broker role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": ["ec2:AuthorizeSecurityGroupIngress", "ec2:RevokeSecurityGroupIngress"],
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalARN": "arn:aws:iam::*:role/seb-broker-role"
        }
      }
    }
  ]
}

Terminology

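IAM (Identity and Access Management): The key-management system that governs who can do what in the cloud. In AWS, permissions are defined through roles and policies.

STS (Security Token Service): The AWS service that issues short-lived temporary credentials, much like a desk that hands out single-day access badges.

RBAC: Role-based access control. Roles are assigned to people or services and permissions are attached to the roles; this is the default authorization model in Kubernetes.

TOCTOU: Time-of-Check to Time-of-Use. A security gap that opens when conditions change between the moment of checking and the moment of actual use. Example: you check that a door is locked, and someone unlocks it just before you walk in.

nonce: A single-use number, used here to block replay attacks that present the same certificate twice.

revocation epoch: The system-wide generation number of the currently valid security policy. When the number goes up, all certificates from earlier generations are invalidated, just as rekeying a building's master lock makes every old key stop working.

Ed25519: A fast, secure digital signature algorithm, used like a stamp on a document to prove that data has not been tampered with.

drift: A difference between the infrastructure state at certificate issuance and the state at actual execution. Example: a subnet approved for a firewall rule change is deleted in the meantime.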

Original Abstract

Autonomous agents are increasingly connected to cloud, deployment, and data-control workflows, but production mutation authority should not reside inside non-deterministic reasoning processes. Existing access-control mechanisms authorize identities, while assurance layers certify proposed actions; neither alone provides a mandatory enforcement point for certified authority at the moment of mutation. This paper introduces the Sovereign Execution Broker (SEB), a runtime enforcement boundary for certificate-bound agentic infrastructure. SEB consumes certificates issued by the Sovereign Assurance Boundary (SAB), verifies that the requested mutation matches the certified execution contract, checks validity windows, policy epochs, revocation epochs, and live-state drift, mints scoped execution identity, invokes infrastructure APIs, and records signed decision and outcome records. By separating proposal, admission, and execution, SEB turns certified authority into a short-lived, revocable, auditable runtime capability, provided that production mutation APIs reject non-broker identities. We present the SEB execution model, certificate and replay-verification predicates, scoped identity semantics, bypass-prevention deployment patterns, failure behavior, and a concrete prototype implementation. We evaluate the prototype on AWS and Kubernetes clusters, measuring latency overheads, revocation propagation, drift detection, and security under fault injection.