Don't Start Over: A Cost-Effective Framework for Migrating Personalized Prompts Between LLMs

Jan 17, 2026 • Ziyi Zhao, Chongming Gao, Yang Zhang, +5 more

TL;DR Highlight

A lightweight adapter framework that migrates each user's soft prompt to the new model when an LLM is upgraded, at up to 98% lower cost than retraining.

Who Should Read

ML engineers running LLM-based recommender systems or personalization services, especially teams working out how to preserve tens of thousands of user profiles across a model upgrade.

Core Mechanics

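A soft prompt (a lightweight vector encoding a user's preferences) is tied to the specific LLM it was trained with, so swapping the model means retraining every user's prompt from scratch.
PUMA maps soft prompts from the old model's space into the new model's space with a single small feed-forward adapter, so prompts can be migrated without per-user retraining.
All users are clustered with K-means, then stratified sampling by behavioral variance within each cluster picks just 2,000 representative users to train the adapter.
It works not only for Llama-2-1B → Llama-2-3B but also across entirely different architectures, such as LLaMA → Qwen, Phi, Gemma, and StableLM.
Aggregated migration, which merges prompts from several source models into one target model, outperforms migration from any single source (knowledge synergy).
Performance stays stable under chained migration, A→B→C→D→E.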
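
Chained migration can be pictured as composing one adapter per hop. The sketch below is an illustration under assumptions, not the paper's implementation: the `hop_adapter` helper, its two-layer shape, and the embedding widths 2048/3072/4096 are all hypothetical, and the adapters are untrained, so only the tensor shapes are meaningful.

```python
import torch
import torch.nn as nn

def hop_adapter(source_dim: int, target_dim: int) -> nn.Module:
    # Hypothetical stand-in for one trained migration adapter.
    return nn.Sequential(
        nn.Linear(source_dim, target_dim),
        nn.GELU(),
        nn.Linear(target_dim, target_dim),
    )

# Assumed embedding widths for three successive models A, B, C.
dims = [2048, 3072, 4096]
chain = nn.Sequential(*[hop_adapter(s, t) for s, t in zip(dims[:-1], dims[1:])])

# One soft prompt of length 1 per user, for 8 users, in model A's space.
prompts_a = torch.randn(8, 1, dims[0])
with torch.no_grad():
    prompts_c = chain(prompts_a)
print(prompts_c.shape)  # torch.Size([8, 1, 4096])
```

Each hop only needs the previous model's prompts as input, so older models in the chain do not have to be kept around once their prompts have been migrated.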

Evidence

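On the Amazon dataset, PUMA reaches an RMSE of 0.9135, lower than full retraining (0.9414); on MIND, uAUC improves substantially as well (0.6552 vs. 0.5289).
Training is 50x faster than full retraining (Amazon: 24 hours → 0.48 hours), cutting compute cost by up to 98%.
PUMA trained on 2,000 selected users (RMSE 0.9315) beats a baseline trained on 6,000 randomly sampled users (RMSE 0.9320) with one third of the data.
Aggregated migration combining Llama and StableLM reaches RMSE 0.9217, better than either single source (Llama 0.9293, StableLM 0.9380).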
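
The 50x and 98% figures follow from the two training times quoted for Amazon. A quick check, assuming training time is a fair proxy for compute cost (the summary does not say how cost was measured):

```python
# Training times quoted above for the Amazon dataset.
full_retrain_hours = 24.0
puma_hours = 0.48

speedup = full_retrain_hours / puma_hours
cost_reduction = 1.0 - puma_hours / full_retrain_hours

print(f"speedup: {speedup:.0f}x")                # speedup: 50x
print(f"cost reduction: {cost_reduction:.0%}")   # cost reduction: 98%
```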

How to Apply

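If you run a 1+N system (one LLM plus tens of thousands of per-user soft prompts) and need to upgrade the model, train only the PUMA adapter and migrate the existing profiles instead of retraining everything.
When consolidating onto one model after an A/B test or a multi-model deployment, concatenate the user prompts learned on each model and apply aggregated migration to the target model.
It can also help with cold start for new users: use an adapter trained on the existing models to quickly initialize a new user's prompt and shorten retraining time.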
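
The concatenation step for aggregated migration might look like the sketch below. It rests on assumptions the summary does not confirm: the `AggregatedAdapter` class is hypothetical, prompts are joined along the feature axis (the source only says "concatenate"), and the dimensions are made up. The adapter is untrained, so the output is only shape-correct.

```python
import torch
import torch.nn as nn

class AggregatedAdapter(nn.Module):
    """Hypothetical adapter mapping concatenated source prompts to one target space."""

    def __init__(self, source_dims: list[int], target_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sum(source_dims), target_dim),
            nn.GELU(),
            nn.Linear(target_dim, target_dim),
        )

    def forward(self, source_prompts: list[torch.Tensor]) -> torch.Tensor:
        # Each tensor: (batch, prompt_len, source_dim_i); join along the feature axis.
        return self.net(torch.cat(source_prompts, dim=-1))

adapter = AggregatedAdapter(source_dims=[2048, 2560], target_dim=3072)
p1 = torch.randn(4, 1, 2048)  # 4 users, prompts learned on source model 1
p2 = torch.randn(4, 1, 2560)  # the same 4 users, prompts from source model 2
with torch.no_grad():
    merged = adapter([p1, p2])
print(merged.shape)  # torch.Size([4, 1, 3072])
```

Joining along the feature axis requires every source model to hold a prompt of the same length for the same user, in the same row order.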

Code Example

# PUMA adapter structure (PyTorch sketch)
import torch
import torch.nn as nn

class PUMAAdapter(nn.Module):
    """
    Maps soft prompts from the old LLM's embedding space to the new one.

    source_dim: embedding dimension of the old LLM (e.g., a 1B model)
    target_dim: embedding dimension of the new LLM (e.g., a 3B model)

    The prompt length is not a constructor argument; it is the middle axis
    of the input tensor (the paper uses l=1).
    """
    def __init__(self, source_dim: int, target_dim: int):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(source_dim, target_dim * 2),
            nn.LayerNorm(target_dim * 2),
            nn.GELU(),
            nn.Linear(target_dim * 2, target_dim),
        )
        # Linear projection so the residual path matches target_dim
        self.residual_proj = nn.Linear(source_dim, target_dim)

    def forward(self, source_prompt: torch.Tensor) -> torch.Tensor:
        # source_prompt: (batch, prompt_len, source_dim)
        # returns:       (batch, prompt_len, target_dim)
        return self.adapter(source_prompt) + self.residual_proj(source_prompt)

# User selection strategy (group-based)
from sklearn.cluster import KMeans
import numpy as np

def select_representative_users(
    prompt_embeddings: np.ndarray,  # (num_users, emb_dim)
    output_variance: np.ndarray,    # (num_users,)
    n_clusters: int = 50,
    budget: int = 2000,
    seed: int = 42,
) -> list[int]:
    rng = np.random.default_rng(seed)  # seeded so the selection is reproducible

    # Stage 1: K-means clustering to cover the diversity of preferences
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_labels = kmeans.fit_predict(prompt_embeddings)

    selected_indices = []
    per_cluster_budget = budget // n_clusters

    for c in range(n_clusters):
        cluster_idx = np.where(cluster_labels == c)[0]
        if len(cluster_idx) == 0:
            continue  # skip empty clusters; percentiles are undefined there
        cluster_var = output_variance[cluster_idx]

        # Stage 2: stratified sampling over variance terciles
        bins = np.percentile(cluster_var, [33, 66])
        low = cluster_idx[cluster_var <= bins[0]]
        mid = cluster_idx[(cluster_var > bins[0]) & (cluster_var <= bins[1])]
        high = cluster_idx[cluster_var > bins[1]]

        # Bell-shaped weights (a rough normal distribution): the
        # middle-variance group gets the largest share
        weights = [1, 2, 1]  # low:mid:high
        total_w = sum(weights)
        for group, w in zip([low, mid, high], weights):
            n = max(1, int(per_cluster_budget * w / total_w))
            if len(group) > 0:
                chosen = rng.choice(group, min(n, len(group)), replace=False)
                selected_indices.extend(chosen.tolist())

    # Small clusters can leave the result under budget; the slice only caps overshoot
    return selected_indices[:budget]

Terminology

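Soft Prompt: A learnable vector prepended to the input, leaving the model's parameters untouched. Rather than handing an employee detailed instructions every time, it is like making that employee a dedicated 'task card' to pull out on each job.

Prompt Tuning: A method that trains only the soft prompt attached to the input instead of the full model weights. It customizes the model lightly, like changing accessories rather than the whole outfit.

PEFT: Short for Parameter-Efficient Fine-Tuning. An umbrella term for fine-tuning techniques that cut cost by training only a fraction of a model's billions of parameters. Includes LoRA and Prompt Tuning.

Coreset Selection: A technique for picking a small, highly representative subset of the full training data. Here, a data selection strategy that trains on 2,000 representative users instead of tens of thousands and still gets comparable results.

K-means Clustering: An algorithm that automatically partitions data into K groups. Here it groups users with similar tastes.

RMSE: Root Mean Square Error. Measures the gap between predicted and actual values; lower is more accurate.

uAUC: AUC computed per user and then averaged. Measures per-user ranking quality in binary classification tasks such as click prediction; closer to 1 is better.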
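
The two metrics above can be computed in a few lines. This is a minimal pure-Python sketch, not the paper's evaluation code: the function names are illustrative, and skipping users who have only positives or only negatives (where AUC is undefined) is an assumed convention.

```python
import math
from collections import defaultdict

def rmse(y_true, y_pred):
    # Square root of the mean squared difference.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def auc(labels, scores):
    # Probability that a random positive outranks a random negative (ties count 0.5).
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return None  # undefined when a user has a single class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def uauc(user_ids, labels, scores):
    # Group interactions by user, compute AUC per user, then average.
    by_user = defaultdict(lambda: ([], []))
    for u, l, s in zip(user_ids, labels, scores):
        by_user[u][0].append(l)
        by_user[u][1].append(s)
    per_user = [auc(l, s) for l, s in by_user.values()]
    per_user = [a for a in per_user if a is not None]
    return sum(per_user) / len(per_user)

print(rmse([5, 3], [4, 4]))  # 1.0

users  = ["a", "a", "a", "b", "b", "b"]
labels = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.3, 0.8, 0.5]
print(uauc(users, labels, scores))  # 0.75
```

User "a" is ranked perfectly (AUC 1.0) and user "b" gets one of two positives above the negative (AUC 0.5), so the average is 0.75. Unlike a global AUC, this weights every user equally regardless of how many interactions they have.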

Original Abstract

Personalization in Large Language Models (LLMs) often relies on user-specific soft prompts. However, these prompts become obsolete when the foundation model is upgraded, necessitating costly, full-scale retraining. To overcome this limitation, we propose the Prompt-level User Migration Adapter (PUMA), a lightweight framework to efficiently migrate personalized prompts across incompatible models. PUMA utilizes a parameter-efficient adapter to bridge the semantic gap, combined with a group-based user selection strategy to significantly reduce training costs. Experiments on three large-scale datasets show our method matches or even surpasses the performance of retraining from scratch, reducing computational cost by up to 98%. The framework demonstrates strong generalization across diverse model architectures and robustness in advanced scenarios like chained and aggregated migrations, offering a practical path for the sustainable evolution of personalized AI by decoupling user assets from the underlying models.