PromptChainer: Chaining LLM Prompts with Visual Programming

PromptChainer: Chaining Large Language Model Prompts through Visual Programming

Mar 13, 2022 • Tongshuang Sherry Wu, Ellen Jiang, Aaron Donsbach +4

TL;DR Highlight

A study of an interface for visually connecting multiple LLM calls as a node-edge graph, and debugging them, when building complex AI apps.

Who Should Read

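Developers and designers who want to chain LLMs across multiple steps to build complex functionality, especially those prototyping apps such as chatbots or writing assistants that a single prompt cannot handle.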

Core Mechanics

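Implements prompt chaining as a visual interface: a complex task that a single LLM run cannot solve is split into multiple LLM calls connected in a chain.
Node types are fine-grained: LLM nodes, classifier nodes, JavaScript helper nodes, API call nodes, and others can be combined according to purpose.
Because LLM output formats are arbitrary, transforming data between steps is the core problem: changing a prompt can also change its output format and break the next node, producing cascading errors.
Debugging is supported at three levels: testing a single node in isolation, running the whole chain, and setting breakpoints to edit an intermediate output and re-run only the downstream nodes.
Users structured their chains in two patterns: parallel branching logic (decision-tree style) and iterative processing that refines content step by step.
Chaining is not used only to work around LLM limitations; users also often split steps deliberately for reusability and extensibility.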
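
The node-and-edge structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PromptChainer's actual implementation: the `Node` class, the `kind` labels, `run_chain`, and the stand-in lambda functions are all hypothetical names, and the LLM nodes are faked with canned functions so the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One step in a chain. `kind` is only a label here (llm, classifier, helper, api)."""
    name: str
    kind: str
    fn: Callable[[str], str]
    next_nodes: List[str] = field(default_factory=list)  # outgoing edges


def run_chain(nodes: Dict[str, Node], start: str, text: str) -> Dict[str, str]:
    """Run a linear or tree-shaped chain from `start`, recording every node's output."""
    outputs: Dict[str, str] = {}
    queue = [(start, text)]
    while queue:
        name, value = queue.pop(0)
        out = nodes[name].fn(value)
        outputs[name] = out
        # Pass this node's output along each outgoing edge.
        for nxt in nodes[name].next_nodes:
            queue.append((nxt, out))
    return outputs


# Stand-in functions (assumptions); a real chain would call a model in the "llm" nodes.
chain = {
    "extract": Node("extract", "llm", lambda s: "Sting, The Police", ["split"]),
    "split": Node("split", "helper", lambda s: s.split(",")[0].strip(), ["reply"]),
    "reply": Node("reply", "llm", lambda s: f"Here is a song by {s}."),
}

outputs = run_chain(chain, "extract", "i love the song by Sting")
print(outputs["reply"])
```

Keeping every node's output in `outputs` is what makes per-node inspection possible; the helper node in the middle is where the data transformation between two LLM steps happens.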

Evidence

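Four participants each completed a chain within the study session, averaging 5.5 ± 0.9 nodes per chain.
Analysis of the 27 LLM nodes: 7 classified inputs, 13 generated information, and 7 restructured content, showing varied use.
Predefined helper nodes were used about three times as often as custom JS nodes (13 vs. 4), suggesting the built-in nodes cover most needs.
A LaMDA model with 137B parameters (comparable in scale to GPT-3) served as the backend, validating working chain prototypes.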

How to Apply

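When a single prompt becomes too complex: split the task into steps such as classify → process → convert format, test each step independently, and then connect them; debugging becomes much easier.
When cascading errors occur in LangChain or a hand-built pipeline: log the output of each intermediate node, and design a breakpoint structure that can re-run only the nodes downstream of the faulty one.
Early in chain design, validate the overall structure first with rough-draft prompts (zero-shot, or with only one or two examples), and leave detailed tuning for later.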
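
The logging-plus-breakpoint idea in the second point above might look like the sketch below. It assumes a strictly linear chain in which each step sees only the previous step's output; `run_from`, `traced`, the step names, and the stand-in step functions are hypothetical and independent of LangChain or any other library.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Callable[[str], str]]


def run_from(steps: List[Step], cache: Dict[str, str], chain_input: str,
             start: int = 0, override: Optional[str] = None) -> Dict[str, str]:
    """Re-run steps[start:] only, reusing cached outputs of earlier steps.

    If `override` is given, it replaces the output of steps[start] (a manual
    edit at a breakpoint) and only the steps after it are executed.
    """
    value = chain_input if start == 0 else cache[steps[start - 1][0]]
    for i in range(start, len(steps)):
        name, fn = steps[i]
        if i == start and override is not None:
            value = override          # manual edit at the breakpoint
        else:
            value = fn(value)
        cache[name] = value           # log every intermediate output
    return cache


calls: List[str] = []  # records which steps actually executed


def traced(name: str, fn: Callable[[str], str]) -> Step:
    """Wrap a step so that each execution is recorded in `calls`."""
    def wrapper(x: str) -> str:
        calls.append(name)
        return fn(x)
    return (name, wrapper)


steps = [
    traced("classify", lambda s: "not_music"),  # pretend the model got this wrong
    traced("extract", lambda s: "Sting" if s == "is_music" else ""),
    traced("reply", lambda s: f"Recommendation for {s}" if s else "Small talk"),
]

cache = run_from(steps, {}, "i love the song by Sting")
first_reply = cache["reply"]

# Fix the classifier output by hand and re-run only what comes after it.
calls.clear()
cache = run_from(steps, cache, "i love the song by Sting", start=0, override="is_music")
print(first_reply, "->", cache["reply"], "| re-ran:", calls)
```

The faulty `classify` step is not called again on the second run; only `extract` and `reply` execute, which is the point of a breakpoint when upstream calls are slow or costly.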

Code Example


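# Step-by-step debugging pattern for an LLM chain (Python sketch).
# llm_call is a placeholder, not a real library function: pass in any
# callable that takes a prompt string and returns the model's text.

def run_chain_with_debug(input_text, llm_call, breakpoint_node=None):
    results = {}

    # Step 1: classify the input with a few-shot prompt
    classification = llm_call(
        prompt=f"""[Dialog] Play some music I like. [Class] is_music
[Dialog] Hey what's up. [Class] not_music
[Dialog] {input_text} [Class]"""
    ).strip()
    print(f"[Node 1] Classification: {classification}")

    # Breakpoint: let the user overwrite the intermediate output by hand
    if breakpoint_node == 'classify':
        classification = input(
            f"Current output: {classification}\nNew value (Enter = keep): "
        ).strip() or classification
    results['classify'] = classification  # store the value used downstream

    # Step 2: branch on the class
    if classification == 'is_music':
        # Music branch: extract entities
        entities = llm_call(
            prompt=f"Extract music entities from: {input_text}\nEntities:"
        )
        results['entities'] = entities
        print(f"[Node 2] Entities: {entities}")

        # Step 3: helper node, parse the comma-separated output
        entity_list = [e.strip() for e in entities.split(',') if e.strip()]
        # Fall back to the raw input if the model returned no entities
        topic = entity_list[0] if entity_list else input_text

        # Step 4: generate the final response
        response = llm_call(
            prompt=f"Generate a friendly music recommendation response for: {topic}"
        )
    else:
        response = llm_call(
            prompt=f"Generate a casual non-music response to: {input_text}"
        )

    results['final'] = response
    return results

# Usage example. fake_llm is a canned stand-in (an assumption for this
# sketch); replace it with a function that calls a real model.
def fake_llm(prompt):
    if prompt.endswith("[Class]"):
        return " is_music"
    if prompt.startswith("Extract"):
        return "Sting, song"
    return "Sure, here is something by Sting."

result = run_chain_with_debug("i love the song by Sting", fake_llm, breakpoint_node='classify')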

Terminology

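Prompt Chaining: When one LLM call cannot solve a task, splitting it into several calls and passing each result to the next as input. Similar to a factory conveyor belt, where each station processes the item in order.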

Few-shot prompt: Showing the LLM a few examples of the desired behavior. Like showing worked example problems before an exam.

Zero-shot prompt: Giving the LLM a task through a description alone, with no examples. Like solving an unfamiliar problem after reading only its instructions.
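
To make the difference between the two prompt styles concrete, here is a small sketch. `build_prompt` is a hypothetical helper, and the `[Dialog] ... [Class]` layout simply mirrors the classification prompt in the code example on this page; it is not a required format.

```python
from typing import Sequence, Tuple


def build_prompt(query: str,
                 examples: Sequence[Tuple[str, str]] = (),
                 instruction: str = "") -> str:
    """Build a classification prompt; it is zero-shot if `examples` is empty."""
    lines = [instruction] if instruction else []
    lines += [f"[Dialog] {text} [Class] {label}" for text, label in examples]
    lines.append(f"[Dialog] {query} [Class]")
    return "\n".join(lines)


# Zero-shot: a description of the task and no examples.
zero_shot = build_prompt(
    "i love the song by Sting",
    instruction="Label the dialog as is_music or not_music.",
)

# Few-shot: labeled examples show the model the expected pattern.
few_shot = build_prompt(
    "i love the song by Sting",
    examples=[("Play some music I like.", "is_music"),
              ("Hey what's up.", "not_music")],
)
print(zero_shot)
print(few_shot)
```

Both prompts end with the same open `[Class]` slot, so a rough zero-shot draft can later be upgraded to few-shot without changing the node that consumes the output.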

Cascading error: An error in an early pipeline stage that breaks later stages in turn. Like dominoes: when one falls, the rest follow.

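Function signature: The definition of the types and formats of a function's inputs and outputs. For an LLM step, changing the prompt can change the output format, so this signature is unstable.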
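
One way to cope with an unstable output format is to check each LLM output against the format the next node expects, and to fail at that node rather than let a malformed value propagate. The sketch below assumes a classifier that should return one of two labels; `parse_label` and `ALLOWED_LABELS` are hypothetical names, not part of PromptChainer.

```python
ALLOWED_LABELS = {"is_music", "not_music"}  # assumed label set for this example


def parse_label(raw: str) -> str:
    """Normalize a classifier output and enforce the expected 'signature'."""
    stripped = raw.strip()
    if not stripped:
        raise ValueError("unexpected classifier output: empty string")
    # Keep only the first token; models often keep generating after the label.
    label = stripped.split()[0].strip(".,:").lower()
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected classifier output: {raw!r}")
    return label


print(parse_label(" is_music\n[Dialog] ..."))  # extra continuation text is dropped

try:
    parse_label("I think this is about music")
except ValueError as err:
    print("stopped before downstream nodes:", err)
```

Raising at the boundary turns a silent cascading error into a failure that points at the node whose prompt changed.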

In-context learning: Adapting an LLM to a new task through the prompt alone, without retraining. The model itself is unchanged; only the way it is asked changes.

LaMDA: Google's dialogue-focused language model (137B parameters). The model used as the backend in this paper, in place of GPT-3.

Original Abstract

While LLMs have made it possible to rapidly prototype new ML functionalities, many real-world applications involve complex tasks that cannot be easily handled via a single run of an LLM. Recent work has found that chaining multiple LLM runs together (with the output of one step being the input to the next) can help users accomplish these more complex tasks, and in a way that is perceived to be more transparent and controllable. However, it remains unknown what users need when authoring their own LLM chains – a key step to lowering the barriers for non-AI-experts to prototype AI-infused applications. In this work, we explore the LLM chain authoring process. We find from pilot studies that users need support transforming data between steps of a chain, as well as debugging the chain at multiple granularities. To address these needs, we designed PromptChainer, an interactive interface for visually programming chains. Through case studies with four designers and developers, we show that PromptChainer supports building prototypes for a range of applications, and conclude with open questions on scaling chains to even more complex tasks, as well as supporting low-fi chain prototyping.