PromptChainer: Chaining Large Language Model Prompts through Visual Programming
TL;DR Highlight
Research on a visual programming interface for building complex AI apps by connecting multiple LLM calls as node-edge graphs, with debugging support at multiple granularities.
Who Should Read
Developers and designers building complex features by chaining multiple LLM calls, especially those prototyping apps such as chatbots or writing assistants that a single prompt can't handle.
Core Mechanics
- Implements 'Prompt Chaining' — breaking down complex tasks into chains of multiple LLM calls — as a visual interface
- Node types are granular: LLM nodes, classifier nodes, JS helper nodes, API call nodes, etc., combinable by purpose
- Format inconsistency in LLM outputs is the key problem for inter-step data flow — changing a prompt changes output format, causing cascading errors that break downstream nodes
- 3-level debugging support: test nodes standalone → run full chain → use breakpoints to edit intermediate outputs and re-run only downstream steps
- Discovered 2 patterns users use to structure chains: parallel branching logic (decision tree style) vs iterative processing that progressively refines content
- Chaining isn't only used to work around LLM limitations — people also intentionally split steps for reusability and extensibility
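The node-edge chain structure described above can be sketched as a minimal data model. This is an illustrative sketch with hypothetical names (`Node`, `Chain`); PromptChainer's actual implementation is not described at this level in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of a node-edge chain; node kinds mirror the paper's
# categories ('llm', 'classifier', 'helper', 'api').
@dataclass
class Node:
    name: str
    kind: str                      # 'llm' | 'classifier' | 'helper' | 'api'
    run: Callable[[str], str]      # transforms this node's input text

@dataclass
class Chain:
    nodes: list[Node] = field(default_factory=list)

    def add(self, node: Node) -> "Chain":
        self.nodes.append(node)
        return self

    def execute(self, text: str) -> dict[str, str]:
        """Run nodes in sequence, recording each intermediate output
        so individual steps can be inspected while debugging."""
        trace = {}
        for node in self.nodes:
            text = node.run(text)
            trace[node.name] = text
        return trace

# Usage: a two-node chain (classify, then reformat the label)
chain = (Chain()
         .add(Node("classify", "classifier",
                   lambda t: "is_music" if "song" in t else "not_music"))
         .add(Node("format", "helper", lambda t: f"label={t}")))
print(chain.execute("play a song"))  # {'classify': 'is_music', 'format': 'label=is_music'}
```

Recording every intermediate output in `trace` is what enables node-level testing: each step's result can be checked (or manually overridden) before the next node consumes it.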
Evidence
- 4 participants completed chains averaging 5.5 ± 0.9 nodes within study sessions
- Analysis of 27 LLM nodes: 7 input classification, 13 information generation, 7 restructuring — diverse use cases
- Pre-defined helper nodes used 3x more than custom JS nodes (13 vs 4) — built-in nodes cover most cases
- Prototypes verified with LaMDA 137B parameter model (similar scale to GPT-3) as backend
How to Apply
- When a single prompt gets too complex: split the task into 'classify → process → format convert' stages, test each independently, then connect — debugging becomes much easier
- When cascading errors occur in LangChain or custom pipelines: log intermediate node outputs and design a breakpoint structure that lets you re-run only downstream from the problem node
- Early in chain design, validate the overall structure first with rough zero-shot or 1-2 example draft prompts, then refine individual prompts later — this rough-draft-first strategy is more efficient
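The rough-draft-first strategy above can be illustrated with two prompt templates. These prompts are illustrative, not taken from the paper; the point is that both produce the same downstream interface (a string for the LLM to complete), so swapping the draft for the refined version requires no rewiring of the chain.

```python
def draft_prompt(utterance: str) -> str:
    # Zero-shot rough draft: just enough to validate data flow between nodes
    return f"Classify as is_music or not_music: {utterance}\nClass:"

def refined_prompt(utterance: str) -> str:
    # Few-shot refinement, swapped in once the chain structure is validated
    return (
        "[Dialog] Play some music I like. [Class] is_music\n"
        "[Dialog] Hey what's up. [Class] not_music\n"
        f"[Dialog] {utterance} [Class]"
    )

print(draft_prompt("i love the song by Sting"))
```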
Code Example
```python
# Example of a step-by-step debugging pattern for LLM chains (Python sketch).
# `llm_call` is a placeholder for a real LLM completion API; here it returns
# canned outputs so the control flow can be traced end to end.

def llm_call(prompt: str) -> str:
    """Stand-in for an actual LLM backend; swap in a real client call."""
    if '[Class]' in prompt:
        return 'is_music'
    if 'Entities:' in prompt:
        return 'Sting'
    return 'Here is a friendly response.'

def run_chain_with_debug(input_text, breakpoint_node=None):
    results = {}

    # Step 1: Input classification (few-shot prompt)
    classification = llm_call(
        prompt=f"""[Dialog] Play some music I like. [Class] is_music
[Dialog] Hey what's up. [Class] not_music
[Dialog] {input_text} [Class]"""
    )
    results['classify'] = classification
    print(f"[Node 1] Classification: {classification}")

    # Breakpoint: intermediate output can be manually edited before
    # only the downstream steps are re-run
    if breakpoint_node == 'classify':
        classification = input(
            f"Current output: {classification}\nEnter value to modify (Enter=keep): "
        ) or classification

    # Step 2: Branch handling
    if classification.strip() == 'is_music':
        # Music-related processing
        entities = llm_call(
            prompt=f"Extract music entities from: {input_text}\nEntities:"
        )
        results['entities'] = entities
        print(f"[Node 2] Entities: {entities}")

        # Step 3: Helper node - parse the comma-separated entity string
        entity_list = [e.strip() for e in entities.split(',')]

        # Step 4: Final response generation
        response = llm_call(
            prompt=f"Generate a friendly music recommendation response for: {entity_list[0]}"
        )
    else:
        response = llm_call(
            prompt=f"Generate a casual non-music response to: {input_text}"
        )
    results['final'] = response
    return results

# Usage example (interactive: pauses for input at the 'classify' breakpoint)
result = run_chain_with_debug("i love the song by Sting", breakpoint_node='classify')
```
Original Abstract
While LLMs have made it possible to rapidly prototype new ML functionalities, many real-world applications involve complex tasks that cannot be easily handled via a single run of an LLM. Recent work has found that chaining multiple LLM runs together (with the output of one step being the input to the next) can help users accomplish these more complex tasks, and in a way that is perceived to be more transparent and controllable. However, it remains unknown what users need when authoring their own LLM chains – a key step to lowering the barriers for non-AI-experts to prototype AI-infused applications. In this work, we explore the LLM chain authoring process. We find from pilot studies that users need support transforming data between steps of a chain, as well as debugging the chain at multiple granularities. To address these needs, we designed PromptChainer, an interactive interface for visually programming chains. Through case studies with four designers and developers, we show that PromptChainer supports building prototypes for a range of applications, and conclude with open questions on scaling chains to even more complex tasks, as well as supporting low-fi chain prototyping.