PromptChainer: Chaining Large Language Model Prompts through Visual Programming
TL;DR Highlight
Research on a visual programming interface for building complex AI apps by connecting multiple LLM calls as node-edge graphs, with debugging support at multiple granularities.
Who Should Read
Developers and designers building complex features by chaining multiple LLM calls, especially those prototyping apps such as chatbots or writing assistants that a single prompt can't handle.
Core Mechanics
- Implements 'Prompt Chaining' — breaking down complex tasks into chains of multiple LLM calls — as a visual interface
- Node types are granular: LLM nodes, classifier nodes, JS helper nodes, API call nodes, etc., combinable by purpose
- Format inconsistency in LLM outputs is the key problem for inter-step data flow — changing a prompt changes output format, causing cascading errors that break downstream nodes
- 3-level debugging support: test nodes standalone → run full chain → use breakpoints to edit intermediate outputs and re-run only downstream steps
- Discovered 2 patterns users use to structure chains: parallel branching logic (decision tree style) vs iterative processing that progressively refines content
- Chaining isn't only used to work around LLM limitations — people also intentionally split steps for reusability and extensibility
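The node-edge chain structure described above can be sketched as a minimal data model. This is an illustrative sketch with hypothetical names (`Node`, `Chain`); PromptChainer's actual implementation is not described at this level in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of a node-edge chain; node kinds mirror the paper's
# categories ('llm', 'classifier', 'helper', 'api').
@dataclass
class Node:
    name: str
    kind: str                      # 'llm' | 'classifier' | 'helper' | 'api'
    run: Callable[[str], str]      # transforms this node's input text

@dataclass
class Chain:
    nodes: list[Node] = field(default_factory=list)

    def add(self, node: Node) -> "Chain":
        self.nodes.append(node)
        return self

    def execute(self, text: str) -> dict[str, str]:
        """Run nodes in sequence, recording each intermediate output
        so individual steps can be inspected while debugging."""
        trace = {}
        for node in self.nodes:
            text = node.run(text)
            trace[node.name] = text
        return trace

# Usage: a two-node chain (classify, then reformat the label)
chain = (Chain()
         .add(Node("classify", "classifier",
                   lambda t: "is_music" if "song" in t else "not_music"))
         .add(Node("format", "helper", lambda t: f"label={t}")))
print(chain.execute("play a song"))  # {'classify': 'is_music', 'format': 'label=is_music'}
```

Recording every intermediate output in `trace` is what enables node-level testing: each step's result can be checked (or manually overridden) before the next node consumes it.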
Evidence
- 4 participants completed chains averaging 5.5 ± 0.9 nodes within study sessions
- Analysis of 27 LLM nodes: 7 input classification, 13 information generation, 7 restructuring — diverse use cases
- Pre-defined helper nodes used 3x more than custom JS nodes (13 vs 4) — built-in nodes cover most cases
- Prototypes verified with LaMDA 137B parameter model (similar scale to GPT-3) as backend
How to Apply
- When a single prompt gets too complex: split the task into 'classify → process → format convert' stages, test each independently, then connect — debugging becomes much easier
- When cascading errors occur in LangChain or custom pipelines: log intermediate node outputs and design a breakpoint structure that lets you re-run only downstream from the problem node
- Early in chain design, validate the overall structure first with rough zero-shot or 1-2 example draft prompts, then refine individual prompts later — this rough-draft-first strategy is more efficient
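The rough-draft-first strategy above can be illustrated with two prompt templates. These prompts are illustrative, not taken from the paper; the point is that both produce the same downstream interface (a string for the LLM to complete), so swapping the draft for the refined version requires no rewiring of the chain.

```python
def draft_prompt(utterance: str) -> str:
    # Zero-shot rough draft: just enough to validate data flow between nodes
    return f"Classify as is_music or not_music: {utterance}\nClass:"

def refined_prompt(utterance: str) -> str:
    # Few-shot refinement, swapped in once the chain structure is validated
    return (
        "[Dialog] Play some music I like. [Class] is_music\n"
        "[Dialog] Hey what's up. [Class] not_music\n"
        f"[Dialog] {utterance} [Class]"
    )

print(draft_prompt("i love the song by Sting"))
```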
Code Example
```python
# Example of a step-by-step debugging pattern for LLM chains (Python sketch).
# `llm_call` is a placeholder for a real LLM completion API; here it returns
# canned outputs so the control flow can be traced end to end.

def llm_call(prompt: str) -> str:
    """Stand-in for an actual LLM backend; swap in a real client call."""
    if '[Class]' in prompt:
        return 'is_music'
    if 'Entities:' in prompt:
        return 'Sting'
    return 'Here is a friendly response.'

def run_chain_with_debug(input_text, breakpoint_node=None):
    results = {}

    # Step 1: Input classification (few-shot prompt)
    classification = llm_call(
        prompt=f"""[Dialog] Play some music I like. [Class] is_music
[Dialog] Hey what's up. [Class] not_music
[Dialog] {input_text} [Class]"""
    )
    results['classify'] = classification
    print(f"[Node 1] Classification: {classification}")

    # Breakpoint: intermediate output can be manually edited before
    # only the downstream steps are re-run
    if breakpoint_node == 'classify':
        classification = input(
            f"Current output: {classification}\nEnter value to modify (Enter=keep): "
        ) or classification

    # Step 2: Branch handling
    if classification.strip() == 'is_music':
        # Music-related processing
        entities = llm_call(
            prompt=f"Extract music entities from: {input_text}\nEntities:"
        )
        results['entities'] = entities
        print(f"[Node 2] Entities: {entities}")

        # Step 3: Helper node - parse the comma-separated entity string
        entity_list = [e.strip() for e in entities.split(',')]

        # Step 4: Final response generation
        response = llm_call(
            prompt=f"Generate a friendly music recommendation response for: {entity_list[0]}"
        )
    else:
        response = llm_call(
            prompt=f"Generate a casual non-music response to: {input_text}"
        )
    results['final'] = response
    return results

# Usage example (interactive: pauses for input at the 'classify' breakpoint)
result = run_chain_with_debug("i love the song by Sting", breakpoint_node='classify')
```
Original Abstract
While LLMs have made it possible to rapidly prototype new ML functionalities, many real-world applications involve complex tasks that cannot be easily handled via a single run of an LLM. Recent work has found that chaining multiple LLM runs together (with the output of one step being the input to the next) can help users accomplish these more complex tasks, and in a way that is perceived to be more transparent and controllable. However, it remains unknown what users need when authoring their own LLM chains – a key step to lowering the barriers for non-AI-experts to prototype AI-infused applications. In this work, we explore the LLM chain authoring process. We find from pilot studies that users need support transforming data between steps of a chain, as well as debugging the chain at multiple granularities. To address these needs, we designed PromptChainer, an interactive interface for visually programming chains. Through case studies with four designers and developers, we show that PromptChainer supports building prototypes for a range of applications, and conclude with open questions on scaling chains to even more complex tasks, as well as supporting low-fi chain prototyping.