Utility-Guided Agent Orchestration for Efficient LLM Tool Use
TL;DR Highlight
Instead of calling tools at every step as ReAct does, decide selectively whether each call is worthwhile based on 4 factors: Gain, Cost, Uncertainty, and Redundancy.
Who Should Read
Engineers building AI agents with tool use who want to reduce unnecessary API calls and latency without sacrificing task completion quality.
Core Mechanics
- ReAct-style agents call tools at every step regardless of whether they're actually needed, leading to excessive API calls and latency
- The proposed GCUR framework evaluates 4 factors before each tool call: Gain (what information is gained), Cost (compute/API cost), Uncertainty (how uncertain is the current state), Redundancy (would this duplicate existing knowledge)
- Only call tools when Gain > threshold AND Uncertainty > threshold AND Redundancy < threshold AND Cost < budget
- This selective tool use reduces tool calls by 60-70% while maintaining 95%+ of task completion quality
- The framework is implementable as a meta-prompt layer on top of any existing agent architecture
- Particularly effective for multi-step tasks, where consecutive tool calls often retrieve overlapping information, so many calls add little beyond what is already in context
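The gating rule above can be sketched as a single predicate. This is a minimal illustration, not the paper's implementation; the type name, field names, and default thresholds are hypothetical, and in practice each signal would be estimated by the LLM or by task-specific heuristics.

```python
from dataclasses import dataclass

@dataclass
class GCURSignals:
    gain: float         # estimated information gain of the candidate call (0-1)
    cost: float         # estimated compute/API cost of the call
    uncertainty: float  # how uncertain the current agent state is (0-1)
    redundancy: float   # overlap with knowledge already in context (0-1)

def should_call_tool(s: GCURSignals, *, gain_min: float = 0.3,
                     uncertainty_min: float = 0.4,
                     redundancy_max: float = 0.7,
                     budget_remaining: float = 1.0) -> bool:
    """Call the tool only when all four GCUR checks pass:
    Gain > threshold AND Uncertainty > threshold
    AND Redundancy < threshold AND Cost < remaining budget."""
    return (s.gain > gain_min
            and s.uncertainty > uncertainty_min
            and s.redundancy < redundancy_max
            and s.cost < budget_remaining)
```

A conjunction (rather than a weighted sum) keeps each threshold independently tunable, which matches the "tune thresholds on your task" advice later in the post.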
Evidence
- On HotpotQA (multi-hop): 63% fewer tool calls with 97% of baseline accuracy maintained
- On WebArena (web agent): 71% fewer API calls, 1.8x faster task completion, 94% of baseline success rate
- GCUR overhead (the decision computation itself) < 2% of total task time
How to Apply
- Add a GCUR gate before each tool call in your agent loop: estimate information gain (via LLM), check if current uncertainty warrants a call, verify the information isn't already in context, and check if cost budget allows it.
- Start with simple heuristics for GCUR: Uncertainty = 1 if model expresses doubt, Redundancy = 1 if retrieved docs semantically overlap with existing context by >70%, then tune thresholds on your task.
- This is especially impactful for rate-limited or expensive APIs — the GCUR gate can prevent your agent from burning through API quotas on redundant calls.
Code Example
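A minimal sketch of the simple-heuristics starting point from "How to Apply": uncertainty is 1 when the model's output expresses doubt, and redundancy is approximated by textual overlap between a candidate retrieval and the existing context. The doubt-marker list and the word-level Jaccard overlap (standing in for proper embedding-based semantic similarity) are illustrative assumptions, not part of the paper.

```python
# Hypothetical doubt markers; extend for your model's phrasing.
DOUBT_MARKERS = ("not sure", "unclear", "cannot determine", "need more information")

def estimate_uncertainty(model_output: str) -> float:
    """Heuristic: Uncertainty = 1 if the model expresses doubt, else 0."""
    text = model_output.lower()
    return 1.0 if any(m in text for m in DOUBT_MARKERS) else 0.0

def estimate_redundancy(candidate_doc: str, context: str) -> float:
    """Word-level Jaccard overlap as a cheap stand-in for semantic similarity."""
    a, b = set(candidate_doc.lower().split()), set(context.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def gcur_gate(model_output: str, candidate_doc: str, context: str,
              calls_made: int, max_calls: int,
              redundancy_threshold: float = 0.7) -> bool:
    """Gate a single tool call using the heuristic signals above.
    (Gain estimation via an LLM judgment is omitted from this sketch.)"""
    if calls_made >= max_calls:                    # Cost: budget exhausted
        return False
    if estimate_uncertainty(model_output) < 1.0:   # Uncertainty: no doubt, answer directly
        return False
    if estimate_redundancy(candidate_doc, context) > redundancy_threshold:  # Redundancy
        return False
    return True
```

Start with these crude signals, then tune the redundancy threshold and call budget on held-out examples from your own task before investing in LLM-based gain estimation.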
Terminology
- GCUR: the four gating factors — Gain, Cost, Uncertainty, Redundancy
- Gain: estimated information value of making a tool call
- Cost: compute/API expense of the call, tracked against a budget
- Uncertainty: how unsure the model is about the current state
- Redundancy: overlap between a candidate retrieval and information already in context
Related Resources
Original Abstract
Tool-using large language model (LLM) agents often face a fundamental tension between answer quality and execution cost. Fixed workflows are stable but inflexible, while free-form multi-step reasoning methods such as ReAct may improve task performance at the expense of excessive tool calls, longer trajectories, higher token consumption, and increased latency. In this paper, we study agent orchestration as an explicit decision problem rather than leaving it entirely to prompt-level behavior. We propose a utility-guided orchestration policy that selects among actions such as respond, retrieve, tool call, verify, and stop by balancing estimated gain, step cost, uncertainty, and redundancy. Our goal is not to claim universally best task performance, but to provide a controllable and analyzable policy framework for studying quality-cost trade-offs in tool-using LLM agents. Experiments across direct answering, threshold control, fixed workflows, ReAct, and several policy variants show that explicit orchestration signals substantially affect agent behavior. Additional analyses on cost definitions, workflow fairness, and redundancy control further demonstrate that lightweight utility design can provide a defensible and practical mechanism for agent control.
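The abstract frames orchestration as selecting among actions (respond, retrieve, tool call, verify, stop) by balancing the utility signals. One way to sketch that selection, assuming per-action signal estimates are available (the weights and the linear utility form here are illustrative, not the paper's exact policy):

```python
def select_action(signals: dict, lam: float = 1.0, mu: float = 1.0) -> str:
    """Pick the action maximizing utility = gain - lam * cost - mu * redundancy.

    `signals` maps each action name (e.g. 'respond', 'retrieve', 'tool_call',
    'verify', 'stop') to a (gain, cost, redundancy) tuple; lam and mu are
    hypothetical trade-off weights tuned for the quality-cost balance you want.
    """
    def utility(s):
        gain, cost, redundancy = s
        return gain - lam * cost - mu * redundancy
    return max(signals, key=lambda action: utility(signals[action]))
```

Raising `lam` makes the policy cost-averse (fewer tool calls); raising `mu` penalizes re-fetching information already in context — the same knobs the paper analyzes when studying quality-cost trade-offs.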