Survey and analysis of hallucinations in large language models: attribution to prompting strategies or model behavior
TL;DR Highlight
A framework for determining whether an LLM hallucination is caused by the prompt or by the model itself.
Who Should Read
Researchers and practitioners working on LLM reliability and honesty, especially those debugging unexpected model outputs.
Core Mechanics
- Proposes a taxonomy separating prompt-induced hallucinations from model-intrinsic ones
- Prompt-induced issues: misleading context, adversarial instructions, and ambiguous phrasing trigger incorrect outputs even from capable models
- Model-intrinsic issues: factual gaps, reasoning failures, and overconfidence persist regardless of prompt quality
- Introduces diagnostic techniques to attribute a given failure to either source
- Suggests targeted mitigations: prompt redesign for prompt-induced issues vs. fine-tuning / RLHF for model-intrinsic ones
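The taxonomy above can be sketched as a small attribution helper built on the paper's Prompt Sensitivity (PS) and Model Variability (MV) metrics. This is a minimal illustration, not the paper's implementation: the function names, thresholds, and the exact way PS/MV are computed here are assumptions.

```python
# Hypothetical sketch of the attribution logic; PS and MV are named in the
# paper, but this counting scheme and the thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class AttributionResult:
    prompt_sensitivity: float   # fraction of prompt variants that change the answer
    model_variability: float    # fraction of repeated samples that change the answer
    verdict: str

def attribute_failure(answers_by_prompt: list,
                      answers_by_resample: list,
                      ps_threshold: float = 0.0,
                      mv_threshold: float = 0.0) -> AttributionResult:
    """Classify a failure as prompt-induced or model-intrinsic."""
    ps = (len(set(answers_by_prompt)) - 1) / max(len(answers_by_prompt) - 1, 1)
    mv = (len(set(answers_by_resample)) - 1) / max(len(answers_by_resample) - 1, 1)
    if ps > ps_threshold:
        verdict = "prompt-induced: redesign the prompt first"
    elif mv > mv_threshold:
        verdict = "model-intrinsic: consider fine-tuning / RLHF"
    else:
        verdict = "model-intrinsic: stable but wrong, likely a factual gap"
    return AttributionResult(ps, mv, verdict)
```

The ordering encodes the paper's mitigation advice: prompt-induced failures are checked (and fixed) first, before attributing anything to the model.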
Evidence
- Evaluated across multiple LLMs and task types to validate the prompt vs. model attribution framework
- Shows that many apparent model failures are actually prompt-induced and fixable without retraining
- Provides case studies illustrating each failure mode
How to Apply
- When an LLM gives a wrong or deceptive answer, run the diagnostic checklist: is the prompt ambiguous or adversarial? If yes, fix the prompt first.
- If the error persists across well-formed prompts, treat it as a model-intrinsic issue and consider fine-tuning or RLHF.
- Use this framework when building evaluation suites to avoid misattributing model failures.
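The checklist above can be wired into an evaluation suite so that each test case carries several well-formed phrasings of the same question, and a failure is only counted as model-intrinsic when it reproduces across all of them. The names (`EvalCase`, `classify_failure`) are illustrative, not from the paper.

```python
# Minimal evaluation-suite pattern: a failure under only *some* phrasings is
# prompt-induced; a failure under every well-formed phrasing is model-intrinsic.
from dataclasses import dataclass

@dataclass
class EvalCase:
    phrasings: list   # several well-formed wordings of the same question
    expected: str

def classify_failure(case, ask):
    """ask(prompt) -> model answer. Returns 'pass', 'prompt-induced',
    or 'model-intrinsic'."""
    answers = [ask(p) for p in case.phrasings]
    wrong = [a for a in answers if a != case.expected]
    if not wrong:
        return "pass"
    if len(wrong) < len(answers):
        return "prompt-induced"   # fails only under some phrasings: fix the prompt
    return "model-intrinsic"      # fails under every phrasing: consider fine-tuning
```

Here `ask` is any callable that queries the model, so the same suite can be pointed at different LLMs without misattributing prompt problems to the model.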
Code Example
# Prompt Sensitivity (PS) quick measurement example
# (uses the openai v1 client; requires OPENAI_API_KEY in the environment)
from openai import OpenAI

client = OpenAI()
question = "What is the capital of South Korea?"
prompts = [
    f"{question}",
    f"Answer the following question based on facts only: {question}",
    f"Think step by step before answering (Chain-of-Thought). Question: {question}",
    f"You are a fact-checking expert. Answer the following question accurately: {question}",
]

responses = []
for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    responses.append(response.choices[0].message.content)

# If responses differ from each other, PS is high → hallucination can be
# reduced by improving prompts.
print("=== Response Comparison by Prompt ===")
for i, (p, r) in enumerate(zip(prompts, responses)):
    print(f"[Prompt {i+1}] {r[:100]}...\n")

unique_responses = set(responses)
print(f"Unique response count: {len(unique_responses)} / {len(responses)}")
print("PS high (prompt improvement needed)" if len(unique_responses) > 1
      else "PS low (may be an intrinsic model issue)")
Original Abstract
Hallucination in Large Language Models (LLMs) refers to outputs that appear fluent and coherent but are factually incorrect, logically inconsistent, or entirely fabricated. As LLMs are increasingly deployed in education, healthcare, law, and scientific research, understanding and mitigating hallucinations has become critical. In this work, we present a comprehensive survey and empirical analysis of hallucination attribution in LLMs, introducing a novel framework to determine whether a given hallucination stems from suboptimal prompting or the model's intrinsic behavior. We evaluate state-of-the-art LLMs—including GPT-4, LLaMA 2, DeepSeek, and others—under various controlled prompting conditions, using established benchmarks (TruthfulQA, HallucinationEval) to judge factuality. Our attribution framework defines metrics for Prompt Sensitivity (PS) and Model Variability (MV), which together quantify the contribution of prompts vs. model-internal factors to hallucinations. Through extensive experiments and comparative analyses, we identify distinct patterns in hallucination occurrence, severity, and mitigation across models. Notably, structured prompt strategies such as chain-of-thought (CoT) prompting significantly reduce hallucinations in prompt-sensitive scenarios, though intrinsic model limitations persist in some cases. These findings contribute to a deeper understanding of LLM reliability and provide insights for prompt engineers, model developers, and AI practitioners. We further propose best practices and future directions to reduce hallucinations in both prompt design and model development pipelines.
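The abstract's Model Variability (MV) metric complements the PS example above: instead of varying the prompt, it resamples the same prompt several times at nonzero temperature. The paper does not spell out its computation here, so this counting scheme and the helper names are assumptions for illustration.

```python
# Model Variability (MV) sketch: fraction of repeated samples that disagree
# with the most common answer (0.0 = perfectly stable on this prompt).
# The metric name is from the paper's abstract; the formula is an assumption.
from collections import Counter

def measure_mv(ask, prompt, n=5):
    """ask(prompt) -> answer string, called n times on the identical prompt."""
    samples = [ask(prompt) for _ in range(n)]
    most_common_count = Counter(samples).most_common(1)[0][1]
    return (n - most_common_count) / n

# Example with the OpenAI v1 client (requires OPENAI_API_KEY):
#   from openai import OpenAI
#   client = OpenAI()
#   def ask(p):
#       r = client.chat.completions.create(
#           model="gpt-4",
#           messages=[{"role": "user", "content": p}],
#           temperature=1.0,  # nonzero so repeated samples can differ
#       )
#       return r.choices[0].message.content
#   print(measure_mv(ask, "What is the capital of South Korea?"))
```

High MV on well-formed prompts points at model-intrinsic instability, matching the framework's advice to consider fine-tuning or RLHF rather than further prompt redesign.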