A Comprehensive Taxonomy of Hallucinations in Large Language Models
TL;DR Highlight
A comprehensive report covering why LLMs hallucinate, what types exist, and how to prevent them — from mathematical proofs to practical defenses.
Who Should Read
Backend/AI developers shipping LLM-based services to production who face hallucination issues. Engineers building reliable AI systems in high-stakes domains like healthcare, law, and finance where wrong answers are costly.
Core Mechanics
- Hallucination is mathematically unavoidable — computational theory proves that for any LLM architecture and training method, inputs that cause errors must exist
- Hallucination isn't a single phenomenon — the report distinguishes at least seven types, including Intrinsic (contradicts the input context), Extrinsic (inconsistent with training data or reality), Factuality (absolute factual errors), and Faithfulness (fails to adhere to the input)
- GPT-4's wrong answers were rated more convincing than human expert correct answers in blind evaluations
- Domain-specific hallucination benchmarks exist: CodeHaluEval for code, MedHallu for medical, Med-HALT for clinical
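The taxonomy above can be encoded as a small label set for tagging detected hallucinations in a monitoring pipeline. This is a minimal sketch, not part of any benchmark's API; the type names and descriptions follow the report's categories:

```python
from enum import Enum

class HallucinationType(Enum):
    """Labels for detected hallucinations, following the report's taxonomy."""
    INTRINSIC = "contradicts the input context"
    EXTRINSIC = "inconsistent with training data or reality"
    FACTUALITY = "absolute factual error"
    FAITHFULNESS = "fails to adhere to the input"
    LOGICAL_INCONSISTENCY = "internally contradictory reasoning"
    TEMPORAL_DISORIENTATION = "wrong or anachronistic time references"
    ETHICAL_VIOLATION = "harmful or policy-violating content"

def label_finding(finding: str, htype: HallucinationType) -> dict:
    """Attach a taxonomy label to a detected hallucination (hypothetical helper)."""
    return {"finding": finding, "type": htype.name, "description": htype.value}
```

Tagging detections this way lets you report per-type rates (e.g. the 19% logical-inconsistency share cited below) instead of a single aggregate hallucination rate.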
Evidence
- Logical inconsistency accounts for 19% of hallucination cases, temporal disorientation 12%, ethical violations 6%
- GPT-4's incorrect answers rated more persuasive than human expert correct answers in blind evaluation (Bubeck et al., 2023)
- MedHallu and Med-HALT provide domain-specific hallucination evaluation frameworks for medical AI
How to Apply
- First identify the hallucination types relevant to your domain — use CodeHaluEval-style criteria (input-conflicting, context-conflicting, fact-conflicting) for code generation; use MedHallu or Med-HALT for medical applications.
- Add RAG with retrieval verification: after retrieving documents, have a separate LLM call verify whether the retrieved content actually supports the generated answer.
- Implement confidence scoring: if the model's token-level probabilities are low for key claims, flag those for human review rather than serving them directly.
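The confidence-scoring step above can be sketched as a pure function over per-token log probabilities. This assumes your provider exposes logprobs for generated tokens (not all do); the threshold of 0.5 is an illustrative assumption you should tune on your own data:

```python
import math

def flag_low_confidence(token_logprobs: list[float], threshold: float = 0.5) -> dict:
    """Compute the geometric-mean token probability for a generated span and
    flag the span for human review when it falls below the threshold.

    token_logprobs: per-token log probabilities for the span, as returned by
    APIs that expose logprobs (an assumption -- check your provider).
    """
    if not token_logprobs:
        # No evidence at all: route to human review by default
        return {"mean_prob": 0.0, "needs_review": True}
    # Mean of logprobs, exponentiated = geometric mean of token probabilities
    mean_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    return {"mean_prob": mean_prob, "needs_review": mean_prob < threshold}

# Illustrative numbers: a confident span vs. a shaky one
confident = flag_low_confidence([-0.05, -0.1, -0.02])   # needs_review: False
shaky = flag_low_confidence([-1.2, -2.5, -0.9])         # needs_review: True
```

In practice you would apply this per key claim (e.g. per sentence containing a named entity or number), not to the whole response, so that one shaky claim doesn't hide inside an otherwise confident answer.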
Code Example
```python
# Simple example of a RAG + fact-validation layer
import anthropic

client = anthropic.Anthropic()

def rag_with_validation(user_query: str, retrieved_docs: list[str]) -> dict:
    context = "\n\n".join(retrieved_docs)

    # Step 1: Generate an answer grounded in the retrieved documents
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Answer the question by referring to the following documents.
If the content is not found in the documents, mark it as 'No basis in documents'.

[Reference Documents]
{context}

[Question]
{user_query}""",
        }],
    )
    answer = response.content[0].text

    # Step 2: Self-evaluate the likelihood of hallucination in the answer
    validation = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Verify whether the answer below is supported by the provided documents.
Rate the confidence level as high/medium/low and explain the reason in one sentence.

[Documents]
{context}

[Answer]
{answer}

Output format: {{"confidence": "high|medium|low", "reason": "..."}}""",
        }],
    )

    return {
        "answer": answer,
        "validation": validation.content[0].text,
        "sources": retrieved_docs,
    }

# Usage example
result = rag_with_validation(
    user_query="What are the side effects of this medication?",
    retrieved_docs=["Clinical trial results document...", "FDA approval materials..."],
)
print(result)
```
Original Abstract
Large language models (LLMs) have revolutionized natural language processing, yet their propensity for hallucination, generating plausible but factually incorrect or fabricated content, remains a critical challenge. This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework that posits its inherent inevitability in computable LLMs, irrespective of architecture or training. It explores core distinctions, differentiating between intrinsic (contradicting input context) and extrinsic (inconsistent with training data or reality), as well as factuality (absolute correctness) and faithfulness (adherence to input). The report then details specific manifestations, including factual errors, contextual and logical inconsistencies, temporal disorientation, ethical violations, and task-specific hallucinations across domains like code generation and multimodal applications. It analyzes the underlying causes, categorizing them into data-related issues, model-related factors, and prompt-related influences. Furthermore, the report examines cognitive and human factors influencing hallucination perception, surveys evaluation benchmarks and metrics for detection, and outlines architectural and systemic mitigation strategies. Finally, it introduces web-based resources for monitoring LLM releases and performance. This report underscores the complex, multifaceted nature of LLM hallucinations and emphasizes that, given their theoretical inevitability, future efforts must focus on robust detection, mitigation, and continuous human oversight for responsible and reliable deployment in critical applications.