A Comprehensive Taxonomy of Hallucinations in Large Language Models
TL;DR Highlight
A comprehensive report covering why LLMs hallucinate, what types exist, and how to prevent them — from mathematical proofs to practical defenses.
Who Should Read
Backend/AI developers shipping LLM-based services to production who face hallucination issues. Engineers building reliable AI systems in high-stakes domains like healthcare, law, and finance where wrong answers are costly.
Core Mechanics
- Hallucination is mathematically unavoidable — computational theory proves that for any LLM architecture and training method, inputs that cause errors must exist
- Hallucination isn't a single phenomenon — the report distinguishes at least seven types, including Intrinsic (contradicts the input context), Extrinsic (inconsistent with training data or reality), Factuality (absolute factual errors), and Faithfulness (fails to adhere to the input)
- GPT-4's wrong answers were rated more convincing than human expert correct answers in blind evaluations
- Domain-specific hallucination benchmarks exist: CodeHaluEval for code, MedHallu for medical, Med-HALT for clinical
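The taxonomy above can be encoded as a small label set for tagging detected hallucinations in a monitoring pipeline. This is a minimal sketch, not part of any benchmark's API; the type names and descriptions follow the report's categories:

```python
from enum import Enum

class HallucinationType(Enum):
    """Labels for detected hallucinations, following the report's taxonomy."""
    INTRINSIC = "contradicts the input context"
    EXTRINSIC = "inconsistent with training data or reality"
    FACTUALITY = "absolute factual error"
    FAITHFULNESS = "fails to adhere to the input"
    LOGICAL_INCONSISTENCY = "internally contradictory reasoning"
    TEMPORAL_DISORIENTATION = "wrong or anachronistic time references"
    ETHICAL_VIOLATION = "harmful or policy-violating content"

def label_finding(finding: str, htype: HallucinationType) -> dict:
    """Attach a taxonomy label to a detected hallucination (hypothetical helper)."""
    return {"finding": finding, "type": htype.name, "description": htype.value}
```

Tagging detections this way lets you report per-type rates (e.g. the 19% logical-inconsistency share cited below) instead of a single aggregate hallucination rate.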
Evidence
- Logical inconsistency accounts for 19% of hallucination cases, temporal disorientation 12%, ethical violations 6%
- GPT-4's incorrect answers rated more persuasive than human expert correct answers in blind evaluation (Bubeck et al., 2023)
- MedHallu and Med-HALT provide domain-specific hallucination evaluation frameworks for medical AI
How to Apply
- First identify the hallucination types relevant to your domain — use CodeHaluEval-style criteria (input-conflicting, context-conflicting, fact-conflicting) for code generation; use MedHallu or Med-HALT for medical applications.
- Add RAG with retrieval verification: after retrieving documents, have a separate LLM call verify whether the retrieved content actually supports the generated answer.
- Implement confidence scoring: if the model's token-level probabilities are low for key claims, flag those for human review rather than serving them directly.
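The confidence-scoring step above can be sketched as a pure function over per-token log probabilities. This assumes your provider exposes logprobs for generated tokens (not all do); the threshold of 0.5 is an illustrative assumption you should tune on your own data:

```python
import math

def flag_low_confidence(token_logprobs: list[float], threshold: float = 0.5) -> dict:
    """Compute the geometric-mean token probability for a generated span and
    flag the span for human review when it falls below the threshold.

    token_logprobs: per-token log probabilities for the span, as returned by
    APIs that expose logprobs (an assumption -- check your provider).
    """
    if not token_logprobs:
        # No evidence at all: route to human review by default
        return {"mean_prob": 0.0, "needs_review": True}
    # Mean of logprobs, exponentiated = geometric mean of token probabilities
    mean_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    return {"mean_prob": mean_prob, "needs_review": mean_prob < threshold}

# Illustrative numbers: a confident span vs. a shaky one
confident = flag_low_confidence([-0.05, -0.1, -0.02])   # needs_review: False
shaky = flag_low_confidence([-1.2, -2.5, -0.9])         # needs_review: True
```

In practice you would apply this per key claim (e.g. per sentence containing a named entity or number), not to the whole response, so that one shaky claim doesn't hide inside an otherwise confident answer.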
Code Example
```python
# Simple example of a RAG + fact-validation layer
import anthropic

client = anthropic.Anthropic()

def rag_with_validation(user_query: str, retrieved_docs: list[str]) -> dict:
    context = "\n\n".join(retrieved_docs)

    # Step 1: Generate an answer grounded in the retrieved documents
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""Answer the question by referring to the following documents.
If the content is not found in the documents, mark it as 'No basis in documents'.

[Reference Documents]
{context}

[Question]
{user_query}""",
        }],
    )
    answer = response.content[0].text

    # Step 2: Self-evaluate the likelihood of hallucination in the answer
    validation = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"""Verify whether the answer below is supported by the provided documents.
Rate the confidence level as high/medium/low and explain the reason in one sentence.

[Documents]
{context}

[Answer]
{answer}

Output format: {{"confidence": "high|medium|low", "reason": "..."}}""",
        }],
    )

    return {
        "answer": answer,
        "validation": validation.content[0].text,
        "sources": retrieved_docs,
    }

# Usage example
result = rag_with_validation(
    user_query="What are the side effects of this medication?",
    retrieved_docs=["Clinical trial results document...", "FDA approval materials..."],
)
print(result)
```
Original Abstract
Large language models (LLMs) have revolutionized natural language processing, yet their propensity for hallucination, generating plausible but factually incorrect or fabricated content, remains a critical challenge. This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework that posits its inherent inevitability in computable LLMs, irrespective of architecture or training. It explores core distinctions, differentiating between intrinsic (contradicting input context) and extrinsic (inconsistent with training data or reality), as well as factuality (absolute correctness) and faithfulness (adherence to input). The report then details specific manifestations, including factual errors, contextual and logical inconsistencies, temporal disorientation, ethical violations, and task-specific hallucinations across domains like code generation and multimodal applications. It analyzes the underlying causes, categorizing them into data-related issues, model-related factors, and prompt-related influences. Furthermore, the report examines cognitive and human factors influencing hallucination perception, surveys evaluation benchmarks and metrics for detection, and outlines architectural and systemic mitigation strategies. Finally, it introduces web-based resources for monitoring LLM releases and performance. This report underscores the complex, multifaceted nature of LLM hallucinations and emphasizes that, given their theoretical inevitability, future efforts must focus on robust detection, mitigation, and continuous human oversight for responsible and reliable deployment in critical applications.