Think-on-Graph: Step-by-Step LLM Reasoning over a Knowledge Graph
Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph
TL;DR Highlight
ToG is a framework in which the LLM directly explores a Knowledge Graph, running beam search to find reasoning paths, and achieves SOTA on 6 of 9 datasets without any additional training.
Who Should Read
AI backend developers who want to reduce hallucination in LLM answers, especially those looking to connect Knowledge Graphs or other structured external knowledge to LLM reasoning.
Core Mechanics
- Instead of the existing 'LLM ⊕ KG' approach, which uses the LLM as a mere translator, proposes the 'LLM ⊗ KG' paradigm, in which the LLM itself runs beam search over the KG to explore reasoning paths
- At each step, repeats relation search → prune → entity search → prune, with the LLM itself judging "has enough information been gathered?" to terminate early
- Information missing from the KG can be supplemented by the LLM's internal knowledge, so answers can be found even on an incomplete KG (e.g., even without a 'majority party' relation, it can detour via 'prime minister → Anthony Albanese → Labor Party')
- Reasoning paths remain explicit, enabling tracing the cause of wrong answers (knowledge traceability) and automatic KG correction (knowledge correctability)
- Llama-2-70B + ToG sometimes outperforms GPT-4 with plain CoT, so the system can be operated without an expensive large model
- The plug-and-play design applies, without any training, to various LLMs (ChatGPT, GPT-4, Llama-2) and various KGs (Freebase, Wikidata)
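The search-and-prune loop above hinges on turning the LLM's scored pruning replies into a ranked shortlist. A minimal sketch, assuming the LLM answers in a `name (score)` semicolon-separated format like the paper's prompt examples; the `parse_scored_candidates` helper and the exact reply format are illustrative assumptions, not from the paper:

```python
import re

def parse_scored_candidates(reply: str, top_n: int = 3) -> list[str]:
    """Parse a pruning reply like 'rel_a (0.6); rel_b (0.3); rel_c (0.1)'
    and keep the top_n highest-scored candidate names."""
    scored = []
    for item in reply.split(";"):
        # Match 'candidate name (score)'; skip malformed items.
        m = re.match(r"\s*(.+?)\s*\(([\d.]+)\)\s*$", item)
        if m:
            scored.append((m.group(1), float(m.group(2))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:top_n]]

reply = "government.prime_minister (0.6); people.person.party (0.3); location.country (0.1)"
print(parse_scored_candidates(reply, top_n=2))
# ['government.prime_minister', 'people.person.party']
```

Sorting by score rather than trusting the LLM's output order makes the beam robust to replies that list candidates in prompt order.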
Evidence
- SOTA on 6 of 9 datasets (WebQSP 82.6%, GrailQA 81.4%, Zero-Shot RE 88.3%, Creak 95.6%, etc.), where most previous SOTAs were fine-tuned models
- +51.8% over CoT on GrailQA and +42.9% on Zero-Shot RE (with ChatGPT)
- Llama-2-70B + ToG reaches 53.6% on CWQ, surpassing GPT-4 with plain CoT at 46.0%: a cheaper model beats a more expensive one
- GPT-4 + ToG reaches 69.5% on CWQ vs. 70.4% for the fine-tuned SOTA, closing the gap to 0.9%p without any additional training
How to Apply
- Attaching a KG instead of vector search to an existing RAG pipeline: reuse the paper's relation prune prompt (E.3.1) and entity prune prompt (E.3.2) as-is, and start with depth=3, width=3
- To cut cost, use the ToG-R variant: replace the entity prune step with random sampling instead of an LLM call to halve the number of LLM calls, or replace it with BM25/SentenceBERT so only D+1 LLM calls are needed
- When KG quality needs improvement: exposing ToG's reasoning paths to users surfaces stale or incorrect triples, and the LLM can auto-generate correction suggestions to refine the KG incrementally
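The ToG-R cost-saving swap above amounts to replacing the LLM entity-prune call with uniform random sampling. A hedged sketch; `prune_entities_random` is an illustrative name, not an identifier from the paper:

```python
import random

def prune_entities_random(candidate_entities, width=3, seed=None):
    """ToG-R style pruning: pick up to `width` tail entities uniformly
    at random instead of asking the LLM to score them."""
    rng = random.Random(seed)  # local RNG so callers can fix a seed
    k = min(width, len(candidate_entities))
    return rng.sample(candidate_entities, k)

candidates = ["Anthony Albanese", "Scott Morrison", "Labor Party", "Canberra"]
picked = prune_entities_random(candidates, width=3, seed=0)
print(picked)  # 3 entities chosen at random from the candidates
```

Because the relation-prune step still uses the LLM, randomly kept entities that turn out to be irrelevant tend to be filtered on the next iteration, which is why the paper reports only a modest accuracy drop for ToG-R.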
Code Example
# Core ToG prompt examples (based on Appendix E.3 of the paper)
# 1. Relation prune prompt
relation_prune_prompt = """
Please retrieve {k} relations (separated by semicolon) that contribute to the question
and rate their contribution on a scale from 0 to 1 (the sum of the scores of k relations is 1).
Q: {query}
Topic Entity: {topic_entity}
Relations: {candidate_relations}
A:
"""
# 2. Entity prune prompt
entity_prune_prompt = """
Please score the entities' contribution to the question on a scale from 0 to 1
(the sum of the scores of all entities is 1).
Q: {query}
Relation: {current_relation}
Entities: {candidate_entities}
Score:
"""
# 3. Reasoning prompt (decide whether enough information has been gathered)
reasoning_prompt = """
Given a question and the associated retrieved knowledge graph triples (entity, relation, entity),
you are asked to answer whether it's sufficient for you to answer the question
with these triples and your knowledge (Yes or No).
Q: {query}
Knowledge triples: {explored_paths}
A:
"""
# 4. Generate prompt (produce the final answer)
generate_prompt = """
Given a question and the associated retrieved knowledge graph triples (entity, relation, entity),
you are asked to answer the question with these triples and your own knowledge.
Q: {query}
Knowledge triples: {explored_paths}
A:
"""
# ToG algorithm pseudocode
def think_on_graph(question, kg, llm, max_depth=3, beam_width=3):
    # 1. Initialization: extract topic entities from the question
    topic_entities = llm.extract_entities(question)[:beam_width]
    paths = [[entity] for entity in topic_entities]
    for depth in range(max_depth):
        new_paths = []
        for path in paths:
            tail_entity = path[-1]
            # 2. Relation Search & Prune
            candidate_relations = kg.get_relations(tail_entity)
            top_relations = llm.prune_relations(
                question, tail_entity, candidate_relations, k=beam_width
            )
            # 3. Entity Search & Prune
            for rel in top_relations:
                candidate_entities = kg.get_entities(tail_entity, rel)
                top_entities = llm.prune_entities(
                    question, rel, candidate_entities
                )
                for entity in top_entities:
                    new_paths.append(path + [rel, entity])
        paths = new_paths[:beam_width]  # keep top-N paths (by pruning score in the paper)
        # 4. Reasoning: decide whether enough information has been gathered
        if llm.is_sufficient(question, paths):
            return llm.generate_answer(question, paths)
    # On reaching max_depth, fall back to the LLM's internal knowledge
    return llm.generate_answer(question, paths)
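The pseudocode's beam truncation simply keeps the first beam_width paths; in the paper, the beam keeps the paths the LLM scored highest. A hedged sketch of that step, where `truncate_beam` and the (path, score) representation are illustrative assumptions:

```python
def truncate_beam(scored_paths, beam_width=3):
    """Keep the `beam_width` highest-scored reasoning paths.
    `scored_paths` is a list of (path, score) pairs, e.g. with scores
    accumulated from the relation/entity pruning steps along each path."""
    ranked = sorted(scored_paths, key=lambda ps: ps[1], reverse=True)
    return [path for path, _ in ranked[:beam_width]]

scored = [(["Australia", "prime_minister", "Anthony Albanese"], 0.6),
          (["Australia", "capital", "Canberra"], 0.1),
          (["Australia", "party", "Labor Party"], 0.3)]
print(truncate_beam(scored, beam_width=2))
# [['Australia', 'prime_minister', 'Anthony Albanese'], ['Australia', 'party', 'Labor Party']]
```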
Original Abstract
Although large language models (LLMs) have achieved significant success in various tasks, they often struggle with hallucination problems, especially in scenarios requiring deep and responsible reasoning. These issues could be partially addressed by introducing external knowledge graphs (KG) in LLM reasoning. In this paper, we propose a new LLM-KG integrating paradigm ``$\hbox{LLM}\otimes\hbox{KG}$'' which treats the LLM as an agent to interactively explore related entities and relations on KGs and perform reasoning based on the retrieved knowledge. We further implement this paradigm by introducing a new approach called Think-on-Graph (ToG), in which the LLM agent iteratively executes beam search on KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. We use a number of well-designed experiments to examine and illustrate the following advantages of ToG: 1) compared with LLMs, ToG has better deep reasoning power; 2) ToG has the ability of knowledge traceability and knowledge correctability by leveraging LLMs reasoning and expert feedback; 3) ToG provides a flexible plug-and-play framework for different LLMs, KGs and prompting strategies without any additional training cost; 4) the performance of ToG with small LLM models could exceed large LLM such as GPT-4 in certain scenarios and this reduces the cost of LLM deployment and application. As a training-free method with lower computational cost and better generality, ToG achieves overall SOTA in 6 out of 9 datasets where most previous SOTAs rely on additional training.