Think-on-Graph: Step-by-Step LLM Reasoning over a Knowledge Graph
Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph
TL;DR Highlight
ToG is a framework in which the LLM directly explores a Knowledge Graph, running beam search to find reasoning paths, and achieves SOTA on 6 of 9 datasets without any additional training.
Who Should Read
AI backend developers who want to reduce hallucination in LLM answers, especially those looking to connect Knowledge Graphs or other structured external knowledge to LLM reasoning.
Core Mechanics
- Instead of the existing 'LLM ⊕ KG' approach, which uses the LLM as a mere translator, proposes the 'LLM ⊗ KG' paradigm, in which the LLM itself runs beam search over the KG to explore reasoning paths
- At each step, repeats relation search → prune → entity search → prune, with the LLM itself judging "has enough information been gathered?" to terminate early
- Information missing from the KG can be supplemented by the LLM's internal knowledge, so answers can be found even on an incomplete KG (e.g., even without a 'majority party' relation, it can detour via 'prime minister → Anthony Albanese → Labor Party')
- Reasoning paths remain explicit, enabling tracing the cause of wrong answers (knowledge traceability) and automatic KG correction (knowledge correctability)
- Llama-2-70B + ToG sometimes outperforms GPT-4 with plain CoT, so the system can be operated without an expensive large model
- The plug-and-play design applies, without any training, to various LLMs (ChatGPT, GPT-4, Llama-2) and various KGs (Freebase, Wikidata)
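The search-and-prune loop above hinges on turning the LLM's scored pruning replies into a ranked shortlist. A minimal sketch, assuming the LLM answers in a `name (score)` semicolon-separated format like the paper's prompt examples; the `parse_scored_candidates` helper and the exact reply format are illustrative assumptions, not from the paper:

```python
import re

def parse_scored_candidates(reply: str, top_n: int = 3) -> list[str]:
    """Parse a pruning reply like 'rel_a (0.6); rel_b (0.3); rel_c (0.1)'
    and keep the top_n highest-scored candidate names."""
    scored = []
    for item in reply.split(";"):
        # Match 'candidate name (score)'; skip malformed items.
        m = re.match(r"\s*(.+?)\s*\(([\d.]+)\)\s*$", item)
        if m:
            scored.append((m.group(1), float(m.group(2))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:top_n]]

reply = "government.prime_minister (0.6); people.person.party (0.3); location.country (0.1)"
print(parse_scored_candidates(reply, top_n=2))
# ['government.prime_minister', 'people.person.party']
```

Sorting by score rather than trusting the LLM's output order makes the beam robust to replies that list candidates in prompt order.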
Evidence
- SOTA on 6 of 9 datasets (WebQSP 82.6%, GrailQA 81.4%, Zero-Shot RE 88.3%, Creak 95.6%, etc.), where most previous SOTAs were fine-tuned models
- +51.8% over CoT on GrailQA and +42.9% on Zero-Shot RE (with ChatGPT)
- Llama-2-70B + ToG reaches 53.6% on CWQ, surpassing GPT-4 with plain CoT at 46.0%: a cheaper model beats a more expensive one
- GPT-4 + ToG reaches 69.5% on CWQ vs. 70.4% for the fine-tuned SOTA, closing the gap to 0.9%p without any additional training
How to Apply
- Attaching a KG instead of vector search to an existing RAG pipeline: reuse the paper's relation prune prompt (E.3.1) and entity prune prompt (E.3.2) as-is, and start with depth=3, width=3
- To cut cost, use the ToG-R variant: replace the entity prune step with random sampling instead of an LLM call to halve the number of LLM calls, or replace it with BM25/SentenceBERT so only D+1 LLM calls are needed
- When KG quality needs improvement: exposing ToG's reasoning paths to users surfaces stale or incorrect triples, and the LLM can auto-generate correction suggestions to refine the KG incrementally
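The ToG-R cost-saving swap above amounts to replacing the LLM entity-prune call with uniform random sampling. A hedged sketch; `prune_entities_random` is an illustrative name, not an identifier from the paper:

```python
import random

def prune_entities_random(candidate_entities, width=3, seed=None):
    """ToG-R style pruning: pick up to `width` tail entities uniformly
    at random instead of asking the LLM to score them."""
    rng = random.Random(seed)  # local RNG so callers can fix a seed
    k = min(width, len(candidate_entities))
    return rng.sample(candidate_entities, k)

candidates = ["Anthony Albanese", "Scott Morrison", "Labor Party", "Canberra"]
picked = prune_entities_random(candidates, width=3, seed=0)
print(picked)  # 3 entities chosen at random from the candidates
```

Because the relation-prune step still uses the LLM, randomly kept entities that turn out to be irrelevant tend to be filtered on the next iteration, which is why the paper reports only a modest accuracy drop for ToG-R.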
Code Example
# Core ToG prompt examples (based on Appendix E.3 of the paper)
# 1. Relation prune prompt
relation_prune_prompt = """
Please retrieve {k} relations (separated by semicolon) that contribute to the question
and rate their contribution on a scale from 0 to 1 (the sum of the scores of k relations is 1).
Q: {query}
Topic Entity: {topic_entity}
Relations: {candidate_relations}
A:
"""
# 2. Entity prune prompt
entity_prune_prompt = """
Please score the entities' contribution to the question on a scale from 0 to 1
(the sum of the scores of all entities is 1).
Q: {query}
Relation: {current_relation}
Entities: {candidate_entities}
Score:
"""
# 3. Reasoning prompt (decide whether enough information has been gathered)
reasoning_prompt = """
Given a question and the associated retrieved knowledge graph triples (entity, relation, entity),
you are asked to answer whether it's sufficient for you to answer the question
with these triples and your knowledge (Yes or No).
Q: {query}
Knowledge triples: {explored_paths}
A:
"""
# 4. Generate prompt (produce the final answer)
generate_prompt = """
Given a question and the associated retrieved knowledge graph triples (entity, relation, entity),
you are asked to answer the question with these triples and your own knowledge.
Q: {query}
Knowledge triples: {explored_paths}
A:
"""
# ToG algorithm pseudocode
def think_on_graph(question, kg, llm, max_depth=3, beam_width=3):
    # 1. Initialization: extract topic entities from the question
    topic_entities = llm.extract_entities(question)[:beam_width]
    paths = [[entity] for entity in topic_entities]
    for depth in range(max_depth):
        new_paths = []
        for path in paths:
            tail_entity = path[-1]
            # 2. Relation Search & Prune
            candidate_relations = kg.get_relations(tail_entity)
            top_relations = llm.prune_relations(
                question, tail_entity, candidate_relations, k=beam_width
            )
            # 3. Entity Search & Prune
            for rel in top_relations:
                candidate_entities = kg.get_entities(tail_entity, rel)
                top_entities = llm.prune_entities(
                    question, rel, candidate_entities
                )
                for entity in top_entities:
                    new_paths.append(path + [rel, entity])
        paths = new_paths[:beam_width]  # keep top-N paths (by pruning score in the paper)
        # 4. Reasoning: decide whether enough information has been gathered
        if llm.is_sufficient(question, paths):
            return llm.generate_answer(question, paths)
    # On reaching max_depth, fall back to the LLM's internal knowledge
    return llm.generate_answer(question, paths)
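The pseudocode's beam truncation simply keeps the first beam_width paths; in the paper, the beam keeps the paths the LLM scored highest. A hedged sketch of that step, where `truncate_beam` and the (path, score) representation are illustrative assumptions:

```python
def truncate_beam(scored_paths, beam_width=3):
    """Keep the `beam_width` highest-scored reasoning paths.
    `scored_paths` is a list of (path, score) pairs, e.g. with scores
    accumulated from the relation/entity pruning steps along each path."""
    ranked = sorted(scored_paths, key=lambda ps: ps[1], reverse=True)
    return [path for path, _ in ranked[:beam_width]]

scored = [(["Australia", "prime_minister", "Anthony Albanese"], 0.6),
          (["Australia", "capital", "Canberra"], 0.1),
          (["Australia", "party", "Labor Party"], 0.3)]
print(truncate_beam(scored, beam_width=2))
# [['Australia', 'prime_minister', 'Anthony Albanese'], ['Australia', 'party', 'Labor Party']]
```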
Original Abstract
Although large language models (LLMs) have achieved significant success in various tasks, they often struggle with hallucination problems, especially in scenarios requiring deep and responsible reasoning. These issues could be partially addressed by introducing external knowledge graphs (KG) in LLM reasoning. In this paper, we propose a new LLM-KG integrating paradigm ``$\hbox{LLM}\otimes\hbox{KG}$'' which treats the LLM as an agent to interactively explore related entities and relations on KGs and perform reasoning based on the retrieved knowledge. We further implement this paradigm by introducing a new approach called Think-on-Graph (ToG), in which the LLM agent iteratively executes beam search on KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. We use a number of well-designed experiments to examine and illustrate the following advantages of ToG: 1) compared with LLMs, ToG has better deep reasoning power; 2) ToG has the ability of knowledge traceability and knowledge correctability by leveraging LLMs reasoning and expert feedback; 3) ToG provides a flexible plug-and-play framework for different LLMs, KGs and prompting strategies without any additional training cost; 4) the performance of ToG with small LLM models could exceed large LLM such as GPT-4 in certain scenarios and this reduces the cost of LLM deployment and application. As a training-free method with lower computational cost and better generality, ToG achieves overall SOTA in 6 out of 9 datasets where most previous SOTAs rely on additional training.