90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.
TL;DR Highlight
A workflow-sharing post: organizing a codebase into a wiki ahead of time reportedly cuts token usage per Claude session by more than 90% compared with having Claude explore the codebase from scratch every time.
Who Should Read
Developers who use Claude or other LLMs for codebase exploration and development tasks and are running into token costs or context limits.
Core Mechanics
- Letting the AI explore files directly in every session (cold exploration) consumes many unnecessary tokens; pre-organizing the codebase as a wiki avoids this.
- This approach is inspired by the workflow used by Andrej Karpathy, and the key is to pre-compile the structure and core content of the codebase.
- It is reported that this method reduces token usage per session by more than 90%, significantly reducing the cost of repetitive codebase exploration.
- The original post could not be accessed, so the specific implementation methods, tools, and scripts remain unconfirmed.
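The 90% figure comes from the post and could not be verified here. To check the saving on your own project, a rough comparison like the following may help. It is only a sketch: the names `estimate_tokens` and `estimated_savings` are hypothetical, the 4-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, and it assumes a cold session reads every listed source file, which may overstate actual usage.

```python
from pathlib import Path

# Assumption: roughly 4 characters per token. This is a crude heuristic,
# not the tokenizer any particular model actually uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def estimated_savings(source_files: list[Path], wiki_file: Path) -> float:
    """Return the estimated fraction of tokens saved by reading the wiki
    instead of every source file (0.9 means 90% fewer tokens)."""
    cold = sum(
        estimate_tokens(p.read_text(encoding="utf-8", errors="ignore"))
        for p in source_files
    )
    wiki = estimate_tokens(wiki_file.read_text(encoding="utf-8", errors="ignore"))
    if cold == 0:
        return 0.0
    return 1 - wiki / cold
```

A result close to or below zero would suggest the wiki is too detailed to be worth injecting in place of the files themselves.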
Evidence
- No comment information was available from the original thread.
How to Apply
- If you work with Claude on the same codebase repeatedly, try creating a Markdown wiki file in advance that documents the project structure, key modules, and function roles, and inject only that file at the beginning of each session.
- When starting a new project, have Claude explore the entire codebase only once, and save the results in a file like CODEBASE_WIKI.md. Subsequent sessions can then refer to only that file to save tokens.
- If you need specific implementation methods from the original post, visit the original Reddit URL (https://www.reddit.com/r/ClaudeAI/comments/1sfdztg/) directly or refer to Karpathy's publicly available workflow-related materials.
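The original post's tooling could not be confirmed, so the following is only one possible way to produce a first draft of such a wiki file. It is a minimal sketch that assumes a Python project and uses the standard-library `ast` module to collect module docstrings and top-level definitions; the function names `summarize_module` and `build_wiki` are hypothetical. It is a static substitute for the "have Claude explore once" step, not the author's method, and its output would still benefit from a pass by Claude or a human to describe how the modules fit together.

```python
import ast
from pathlib import Path


def summarize_module(path: Path) -> list[str]:
    """Return Markdown bullet lines describing one Python file."""
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError):
        return ["- (could not parse)"]

    lines = []
    doc = ast.get_docstring(tree)
    if doc:
        # Keep only the first docstring line to stay token-cheap.
        lines.append(f"- Purpose: {doc.splitlines()[0]}")

    # Only top-level functions and classes are listed.
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            node_doc = ast.get_docstring(node)
            first = node_doc.splitlines()[0] if node_doc else "(no docstring)"
            lines.append(f"- {kind} `{node.name}`: {first}")
    return lines or ["- (no top-level definitions)"]


def build_wiki(root: Path) -> str:
    """Build the full wiki text, one section per Python file under root."""
    sections = [f"# Codebase wiki: {root.resolve().name}", ""]
    for path in sorted(root.rglob("*.py")):
        sections.append(f"## {path.relative_to(root).as_posix()}")
        sections.extend(summarize_module(path))
        sections.append("")
    return "\n".join(sections)
```

Writing the result with `Path("CODEBASE_WIKI.md").write_text(build_wiki(Path(".")), encoding="utf-8")` gives a file that can be pasted or attached at the start of a session. The wiki goes stale as the code changes, so regenerating it after significant edits is advisable.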
Terminology
Related Papers
Claude-real-video - any LLM can watch a video
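An open-source tool that extracts only the key frames from a YouTube URL or local video file based on scene changes, transcribes the audio, and hands both to an LLM. It works around three limitations: Claude cannot accept video files, ChatGPT reads only subtitles, and Gemini samples at a fixed 1 fps.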
ReContext: Recursive Evidence Replay as LLM Harness for Long-Context Reasoning
At a 128K-token context, extracting only the key evidence using the model's internal attention signals and re-injecting it raises reasoning accuracy by 24.6%.
Single and Multi Truth Data Fusion using Large Language Models
Merging conflicting data from multiple sources with GPT-4o-mini prompts yields consistently higher F1 scores than existing unsupervised methods.
Multilingual Reasoning Cascades Need More Context
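In translation cascade pipelines, keeping the original question through to the final stage greatly improves multilingual performance without additional training.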
Less Back-and-Forth: A Comparative Study of Structured Prompting
Structuring prompts in a checklist format improves LLM answer quality and uses fewer tokens.
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement
Proposes DISCA, an inference-time technique that adjusts LLM output to match each country's moral values without retraining.
Using Claude Code: The unreasonable effectiveness of HTML