Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
TL;DR Highlight
A simple anonymization technique to detect when an LLM analyzes based on its memorized knowledge instead of the data.
Who Should Read
ML engineers and data scientists who use LLMs in data-analysis pipelines, especially AI-agent developers concerned that LLMs may distort results by relying on pre-trained knowledge rather than the provided data.
Core Mechanics
- LLMs subtly fill answers with memorized (parametric) knowledge the moment they see famous entities such as gene names or stock tickers, rather than relying on the provided data.
- Epistemic blinding involves substituting entity names with anonymous codes (e.g., Gene_001, Company_042) and then A/B comparing the results with the original names exposed.
- The key point is not that the blinded result is better, but that without blinding there is no way to know whether the LLM followed the data or its memory; blinding restores auditability.
- Using ShinkaEvolve, an evolutionary optimization framework with Claude Sonnet as the mutation operator, a scoring function was evolved over 13 generations using only numerical features, with no gene names.
- Contamination is more severe when feature signals are weak: consistency is 90% in AML (strong signals) but 75% in GBM (weak signals), a predictable pattern.
- Implemented as a Claude Code skill, so users can run blinding → A/B comparison → de-anonymization with a single command, no separate scripts required.
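The substitution step in the second bullet can be sketched in a few lines of Python. This is a minimal illustration, not the repository's blind.py; the function name, field names, and sample rows are all hypothetical:

```python
import random

def blind_entities(rows, entity_key, prefix="Entity", seed=42):
    """Replace entity names with stable anonymous codes; return blinded rows + mapping."""
    rng = random.Random(seed)
    names = sorted({r[entity_key] for r in rows})
    rng.shuffle(names)  # shuffle so codes carry no alphabetical signal
    mapping = {name: f"{prefix}_{i:03d}" for i, name in enumerate(names, 1)}
    blinded = [{**r, entity_key: mapping[r[entity_key]]} for r in rows]
    return blinded, mapping

# Hypothetical input rows; the mapping can later de-anonymize the LLM's output.
rows = [
    {"gene_symbol": "PTEN", "expression_z": 2.1},
    {"gene_symbol": "DPP8", "expression_z": 1.4},
]
blinded, mapping = blind_entities(rows, "gene_symbol", prefix="Gene")
```

The fixed seed keeps the name → code assignment reproducible across the blinded and unblinded arms of an A/B run.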
Evidence
- In predicting the top 20 drug targets for 4 oncology cancer types, blinded and unblinded results agreed 84% on average → 16% of predictions changed, but recall of validated targets was identical in both conditions (average 2.75 per cancer type).
- In GBM (glioblastoma), DPP8 ranked #3 blinded but dropped to #9 when the name was exposed, while PTEN jumped 12 places, from #15 blinded to #3 unblinded.
- In S&P 500 value-investing screening, 7 of the top 20 companies (35%) changed when tickers were exposed, and ELV and CI moved up the rankings in 4 of 5 unblinded runs, a systematic bias.
- Evolutionary optimization: the final function, chosen from 53 candidates in generation 13, achieved a strict hit rate of placing FDA-approved targets in the top 20 for 10 of 18 diseases, with fitness improving from 68% (the naive Generation 0 function) to 82%.
How to Apply
- When asking an LLM to rank or score based on data, specify the entity columns (gene names, company names, candidate drug names, etc.) in a YAML config, anonymize with blind.py, then send each A/B prompt to a fresh session. If the results differ significantly, the LLM answered from its prior knowledge, not your data.
- If feature values can identify entities (e.g., $3 trillion market cap + smartphone sector = Apple), normalize or bin that feature by sector to reduce structural identifiability before blinding.
- If using an agentic workflow in the Claude Code environment, adding epistemic-blinding as a Claude Code skill automatically runs blinding → comparison → de-anonymization in one turn when you request a ranking or prioritization.
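The within-sector binning suggested above can be sketched with stdlib Python only. The `bin_within_group` helper, the tickers, and the market-cap values are all hypothetical, chosen to show how a uniquely identifying raw value collapses into a coarse bin:

```python
from collections import defaultdict

def bin_within_group(rows, group_key, value_key, n_bins=5):
    """Replace a raw numeric feature with its within-group quantile bin,
    reducing structural identifiability before blinding."""
    by_group = defaultdict(list)
    for r in rows:
        by_group[r[group_key]].append(r)
    out = []
    for members in by_group.values():
        members.sort(key=lambda r: r[value_key])
        bins = min(n_bins, len(members))
        for i, r in enumerate(members):
            # Drop the raw value; keep only the coarse within-group rank bin.
            binned = {k: v for k, v in r.items() if k != value_key}
            binned[f"{value_key}_bin"] = i * bins // len(members)
            out.append(binned)
    return out

rows = [
    {"ticker": "AAPL", "sector": "Technology", "market_cap_b": 3000.0},
    {"ticker": "MSFT", "sector": "Technology", "market_cap_b": 2800.0},
    {"ticker": "XOM",  "sector": "Energy",     "market_cap_b": 450.0},
    {"ticker": "CVX",  "sector": "Energy",     "market_cap_b": 300.0},
]
binned = bin_within_group(rows, "sector", "market_cap_b", n_bins=2)
```

After binning, "$3T + Technology" becomes "top bin + Technology", which no longer pins down a single company.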
Code Example
# Installation and basic usage
# git clone https://github.com/mcuccarese/epistemic-blinding
# 1. Write a YAML configuration file (config.yaml)
# datasets:
#   - path: genes.csv
#     entity_columns: [gene_symbol]
# task: "Rank the top 20 drug targets based solely on the provided features."
# seed: 42
# 2. Generate blinded prompt
python blind.py --config config.yaml
# → blinded_prompt.txt, unblinded_prompt.txt, mapping.json are created
# 3. Send each to LLM and compare results
python compare.py \
--blinded blinded_output.txt \
--unblinded unblinded_output.txt \
--mapping mapping.json
# → Set overlap, Jaccard index, Mean rank delta, Kendall τ are output
# 4. Restore anonymous code → actual entity
python deblind.py \
--response blinded_output.txt \
--mapping mapping.json
# Use Claude Code skill (one command)
# "Rank these companies by value. Use epistemic blinding."
# → Automatically runs blind → A/B execution → deblind
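The agreement metrics compare.py reports (step 3 above) can be sketched with stdlib Python. This is a hypothetical reimplementation for illustration; the actual script's internals may differ:

```python
from itertools import combinations

def ranking_agreement(blinded, unblinded):
    """Overlap, Jaccard index, mean |rank delta|, and Kendall tau on shared entities."""
    rb = {e: i for i, e in enumerate(blinded)}    # entity -> blinded rank
    ru = {e: i for i, e in enumerate(unblinded)}  # entity -> unblinded rank
    common = rb.keys() & ru.keys()
    jaccard = len(common) / len(rb.keys() | ru.keys())
    mean_delta = sum(abs(rb[e] - ru[e]) for e in common) / len(common) if common else 0.0
    # Kendall tau: count concordant vs discordant pairs among shared entities.
    concordant = discordant = 0
    for a, b in combinations(sorted(common, key=rb.get), 2):
        # Pairs are ordered by blinded rank; check whether unblinded agrees.
        if ru[a] < ru[b]:
            concordant += 1
        elif ru[a] > ru[b]:
            discordant += 1
    pairs = concordant + discordant
    tau = (concordant - discordant) / pairs if pairs else float("nan")
    return {
        "overlap": len(common),
        "jaccard": jaccard,
        "mean_rank_delta": mean_delta,
        "kendall_tau": tau,
    }
```

A large mean rank delta or low Kendall tau between the two arms is the signal that exposing names changed the model's answer.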
Related Papers
Can LLMs model real-world systems in TLA+?
A benchmark study that systematically verified that when LLMs write TLA+ specifications they pass syntax checks well, but conformance with the real system's behavior is only around 46%, showing the practical limits of AI-based formal verification.
Natural Language Autoencoders: Turning Claude's Thoughts into Text
Anthropic released NLA, a technique that converts the numeric vectors (activations) inside an LLM into directly readable natural language, a new advance in interpretability research into what the AI is actually thinking.
ProgramBench: Can language models rebuild programs from scratch?
A new benchmark measuring whether LLMs can reimplement real software such as FFmpeg, SQLite, and a PHP interpreter from scratch using only documentation; even the best model passed 95%+ of tests on only 3% of tasks.
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents
Split a task into three tickets and Claude/GPT will write security-vulnerable code anyway, 53–86% of the time.
Refusal in Language Models Is Mediated by a Single Direction
Open-source chat models encode safety as a single vector direction, and removing it disables safety fine-tuning.
Show HN: A new benchmark for testing LLMs for deterministic outputs
Structured Output Benchmark assesses LLM JSON handling across seven metrics, revealing performance beyond schema compliance.
Original Abstract
This paper presents epistemic blinding in the context of an agentic system that uses large language models to reason across multiple biological datasets for drug target prioritization. During development, it became apparent that LLM outputs silently blend data-driven inference with memorized priors about named entities - and the blend is invisible: there is no way to determine, from a single output, how much came from the data on the page and how much came from the model's training memory. Epistemic blinding is a simple inference-time protocol that replaces entity identifiers with anonymous codes before prompting, then compares outputs against an unblinded control. The protocol does not make LLM reasoning deterministic, but it restores one critical axis of auditability: measuring how much of an output came from the supplied data versus the model's parametric knowledge. The complete target identification system is described - including LLM-guided evolutionary optimization of scoring functions and blinded agentic reasoning for target rationalization - with demonstration that both stages operate without access to entity identity. In oncology drug target prioritization across four cancer types, blinding changes 16% of top-20 predictions while preserving identical recovery of validated targets. The contamination problem is shown to generalize beyond biology: in S&P 500 equity screening, brand-recognition bias reshapes 30-40% of top-20 rankings across five random seeds. To lower the barrier to adoption, the protocol is released as an open-source tool and as a Claude Code skill that enables one-command epistemic blinding within agentic workflows. The claim is not that blinded analysis produces better results, but that without blinding, there is no way to know to what degree the agent is adhering to the analytical process the researcher designed.