LLM Visualization
TL;DR Highlight
An interactive website that visualizes, step by step, how a Transformer-based LLM processes tokens — build an intuitive understanding of LLM internals without reading any code.
Who Should Read
Developers who conceptually understand LLM architecture but can't quite grasp the actual computation flow, or ML engineers who need to explain Transformers to team members or learners.
Core Mechanics
- bbycroft.net/llm provides an interactive 3D visualization of the entire GPT-family LLM pipeline from token embedding → Attention → FFN → output probability distribution.
- You can trace step-by-step through each layer how the Attention mechanism calculates relationships between tokens and how Q/K/V matrix operations proceed.
- The visualization uses a small example model for structural explanation, not actual model weights — focused on understanding the 'overall flow.'
- Andrej Karpathy walked through this visualization in a YouTube video (youtu.be/7xTGNNLPyMI), increasing its educational value.
- It forms part of an educational resource ecosystem alongside Georgia Tech's Transformer Explainer (poloclub.github.io/transformer-explainer) and Jay Alammar's Illustrated Transformer.
- A noted limitation: 'You can visualize the entire process, but why it makes specific decisions (interpretability) is still a black box' — mentioned as an unsolved AI interpretability challenge.
- Custom input support for changing text and seeing attention flow or embedding space changes in real-time doesn't exist yet — flagged as a future improvement request.
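The pipeline the visualization walks through (embedding → attention → FFN → output distribution) can be sketched in a few lines of NumPy. This is a toy illustration under assumed tiny dimensions with random weights, not the site's actual code or real model parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 50, 8                      # tiny toy sizes, not real model dims

# toy parameters (random, standing in for trained weights)
E  = rng.normal(size=(vocab, d))      # token embedding table
Wq = rng.normal(size=(d, d)); Wk = rng.normal(size=(d, d)); Wv = rng.normal(size=(d, d))
W1 = rng.normal(size=(d, 4 * d)); W2 = rng.normal(size=(4 * d, d))
Wout = rng.normal(size=(d, vocab))    # output projection to vocabulary

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

tokens = np.array([3, 17, 42])        # a 3-token input sequence
x = E[tokens]                         # 1) embedding lookup -> (3, d)

# 2) single-head self-attention with a causal mask
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)
scores += np.triu(np.full((3, 3), -np.inf), k=1)  # no attending to future tokens
x = x + softmax(scores) @ V           # residual connection

# 3) feed-forward network, applied to each token independently
x = x + np.maximum(x @ W1, 0) @ W2    # ReLU MLP + residual

# 4) project to vocabulary and normalize to a probability distribution
probs = softmax(x @ Wout)
```

A real GPT stacks many such attention+FFN blocks and adds layer normalization and multiple heads; the visualization shows exactly these extra pieces that this sketch omits.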
Evidence
- Karpathy's YouTube walkthrough video was recommended in multiple comments as a complementary resource. The video fills in formula flows that are hard to grasp from visualization alone.
- The paradox of 'being able to see all computations but not knowing why it produces this answer' resonated. Visualization ≠ interpretability.
- Multiple requests for real weights and custom input support. Embedding space exploration similar to 3Blue1Brown's LLM videos was also requested.
- A meta-comment noted HN's pattern of 'high-quality technical articles with few comments': long reads attract comments mainly from people who skim only the existing comments, and by the time you finish the article itself, the post has fallen off the front page.
- Comments ranged from a coding club leader wanting to show it to 5-year-olds to professors planning to use it as lecture supplementary material — high educational value for non-specialists and beginners.
How to Apply
- If you need to explain LLM architecture to a team, use this visualization as a live demo instead of slides to intuitively convey how attention layers stack. Pairing with Karpathy's video doubles the impact.
- When reading the Transformer paper ('Attention Is All You Need') and Q/K/V operations or positional encoding feel abstract, explore those specific layers in this visualization to connect them with the formulas.
- When model behavior doesn't match expectations during LLM fine-tuning or prompt engineering, reviewing the full token processing flow in this visualization can recalibrate your mental model of 'what happens at each stage.'
Terminology
Attention: The core operation of Transformers, computing how strongly input tokens relate to each other so the model focuses more on important tokens. Similar to figuring out what a pronoun refers to in a sentence.
Q/K/V: Three matrices used in the attention computation. Query (what to search for), Key (each token's identifier), Value (the actual information to pass along). Like searching a library catalog (Q) to find matching book listings (K) and retrieve their content (V).
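The catalog analogy maps directly onto scaled dot-product attention. A minimal single-query sketch with made-up toy vectors (not real model weights):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d = 4
# one query vector: what the current token is "looking for"
q = np.array([1.0, 0.0, 1.0, 0.0])
# keys: each row is one token's catalog entry
K = np.array([[1.0, 0.0, 1.0, 0.0],   # matches the query well
              [0.0, 1.0, 0.0, 1.0]])  # barely matches
# values: the information each token contributes if selected
V = np.array([[10.0, 0.0, 0.0, 0.0],
              [0.0, 10.0, 0.0, 0.0]])

weights = softmax(q @ K.T / np.sqrt(d))  # relevance scores -> probabilities
out = weights @ V                        # weighted blend, dominated by token 0
```

The output is a soft mixture of values, not a hard lookup: the poorly matching token still contributes a little, which is exactly what the visualization's attention panels show as faint connection lines.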
Embedding: Converting a word or token into a numeric vector of hundreds to thousands of dimensions. Words with similar meanings sit close together in this vector space.
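"Close in vector space" is usually measured with cosine similarity; a sketch with invented 3-dimensional toy embeddings (real models use far more dimensions and learned values):

```python
import numpy as np

# hypothetical toy embeddings, hand-picked for illustration
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "apple": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    # similarity in [-1, 1]; higher means closer in direction
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# semantically related words end up nearer to each other
sim_related = cosine(emb["king"], emb["queen"])
sim_unrelated = cosine(emb["king"], emb["apple"])
```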
FFN: Feed-Forward Network. A layer that refines each token's representation via a nonlinear transformation after attention. While attention looks at inter-token relationships, the FFN transforms each token on its own.
Positional Encoding: Since Transformers have no inherent notion of token order, positional information is numerically added to each token. Like annotating the vector with 'this word is at position 3 in the sentence.'
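One common scheme is the sinusoidal encoding from the original Transformer paper; note that GPT-style models typically learn position embeddings instead, so this is an illustrative variant rather than what the visualized model necessarily uses:

```python
import numpy as np

def positional_encoding(seq_len, d):
    # sinusoidal scheme: each position gets a unique pattern of sin/cos values
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d // 2)[None, :]             # (1, d/2)
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((seq_len, d))
    pe[:, 0::2] = np.sin(angles)               # even dims: sine
    pe[:, 1::2] = np.cos(angles)               # odd dims: cosine
    return pe

pe = positional_encoding(seq_len=10, d=8)
# pe is simply added to the token embeddings before the first layer
```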
Interpretability: A research field that analyzes model internals so humans can understand why a specific output was produced. We can currently visualize the computation flow, but explaining why the model made a particular decision remains very difficult.