Show HN: MacMind – A transformer neural network in HyperCard on a 1989 Macintosh
TL;DR Highlight
This educational project implements a single-layer Transformer with 1,216 parameters in the HyperTalk scripting language (1987) and trains it on a real Macintosh SE/30. It demonstrates that the core mathematics of modern LLMs runs unchanged on hardware from over 35 years ago.
Who Should Read
Developers and AI beginners who want a deep understanding of how Transformers and backpropagation work, both visually and at the code level. Particularly suited to anyone who wants to dissect every formula with no black boxes.
Core Mechanics
- The project implements a complete Transformer neural network in HyperTalk, the scripting language for HyperCard created in 1987. It runs as pure script, with no compiled code and no external libraries, even though HyperTalk was never designed for matrix operations.
- The model is a single-layer, single-head Transformer with 1,216 parameters. It is vanishingly small next to GPT-4's approximately 1 trillion parameters, but the training loop (forward pass → loss calculation → backward pass → weight update) is mathematically identical.
- The learning task is bit-reversal permutation, the first step of the Fast Fourier Transform (FFT): each element moves to the position obtained by reversing the binary representation of its index. In a sequence of 8 elements, for example, position 1 (001) moves to position 4 (100).
- The implementation includes token embedding (converting tokens to vectors), positional encoding (adding positional information), scaled dot-product self-attention (query-key-value based attention), cross-entropy loss, complete backpropagation, and stochastic gradient descent.
- Option-clicking a button opens its script in HyperCard's script editor, exposing the actual formula code behind it. Learning rate changes, task replacements, and model size adjustments can all be made directly in the GUI, underscoring its purpose as a learning tool.
- The stated purpose is to prove that 'AI is not magic, but mathematics': backpropagation and attention are the same mathematics whether they run on a TPU cluster or on the Motorola 68030 of a 1989 Macintosh.
- The model was actually trained on a 1989 Macintosh SE/30, and a Python validation script (validate.py) is provided alongside it.
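The four-step loop described above can be shown at its absolute minimum in Python. This sketch fits a single weight with squared-error loss and plain SGD; MacMind itself trains 1,216 parameters against a cross-entropy loss, so this illustrates only the shape of the loop, not the model:

```python
# Minimal illustration of forward -> loss -> backward -> update.
# Fit a single weight w so that w * x matches y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, lr = 0.0, 0.05

for epoch in range(200):
    for x, y in data:
        pred = w * x                 # forward pass
        loss = (pred - y) ** 2       # loss calculation (squared error)
        grad = 2 * (pred - y) * x    # backward pass: dloss/dw
        w -= lr * grad               # weight update (SGD)

print(round(w, 4))  # converges to 2.0
```

Swap in more parameters, a cross-entropy loss, and attention layers, and the loop itself does not change; that is the point the project makes.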
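The bit-reversal task is easy to reproduce on its own; a standalone Python sketch (not MacMind's HyperTalk code) of the permutation for 8 elements:

```python
def bit_reverse(i, n_bits):
    """Reverse the n_bits-bit binary representation of index i."""
    out = 0
    for _ in range(n_bits):
        out = (out << 1) | (i & 1)  # shift the lowest bit of i into out
        i >>= 1
    return out

# For 8 elements (3 bits), position 1 (001) maps to position 4 (100).
perm = [bit_reverse(i, 3) for i in range(8)]
print(perm)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The permutation is its own inverse (applying it twice restores the original order), which is what lets the FFT use it as an in-place reordering step.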
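The scaled dot-product attention listed among the components, softmax(QK^T / sqrt(d_k)) V, also fits in a few lines of dependency-free Python. This is a generic sketch of the formula, not the project's HyperTalk implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention; Q, K, V are seq_len x d lists of lists."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # score each key against this query, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # output is the attention-weighted mix of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# With sharply separated keys, each query attends almost entirely to one value:
Q = K = [[10.0, 0.0], [0.0, 10.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
out = attention(Q, K, V)  # out[0] is close to [1, 0]
```

Self-attention is simply the case Q = K = V = the (embedded) input sequence, which is how a model learns which positions to look at when predicting each output.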
Evidence
- One reaction: running modern AI ideas on old hardware (this project, or LLM inference on Windows 3.1) 'reminds us that progress is not just about bigger GPUs and more compute, but about smarter mathematics and algorithms'. Commenters found it closer to the spirit of early computing than the current trend of throwing hardware at the problem.
- A commenter who first studied backpropagation in 1988, and fell in love with HyperCard programming at the same time, reminisced that 'this project evokes the elegant tools of that era'.
- Commenters shared that it can be run in the HyperCard Simulator (hcsimulator.com). It works well in the simulator even without XCMDs (external commands), and a direct import link (https://hcsimulator.com/imports/MacMind---Trained-69E0132C) was provided.
- One comment asked where the source of the actual HyperCard stack (.img file) is, since the GitHub repository contains only the Python validation script. This reflects the interest of developers who want to read the HyperTalk code directly.
- A philosophical comment observed that 'modern concepts are modern simply because no one thought of them at the time', likening the project to delivering germ theory to ancient Greece. It frames technological advancement as progress in the means of implementation rather than the invention of fundamental concepts.
How to Apply
- If you want to study the mathematics of Transformer attention and backpropagation in actual working code rather than in theory, open MacMind in the HyperCard Simulator (hcsimulator.com) and step through each button's formula code by Option-clicking it. Because everything is implemented as plain formulas with no external libraries, the concepts remain fully visible.
- When explaining the Transformer learning process to junior developers or non-ML backend developers, you can use MacMind's learning task (bit-reversal permutation) and 1,216 parameter model as an example to intuitively explain the 'forward pass → loss → backward pass → weight update' loop.
- If you want to quickly experiment with the impact of hyperparameters such as learning rate and model size, you can modify the values in the HyperCard script editor and rerun. The experimental environment is completely transparent, making it easy to track the impact of each change on the results.
Related Papers
Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library
PyTorch Lightning packages 2.6.2 and 2.6.3 delivered credential-stealing malware via a supply chain attack.
Alignment whack-a-mole: Finetuning activates recall of copyrighted books in LLMs
Fine-tuning even safety-aligned LLMs can bypass safeguards and reproduce copyrighted text verbatim, revealing prompt filtering alone isn't enough to prevent copyright infringement.
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU
Introducing MegaTrain, a system that leverages CPU memory as the primary storage and utilizes the GPU solely as a compute engine, enabling full-precision training of 120B parameter models with just a single H200 GPU.
Show HN: I built a tiny LLM to demystify how language models work
This educational project allows you to build a mini LLM with 8.7 million parameters, trained on a Guppy fish character, from scratch in just 5 minutes using a single Colab notebook, focusing on demystifying the black box nature of LLMs.
Nanocode: The best Claude Code that $200 can buy in pure JAX on TPUs
An open-source library that trains a 1.3B-parameter coding agent model from scratch on a $200 TPU budget, following Anthropic's Constitutional AI approach. It can serve as a hands-on reference for developers who want to understand the entire AI training pipeline directly.