Assessing Claude Mythos Preview's cybersecurity capabilities
TL;DR Highlight
Anthropic's new model, Claude Mythos Preview, can now autonomously discover zero-day vulnerabilities in major operating systems and browsers and write working exploits for them. This is a dramatic improvement over previous models and signals an urgent need for a response across the security industry.
Who Should Read
Security researchers, developers working on vulnerability analysis and penetration testing, and security architects who need to understand the impact of AI models on cybersecurity and develop defense strategies.
Core Mechanics
- Claude Mythos Preview demonstrated the ability to find zero-day (previously undiscovered) vulnerabilities across major operating systems (Linux, FreeBSD, OpenBSD, etc.) and major web browsers, and autonomously write exploits (actual attack code).
- Many of the discovered vulnerabilities are decades old. In security-renowned OpenBSD, it found a 27-year-old bug, along with numerous vulnerabilities 10-20 years old.
- The complexity of the exploits goes beyond simple stack overflows. In browsers, it built an exploit that chains four vulnerabilities, including a JIT heap spray (a technique that fills memory with attacker-controlled data so corrupted pointers land on it), to escape both the renderer and OS sandboxes.
- For FreeBSD's NFS server, it autonomously completed an unauthenticated remote-root RCE (Remote Code Execution) exploit, distributing a 20-gadget ROP chain across multiple packets.
- The performance difference compared to the previous model, Opus 4.6, is dramatic. While Opus 4.6 succeeded in exploiting a Firefox 147 JS engine vulnerability only 2 times out of hundreds of attempts, Mythos Preview succeeded 181 times and gained register control an additional 29 times under the same conditions.
- Even an Anthropic engineer without formal security training could simply ask Mythos Preview to find an RCE vulnerability and receive a completed exploit the next morning.
- More than 99% of the discovered vulnerabilities remain unpatched, so specific details cannot be disclosed. Anthropic stated that even the publicly discussable 1% demonstrates a groundbreaking leap.
- In response, Anthropic launched Project Glasswing, a collaborative project that leverages Mythos Preview to defensively protect the world's critical software and prepare the industry to stay ahead of attackers.
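To make the FreeBSD NFS finding above concrete, here is a minimal sketch of what "distributing a 20-gadget ROP chain across multiple packets" means structurally. Everything here is hypothetical: the gadget addresses are placeholders, and real exploits depend on the target binary's actual gadget locations.

```python
import struct

# Hypothetical gadget addresses; a real chain uses addresses found in the
# target binary. 20 gadgets, matching the count reported above.
GADGETS = [0x400000 + i * 0x10 for i in range(20)]

def build_chain(gadgets):
    """Pack gadget addresses as little-endian 64-bit words, the layout a
    hijacked x86-64 stack would consume one return at a time."""
    return b"".join(struct.pack("<Q", g) for g in gadgets)

def split_into_packets(payload, mtu=64):
    """Fragment the chain across fixed-size pieces, as an attacker might to
    reassemble it in target memory from several requests."""
    return [payload[i:i + mtu] for i in range(0, len(payload), mtu)]

chain = build_chain(GADGETS)      # 20 gadgets * 8 bytes = 160 bytes
packets = split_into_packets(chain)
```

The point of the sketch is only the shape of the technique: the chain itself is plain data, and splitting it across packets is what forces defenders to reason about reassembled state rather than single-message signatures.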
Evidence
- Concerns were raised about hundreds of millions of hard-to-upgrade embedded devices running vulnerable binaries indefinitely. One commenter said they had proposed, in a 2025 paper, the concept of an 'antibody network' in which frontier models remotely inject 'beneficial attacks' into old binaries to immunize them, and expressed surprise at how quickly the technology has advanced.
- There was also skepticism about whether the demonstration of Mythos Preview, which focused on decades-old C/C++ codebases, was overstated: browsers are somewhat protected by sandboxing, OSes inherently have a higher vulnerability density, and KASLR (Kernel Address Space Layout Randomization) has been practically useless as an LPE (Local Privilege Escalation) defense for years.
- Some comments analyzed why LLMs are particularly strong in the exploit domain: security attacks have a clear success/failure reward signal, making them easy to optimize, whereas defining a reward function for 'good software architecture' is hard, so progress there is slower.
- Concerns were also raised that AI-driven vulnerability scanning could harm the F/OSS (Free/Open Source Software) ecosystem. Large companies can afford these analysis costs, but small open-source projects cannot.
- There was a cynical view on AI safety. One comment pointed out that malicious actors exploiting newly released, more capable models to cause visible harm to society may ironically accelerate the AI safety discussion.
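The reward-function point above can be sketched in a few lines: exploit search optimizes against an unambiguous boolean signal. The attempt function below is a pure stand-in (a hypothetical 2% per-attempt success rate seeded for determinism); a real system would actually run the target and observe a crash or shell.

```python
import random

def attempt_exploit(seed):
    """Stand-in for one exploit attempt. The only thing that matters for the
    argument is that the outcome is an unambiguous True/False reward."""
    rng = random.Random(seed)          # deterministic per input
    return rng.random() < 0.02        # hypothetical 2% success rate

def search(n_attempts):
    """A reward-driven loop: keep exactly the inputs that trip the success
    signal. No human judgment is needed to score an attempt."""
    return [s for s in range(n_attempts) if attempt_exploit(s)]

wins = search(500)
```

Contrast with "good architecture": there is no `attempt_architecture()` returning a crisp boolean, which is the commenters' explanation for the asymmetric pace of progress.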
How to Apply
- If you are maintaining an open-source project, monitor Anthropic's Project Glasswing collaboration channel and consider applying to participate in AI-based vulnerability scanning programs targeting your codebase. If Mythos-level models are used for defensive purposes, they can quickly find and patch bugs that would take humans decades to discover.
- If you operate legacy C/C++ codebases (embedded firmware, old server daemons, etc.) and patching is impossible, immediately review strengthening network isolation and access controls. Mythos Preview-level models can find and chain decades-old bugs, so the assumption that 'old code is safe' no longer holds.
- If you have a security team, experiment with building a pipeline to assist red team operations by introducing an AI agent-based automated exploit scanner in your internal CTF (Capture The Flag) environment or staging server. With LLMs like Mythos Preview having improved ability to explore program states, you can save human resources by leveraging agents for repetitive and broad vulnerability exploration.
- Improve your infrastructure towards stronger sandbox-based isolation (containers, Firecracker VMs, WebAssembly, etc.). As pointed out in the comments, AI is particularly strong at vulnerability chaining, so it is even more important to design 'defense in depth' with multiple layers of defense to minimize damage from a single vulnerability.
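As one concrete step toward the defense-in-depth advice above, a minimal sketch of process-level isolation: run a risky step (parsing untrusted input, say) in a separate interpreter with a timeout and a stripped environment, so a single bug or hang is contained rather than fatal to the service. This is only one layer, far weaker than the container/Firecracker/WebAssembly isolation the bullet recommends, and the helper name is ours.

```python
import subprocess
import sys

def run_isolated(code, timeout=5):
    """Execute `code` in a child Python interpreter; any crash or hang in the
    child is contained and the parent survives."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env hooks
            capture_output=True,
            timeout=timeout,
            env={},                              # stripped environment
        )
        return proc.returncode, proc.stdout.decode()
    except subprocess.TimeoutExpired:
        return None, ""                          # hung child is killed

rc, out = run_isolated("print('parsed ok')")
```

A non-zero or `None` return code is the containment signal; layered behind a VM or container boundary, this is the multi-layer design that forces an attacker to chain several escapes instead of one.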
Related Papers
Can LLMs model real-world systems in TLA+?
A benchmark study systematically showing that when LLMs write TLA+ specifications, they pass syntax checks easily but reach only about 46% behavioral conformance with the real system, illustrating the practical limits of AI-based formal verification.
Natural Language Autoencoders: Turning Claude's Thoughts into Text
Anthropic released NLA, a technique that translates the numeric activation vectors inside an LLM into directly readable natural language, a new advance in interpretability research into what the AI is actually thinking.
ProgramBench: Can language models rebuild programs from scratch?
A new benchmark measuring whether LLMs can reimplement real software such as FFmpeg, SQLite, and a PHP interpreter from scratch using only documentation; even the best model passed 95%+ of tests on only 3% of the tasks.
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents
When a task is split into three tickets, even Claude/GPT will simply write code containing the security vulnerability 53-86% of the time.
Refusal in Language Models Is Mediated by a Single Direction
Open-source chat models encode safety as a single vector direction, and removing it disables safety fine-tuning.
Show HN: A new benchmark for testing LLMs for deterministic outputs
Structured Output Benchmark assesses LLM JSON handling across seven metrics, revealing performance beyond schema compliance.