Assessing Claude Mythos Preview's cybersecurity capabilities
TL;DR Highlight
Anthropic's new model, Claude Mythos Preview, can autonomously discover zero-day vulnerabilities in major operating systems and browsers and write working exploits for them. The jump in performance over previous models signals that the security industry needs to respond urgently.
Who Should Read
Security researchers, developers working on vulnerability analysis and penetration testing, and security architects who need to understand the impact of AI models on cybersecurity and develop defense strategies.
Core Mechanics
- Claude Mythos Preview demonstrated the ability to find zero-day (previously undiscovered) vulnerabilities across major operating systems (Linux, FreeBSD, OpenBSD, etc.) and major web browsers, and autonomously write exploits (actual attack code).
- Many of the vulnerabilities discovered are decades old. In OpenBSD, which is known for its security focus, it found a 27-year-old bug, and it also discovered numerous vulnerabilities that were 10 to 20 years old.
- The exploits go well beyond simple stack overflows. For a browser, it built a complex JIT heap spray exploit (a memory-corruption attack technique) that chained 4 vulnerabilities to escape both the renderer sandbox and the OS sandbox.
- For FreeBSD's NFS server, it autonomously completed an RCE (Remote Code Execution) exploit that obtains root privileges remotely without authentication, splitting a 20-gadget ROP (return-oriented programming) chain across multiple packets.
- The performance difference compared to the previous model, Opus 4.6, is dramatic. While Opus 4.6 succeeded in exploiting a Firefox 147 JS engine vulnerability only 2 times out of hundreds of attempts, Mythos Preview succeeded 181 times and gained register control an additional 29 times under the same conditions.
- Even Anthropic engineers without formal security training could ask Mythos Preview to find an RCE vulnerability and have a completed exploit by the next morning.
- More than 99% of the discovered vulnerabilities are still unpatched, so their specific details cannot yet be disclosed. Anthropic stated that even the 1% that is public demonstrates a groundbreaking leap.
- In response, Anthropic launched Project Glasswing, a collaborative project that uses Mythos Preview to defend the world's critical software and help the industry stay ahead of attackers.
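The Firefox 147 comparison above can be put into rough numbers. The sketch below uses only the counts quoted in this summary. Because the exact number of attempts ("hundreds") is not given, it computes relative ratios rather than absolute success rates. The constant and function names are illustrative, not from Anthropic's report.

```python
# Counts quoted in the summary for the Firefox 147 JS engine test.
OPUS_46_EXPLOITS = 2          # working exploits produced by Opus 4.6
MYTHOS_EXPLOITS = 181         # working exploits produced by Mythos Preview
MYTHOS_REGISTER_CONTROL = 29  # additional runs that reached register control only


def success_ratio(new_count: int, old_count: int) -> float:
    """Return how many times more successes the newer model produced."""
    if old_count <= 0:
        raise ValueError("old_count must be positive to form a ratio")
    return new_count / old_count


full_exploit_ratio = success_ratio(MYTHOS_EXPLOITS, OPUS_46_EXPLOITS)
register_control_or_better = MYTHOS_EXPLOITS + MYTHOS_REGISTER_CONTROL

print(f"Full exploits: {full_exploit_ratio:.1f}x as many as Opus 4.6")
print(f"Runs reaching at least register control: {register_control_or_better}")
```

The ratio is only meaningful because the summary states that both models were run under the same conditions. With a different number of attempts per model, the counts would need to be normalized first.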
Evidence
- Commenters raised concerns that hundreds of millions of hard-to-upgrade embedded devices will keep running vulnerable binaries indefinitely. One commenter mentioned that they had proposed the concept of an 'antibotty network' in a 2025 paper, where frontier models remotely inject 'beneficial attacks' into old binaries to immunize them, and expressed surprise at how quickly the technology has advanced.
- There was also skepticism about whether the Mythos Preview demonstration, which focused on decades-old C/C++ codebases, was overstated. Commenters argued that browsers are somewhat protected by sandboxing, that OSes inherently have a higher vulnerability density, and that KASLR (Kernel Address Space Layout Randomization) has been practically useless as a defense against LPE (Local Privilege Escalation) for years.
- There were comments analyzing why LLMs are particularly strong in the exploit domain. Security attacks have a clear 'success/failure' reward function, making them easy to optimize, while defining a reward function for 'good software architecture' is difficult, resulting in slower progress.
- Concerns were also raised that AI-driven vulnerability scanning could harm the F/OSS (Free/Open Source Software) ecosystem. Large companies can afford these analysis costs, but small open-source projects cannot.
- There was a cynical view regarding AI safety. One comment pointed out that 'the release of improved models being exploited by malicious actors to cause noticeable harm to society may ironically accelerate the AI safety discussion.'
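The reward-function argument from the comments can be illustrated with a toy example. The sketch below assumes a hypothetical internal CTF exercise in which success means recovering a planted flag string. The `Attempt` class, both function names, and the flag format are invented for illustration and do not describe how any model is actually trained.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One agent attempt in a hypothetical internal CTF exercise."""
    captured_output: str


def exploit_reward(attempt: Attempt, expected_flag: str) -> float:
    """Binary reward: 1.0 if the planted flag was recovered, otherwise 0.0."""
    return 1.0 if expected_flag in attempt.captured_output else 0.0


def architecture_reward(design_doc: str) -> float:
    """Placeholder: there is no agreed automatic check for 'good architecture'."""
    raise NotImplementedError("needs human judgment or a proxy metric")


print(exploit_reward(Attempt("uid=0 FLAG{demo}"), "FLAG{demo}"))
print(exploit_reward(Attempt("connection refused"), "FLAG{demo}"))
```

The contrast is the point: the first function can be evaluated mechanically on every attempt, while the second has no obvious body to write.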
How to Apply
- If you are maintaining an open-source project, monitor Anthropic's Project Glasswing collaboration channel and consider applying to participate in AI-based vulnerability scanning programs targeting your codebase. If Mythos-level models are used for defensive purposes, they can quickly find and patch bugs that would take humans decades to discover.
- If you operate legacy C/C++ codebases (embedded firmware, old server daemons, etc.) that cannot be patched, review network isolation and tighten access control immediately. Mythos Preview-level models can find and chain decades-old bugs, so the assumption that 'old code is safe' is no longer valid.
- If you have a security team, experiment with an AI agent-based automated exploit scanner in your internal CTF (Capture The Flag) environment or on a staging server, and build it into a pipeline that assists red team operations. LLMs like Mythos Preview are better at exploring program states, so delegating repetitive, broad vulnerability exploration to agents can free up human effort.
- Move your infrastructure toward stronger sandbox-based isolation (containers, Firecracker VMs, WebAssembly, etc.). As pointed out in the comments, AI is particularly strong at vulnerability chaining, so 'defense in depth', with multiple layers that limit the damage from any single vulnerability, matters even more.
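The recommendations above can be turned into a simple triage pass over a component inventory. The following is a minimal sketch, assuming you maintain such an inventory yourself. The `Component` fields, the rules, and the example entries are all hypothetical and would need adapting to a real environment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One entry in a hypothetical software inventory."""
    name: str
    memory_unsafe: bool      # written in C/C++ or a similar language
    patchable: bool          # can a fix realistically be deployed?
    network_reachable: bool  # exposed beyond its own host or segment
    sandboxed: bool          # runs in a container, microVM, or similar


def recommended_actions(c: Component) -> list[str]:
    """Map a component's properties to the actions listed in this section."""
    actions = []
    if c.memory_unsafe and not c.patchable and c.network_reachable:
        actions.append("isolate on the network and tighten access control")
    if c.memory_unsafe and not c.sandboxed:
        actions.append("add a sandbox layer")
    if c.memory_unsafe and c.patchable:
        actions.append("schedule vulnerability scanning and patching")
    return actions


# Example inventory; the names and values are made up.
inventory = [
    Component("legacy-firmware-daemon", True, False, True, False),
    Component("internal-api", False, True, True, True),
]

for component in inventory:
    print(component.name, "->", recommended_actions(component) or ["no action"])
```

The rules are deliberately coarse. Their purpose is to surface the unpatchable, network-reachable, memory-unsafe components first, since those are the ones the bullets above single out.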
Related Papers
Distributed Attacks in Persistent-State AI Control
Demonstrates experimentally that when an AI coding agent spreads malicious code across multiple PRs, detection by a single monitor is practically impossible.
Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
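Where the original SWE-Bench was a 'junior-level' evaluation that supplied overly detailed requirements, Senior SWE-Bench evaluates the ability to implement features and track down bugs from incomplete requirements, as a real senior engineer would. Even the current best-performing model (Claude Opus 4.8) solves only 24%, making this an attempt to measure the real limits of AI coding agents.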
Apple 'Hide My Email' vulnerability reveals peoples' real email addresses
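Apple's Hide My Email service, which iCloud+ subscribers use to protect their privacy, has a vulnerability that has gone unpatched for more than a year and lets an attacker learn the hidden real email address.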
Words Speak Louder Than Code: Investigating Cognitive Heuristics in LLM-Based Code Vulnerability Detection
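LLM security scanners react more strongly to 'who wrote it' and 'how the question is asked' than to the code itself, which can be used to conceal up to 97% of vulnerabilities.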
Robust Harmful Features Under Jailbreak Attacks: Mechanistic Evidence from Attention Head Specialization in Large Language Models
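Dissects, at the level of individual attention heads, how jailbreak attacks bypass LLM safeguards, and presents a training-free method for detecting harmful inputs using internal signals that survive the attacks.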
What happened after 2k people tried to hack my AI assistant
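Results of a public experiment in which prompt injection attacks were attempted against an AI agent using more than 6,000 emails. Claude Opus 4.6 never once allowed the secret file to leak, but the realism of the experimental design was hotly debated.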