Claude.ai unavailable and elevated errors on the API
TL;DR Highlight
Anthropic’s entire service suite—Claude.ai, the API, Claude Code—became inaccessible for 1 hour and 18 minutes (17:34–18:52 UTC), sparking outrage among enterprise users over reliability concerns.
Who Should Read
Developers integrating the Claude API or Claude Code into production services, and team leaders grappling with LLM service availability and multi-model strategies.
Core Mechanics
- The outage began at 17:34 UTC on April 28, 2026, and was resolved at 18:52 UTC, lasting a total of 1 hour and 18 minutes. Affected services included claude.ai, Claude Console (platform.claude.com), Claude API (api.anthropic.com), Claude Code, Claude Cowork, and Claude for Government—essentially the entire service portfolio.
- The root cause was an authentication issue: authentication errors surged on API requests and Claude Code login paths, and claude.ai itself became unreachable.
- Anthropic announced the investigation at 17:41 UTC, identified the problem at 17:51 UTC, reported work in progress at 18:33 UTC, transitioned to a monitoring phase at 18:59 UTC, and declared final resolution at 19:15 UTC, updating the status page throughout.
- Figures shared from status.claude.com indicated that Claude's uptime over the last 90 days had fallen to the 'one nine' level (just over 90%), a level widely considered unacceptable for production environments.
- A user from an organization spending over $200,000 monthly on the enterprise tier reported frequent outages in recent months and poor support, leading to anger from leadership. They stated that a ‘one nine’ level of reliability is unacceptable given the cost.
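To put the 'one nine' figure above in concrete terms, here is a quick calculation of how much downtime each availability level permits over a 90-day window:

```python
# Downtime budget over a 90-day window for common availability levels.
HOURS_IN_WINDOW = 90 * 24  # 2160 hours

def downtime_hours(availability: float, window_hours: float = HOURS_IN_WINDOW) -> float:
    """Hours of allowed downtime for a given availability fraction."""
    return window_hours * (1 - availability)

for label, avail in [("one nine (90%)", 0.90),
                     ("two nines (99%)", 0.99),
                     ("three nines (99.9%)", 0.999)]:
    print(f"{label}: {downtime_hours(avail):.1f} hours of downtime allowed")
```

At 'one nine', roughly 216 hours (nine full days) of downtime fit inside a 90-day window, which is why the figure alarmed paying customers even though this particular incident lasted only 78 minutes.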
Evidence
- "A user spending over $200,000 monthly on Anthropic’s enterprise tier lamented frequent outages and poor support in recent months, indicating escalating frustration at the executive level and potentially leading to contract re-evaluation."
How to Apply
- If you rely on the Claude API as a single point of failure in production, add automatic fallback logic to alternative providers such as OpenAI (Codex) or Google (Gemini) so that service continues through outages like this one.
- Organizations spending tens of thousands of dollars monthly on the Claude API should regularly monitor Anthropic’s status.claude.com and subscribe to email/SMS alerts. Integrating with PagerDuty or Slack webhooks can reduce response times.
- Teams heavily using Claude Code in their workflow should set up alternative coding agents like OpenAI Codex CLI in parallel. This allows work to continue even when Claude Code is unavailable due to authentication issues.
- For teams of around 10 people where AI coding tool costs are a concern or stability is paramount, consider renting GPUs to self-host open models like Qwen or DeepSeek. While initial setup is required, it offers direct control over downtime risk and potential long-term cost savings.
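The fallback logic suggested above can be sketched as a provider-agnostic wrapper. The stub callables below stand in for real SDK calls (the function names and the broad exception handling are illustrative assumptions, not any vendor's API):

```python
from typing import Callable, Sequence

def call_with_fallback(providers: Sequence[tuple[str, Callable[[str], str]]],
                       prompt: str) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, response).

    Raises RuntimeError only if every provider fails."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in production, catch provider-specific errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Hypothetical stubs; replace with real Anthropic/OpenAI/Gemini SDK calls.
def claude_call(prompt: str) -> str:
    raise ConnectionError("simulated outage")  # e.g. the API returns 5xx

def gemini_call(prompt: str) -> str:
    return "fallback answer"

provider, answer = call_with_fallback([("claude", claude_call),
                                       ("gemini", gemini_call)], "hello")
print(provider, answer)  # → gemini fallback answer
```

Keeping the ordering explicit lets you prefer Claude in normal operation and only pay the quality or cost difference of the fallback provider during an incident.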
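For the status-page monitoring suggested above: hosted status pages of this kind (e.g. Atlassian Statuspage) commonly expose a JSON summary at /api/v2/status.json. A polling check might look like the following sketch; the endpoint path and payload shape are assumptions about status.claude.com, and the alert action is a placeholder for a Slack or PagerDuty webhook:

```python
import json
import urllib.request

# Assumed Statuspage-style summary endpoint; verify against the real page.
STATUS_URL = "https://status.claude.com/api/v2/status.json"

def fetch_status(url: str = STATUS_URL) -> dict:
    """Fetch and parse the status summary (network call; not run at import)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def should_alert(payload: dict) -> bool:
    """Statuspage-style payloads carry status.indicator: none/minor/major/critical."""
    indicator = payload.get("status", {}).get("indicator", "none")
    return indicator in ("major", "critical")

# Example payload in the assumed format:
sample = {"status": {"indicator": "major", "description": "Partial System Outage"}}
if should_alert(sample):
    # Here you would POST to a Slack/PagerDuty webhook instead of printing.
    print("ALERT:", sample["status"]["description"])
```

Run `fetch_status()` on a schedule (cron, or a small loop in a sidecar service) and route `should_alert` hits to your incident channel.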
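One practical upside of the self-hosting route above: servers such as vLLM expose an OpenAI-compatible HTTP API, so client code can stay largely unchanged between hosted and local models. A minimal sketch under that assumption follows; the localhost address, port, and model name are placeholders, and `complete` requires a running server:

```python
import json
import urllib.request

# Placeholder address for a local vLLM-style OpenAI-compatible server.
LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """OpenAI-compatible chat completion payload, as served by vLLM and similar."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256}

def complete(prompt: str, model: str = "Qwen/Qwen2.5-Coder-32B-Instruct") -> str:
    """POST to the local server (network call; needs the server to be up)."""
    req = urllib.request.Request(
        LOCAL_ENDPOINT,
        data=json.dumps(build_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

payload = build_request("Qwen/Qwen2.5-Coder-32B-Instruct", "Explain retries.")
print(payload["model"])
```

Because the request shape matches the hosted providers' chat API, this local endpoint can also be slotted in as the last entry in a fallback chain.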
Related Papers
Can LLMs model real-world systems in TLA+?
A benchmark study systematically verifying that while LLM-written TLA+ specifications pass syntax checks well, their behavioral conformance to the actual system is only around 46%, illustrating the practical limits of AI-driven formal verification.
Natural Language Autoencoders: Turning Claude's Thoughts into Text
Anthropic has published NLA (Natural Language Autoencoders), a technique that converts the numeric vectors (activations) inside an LLM into directly readable natural language. It marks a new advance in interpretability research into what the AI is actually thinking.
ProgramBench: Can language models rebuild programs from scratch?
A new benchmark measuring whether LLMs can reimplement real software such as FFmpeg, SQLite, and a PHP interpreter from scratch using only documentation; even the best model passed at least 95% of the tests on only 3% of all tasks.
MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents
Split the request into three tickets and even Claude/GPT will simply write the security-vulnerable code 53–86% of the time.
Refusal in Language Models Is Mediated by a Single Direction
Open-source chat models encode safety as a single vector direction, and removing it disables safety fine-tuning.
Show HN: A new benchmark for testing LLMs for deterministic outputs
Structured Output Benchmark assesses LLM JSON handling across seven metrics, revealing performance beyond schema compliance.