Introduction
On October 20, 2025, Amazon Web Services (AWS) experienced a significant global outage that disrupted thousands of websites and applications across multiple sectors. The incident, concentrated in the US-EAST-1 (Northern Virginia) region, began causing widespread disruptions in the early morning hours (ET) and was fully mitigated by mid-afternoon.
Timeline of Events
- 12:11 a.m. ET: AWS reported elevated error rates affecting 14 services, including EC2, S3, DynamoDB, IAM, and Route 53 in the US-EAST-1 region.
- 2:00 a.m. ET: AWS identified DNS resolution issues affecting the DynamoDB API endpoint, disrupting dependent services like Lambda, CloudTrail, and API Gateway.
- 6:35 a.m. ET (3:35 a.m. PT): AWS applied mitigations; most services began recovering.
- Late morning: Residual slowdowns persisted for Bedrock and RDS users; full recovery was observed around 12:30 p.m. ET.
Root Cause Analysis
AWS confirmed the outage stemmed from a DNS resolution failure linked to DynamoDB’s endpoint in US-EAST-1. This caused cascading API request failures across services with global dependencies on that region (e.g., IAM and DynamoDB Global Tables). AWS engineers indicated the incident was not security-related but rather an internal infrastructure misconfiguration that disrupted DNS propagation.
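To make the failure mode concrete, the short Python sketch below simply probes whether regional DynamoDB endpoints resolve at all. The hostnames are AWS’s public regional endpoints, but the probe itself is only an illustration of how a DNS fault surfaces to clients, not AWS’s internal tooling.

```python
import socket

# Illustrative probe: if the regional DynamoDB endpoint stops resolving,
# every SDK call that targets it fails before a single packet is sent,
# which is how a DNS fault fans out into broad API errors.
ENDPOINTS = [
    "dynamodb.us-east-1.amazonaws.com",   # the endpoint implicated in the outage
    "dynamodb.us-west-2.amazonaws.com",   # a healthy region for comparison
]

def resolves(hostname: str) -> bool:
    """Return True if the hostname currently resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, 443)) > 0
    except socket.gaierror:
        return False

if __name__ == "__main__":
    for host in ENDPOINTS:
        status = "OK" if resolves(host) else "DNS RESOLUTION FAILED"
        print(f"{host}: {status}")
```

Once resolution fails, every SDK call addressed to that hostname errors out before it ever reaches the service, which is why so many dependent services degraded at once.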
Affected Services and Regions
Beyond EC2 and S3, the disruption rippled through:
- AWS Bedrock (AI model endpoints temporarily unreachable)
- RDS and CloudFront (intermittent access and replication failures)
- Route 53 (DNS propagation delays)
- IAM (authorization bottlenecks)
While primarily centered in US-EAST-1, user reports spiked globally, particularly in Europe, Japan, and Australia, with over 6.5 million outage reports recorded.
Industry Impact
- Banking & Fintech: Coinbase and Robinhood trading platforms temporarily halted transactions.
- E-commerce: Amazon, Shopify, and Etsy order processing systems disrupted; checkout API latencies exceeded 3 seconds on average.
- SaaS Platforms: Canva, Notion, and Zoom experienced degraded performance.
- AI Workloads: Bedrock-based applications (including Perplexity AI) saw slow inferencing and retraining jobs paused.
- Government Workloads: GovCloud dependencies reported short-lived API throttling but recovered by 10 a.m. ET.
Monitoring and Provider Responses
- AWS Statement: Confirmed recovery was in progress, citing the DNS failure affecting DynamoDB as the root cause and reporting full restoration by the afternoon.
- Datadog & Dynatrace: Both noted incomplete telemetry during the event due to their own dependence on AWS.
- Cloudflare: Reported normal edge operations, confirming the fault was internal to AWS DNS infrastructure.
Security and Cascading Effects
No evidence of cyberattack or data corruption was found. However, temporary IAM authorization mismatches and API Gateway rate throttling disrupted authentication services for some customers. Bedrock latency and Lambda backlogs triggered delayed automation pipelines across AI-driven workflows.
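For teams hardening against this class of cascading effect, a common client-side measure is to retry throttled or lagging calls with exponential backoff and jitter. The sketch below is a minimal, generic wrapper; the retryable error codes, attempt limit, and base delay are assumptions to tune per workload, not AWS-prescribed values.

```python
import random
import time
from botocore.exceptions import ClientError

# Error codes that typically indicate throttling or transient lag rather than
# a permanent failure; which codes to retry on is an application decision.
TRANSIENT_CODES = {
    "ThrottlingException",
    "TooManyRequestsException",
    "ProvisionedThroughputExceededException",
}

def call_with_backoff(call, max_attempts=5, base_delay=0.5):
    """Retry an AWS SDK request with exponential backoff and full jitter.

    `call` is any zero-argument callable wrapping a boto3 request
    (an illustrative assumption; wire in your own client call).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code not in TRANSIENT_CODES or attempt == max_attempts:
                raise
            # Sleep a random fraction of the exponential window before retrying.
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))
```

Capping attempts matters: unbounded retries during a regional event amplify the very backlog they are trying to ride out.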
Historical Comparison
Date | Duration | Root Cause | Region | Impact
- December 7, 2021 | ~7 hours | Network congestion | US-EAST-1 | Major e-commerce and media disruption
- June 13, 2023 | ~2 hours | Authentication service overload | US-EAST-2 | Moderate SaaS and IoT impact
- October 20, 2025 | ~10 hours (intermittent) | DNS resolution failure | US-EAST-1 | Global: banking, AI, SaaS, and retail
Expert Commentary & Implications
Industry analysts argue this outage underscores single-region dependency risks and the fragile coupling between DNS and modern distributed applications. Gartner analysts noted that enterprises using multicloud redundancy or DNS-level failover services such as Cloudflare Load Balancing or Azure Traffic Manager maintained higher uptime. CIOs and CTOs are advised to:
- Architect failover for critical APIs outside a single AWS region (see the sketch after this list).
- Deploy independent DNS and monitoring layers.
- Stress-test IAM and event-driven Lambda automations against service lag conditions.
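As a minimal sketch of the first recommendation, the code below reads from a DynamoDB Global Table and falls back to a second region when the primary endpoint fails or cannot be reached. The table name, key schema, and region pair are hypothetical, and it assumes the table is already replicated to both regions.

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

# Hypothetical Global Table replicated to both regions; names are placeholders.
TABLE = "orders"
REGIONS = ["us-east-1", "us-west-2"]  # primary first, then failover

def get_order(order_id: str):
    """Read an item from the first region that answers, falling back on failure."""
    last_error = None
    for region in REGIONS:
        client = boto3.client("dynamodb", region_name=region)
        try:
            resp = client.get_item(
                TableName=TABLE,
                Key={"order_id": {"S": order_id}},
            )
            return resp.get("Item")
        except (ClientError, EndpointConnectionError) as err:
            last_error = err  # try the next region before giving up
    raise last_error

if __name__ == "__main__":
    print(get_order("12345"))
```

Cross-region replication in Global Tables is eventually consistent, so a fallback read can return slightly stale data; that trade-off should be an explicit design decision rather than a surprise during an outage.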
Key Takeaways for Executives
- The October 2025 outage was a DNS-level failure, not a cyberattack.
- US-EAST-1 single-region dependency continues to be an enterprise continuity concern.
- Companies with multicloud resilience experienced minimal downtime.
- AWS recovery visibility is improving, but communication transparency remains a critical customer expectation.
Overall Impact: Temporary but severe — approximately 20% of global internet traffic was degraded or interrupted for up to 8 hours. AWS has since stated it is implementing “enhanced DNS partitioning and inter-region failover automation” to prevent recurrence.
About the Author

Rejith Krishnan
Rejith Krishnan is the Founder and CEO of lowtouch.ai, a platform dedicated to empowering enterprises with private, no-code AI agents. With expertise in Site Reliability Engineering (SRE), Kubernetes, and AI systems architecture, he is passionate about simplifying the adoption of AI-driven automation to transform business operations.
Rejith specializes in deploying Large Language Models (LLMs) and building intelligent agents that automate workflows, enhance customer experiences, and optimize IT processes, all while ensuring data privacy and security. His mission is to help businesses unlock the full potential of enterprise AI with seamless, scalable, and secure solutions that fit their unique needs.