Web application, Public Content, and 31 other services are down
Resolved
Oct 21 at 01:32am PDT
AWS Outage Impact Report
Overview
Between 6:49 AM and 9:24 AM GMT on October 20 (11:49 PM PDT on October 19 to 2:24 AM PDT on October 20), AWS experienced increased error rates and latencies across multiple services in the US-EAST-1 Region.
Services relying on US-EAST-1 endpoints—including IAM (Identity and Access Management) and DynamoDB Global Tables—were also affected.
Initial Impact on Scenario
For Scenario, this resulted in a sharp rise in HTTP 500 (Internal Server Error) and 404 (Not Found) responses from the API, increasing from near zero to tens of thousands starting at 6:55 AM GMT.
This was caused by the platform’s inability to access its database and process incoming requests.
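The update entries on this page are stamped in PDT, while this report uses GMT. The following is a minimal sketch for lining the two up. The function name `status_page_to_gmt` is hypothetical, and the year is an assumption, since the page does not print one.

```python
from datetime import datetime, timedelta, timezone

# PDT is UTC-7; a fixed offset avoids depending on a time zone database.
PDT = timezone(timedelta(hours=-7), "PDT")


def status_page_to_gmt(stamp: str, year: int = 2025) -> datetime:
    """Parse a stamp such as 'Oct 19 at 11:50pm PDT' and return it in GMT.

    The default year is an assumption: the status page omits it.
    """
    text = stamp.removesuffix(" PDT")  # requires Python 3.9+
    local = datetime.strptime(f"{year} {text}", "%Y %b %d at %I:%M%p")
    return local.replace(tzinfo=PDT).astimezone(timezone.utc)


# First entry in the timeline ("Created").
first_alert = status_page_to_gmt("Oct 19 at 11:50pm PDT")
print(first_alert.strftime("%b %d %I:%M %p GMT"))  # Oct 20 06:50 AM GMT
```

Converted this way, the first status-page entry falls at 6:50 AM GMT on October 20, a few minutes before the error spike described above.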
Root Cause and Initial Recovery
At 7:26 AM GMT on October 20, AWS identified the root cause as DNS resolution issues affecting regional DynamoDB endpoints.
Once the DNS issue was resolved at 9:24 AM GMT, most services began to recover.
Once DynamoDB was reachable again, Scenario’s API recovered immediately: requests were processed successfully and error rates returned to zero.
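To illustrate the failure mode, here is a minimal sketch of a DNS resolution check against the public regional DynamoDB endpoint name. It is not Scenario's or AWS's monitoring code; the function name `endpoint_resolves` and its injectable `resolver` parameter are assumptions made for illustration and testing.

```python
import socket
from typing import Callable

# Public endpoint name for DynamoDB in the US-EAST-1 Region.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"


def endpoint_resolves(
    host: str = ENDPOINT,
    resolver: Callable[..., list] = socket.getaddrinfo,
) -> bool:
    """Return True if the host name resolves to at least one address.

    The resolver is injectable (an assumption of this sketch) so the check
    can be exercised without network access.
    """
    try:
        return len(resolver(host, 443)) > 0
    except socket.gaierror:
        # Name resolution failed: the symptom described in this section.
        return False
```

Calling `endpoint_resolves()` with the defaults performs a real lookup. A check like this only shows whether the name resolves; it does not confirm that the service behind it is healthy.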
Secondary Outage and Extended Impact
However, as AWS continued mitigation efforts, Network Load Balancer (NLB) health checks became impaired, degrading network connectivity across Lambda, DynamoDB, and CloudWatch.
AWS restored NLB health checks by 4:38 PM GMT, gradually reduced throttling on EC2 and Lambda operations, and fully recovered all services by 10:01 PM GMT.
This second chain of events triggered a secondary outage for Scenario between 2:00 PM and 10:10 PM GMT. All processing queues were blocked during this window, impacting:
- Prompt-based edits
- Model training
- Image, video, audio, and 3D generation
- Indexing and search
- Compute Unit management
- Notifications
- Socket updates (preventing web app refreshes)
- Email invitations
- SSO and OTP authentication
Additional Contributing Factors
Several of Scenario’s GPU sub-processors, including Replicate and Fal, were also affected throughout the day, which constrained GPU scalability during the incident.
Affected services
Updated
Oct 20 at 03:13am PDT
Inference controlnet-ip-adapter recovered.
Updated
Oct 20 at 03:06am PDT
Bulk Asset Retrieval, Inference img2img-texture, Inference controlnet, and 1 other resource recovered.
Updated
Oct 20 at 03:05am PDT
Inference txt2img-ip-adapter, Inference txt2img-texture, Texture conversion, and 1 other resource recovered.
Updated
Oct 20 at 03:01am PDT
Update Model Examples recovered.
Updated
Oct 20 at 03:01am PDT
Update Model Description and Search Models recovered.
Updated
Oct 20 at 03:01am PDT
Original Assets Distribution, Create Bulk Assets Download Job, Generate an Inference, and 14 other resources recovered.
Updated
Oct 20 at 02:43am PDT
Get Bulk Assets Download Job and Get Asset by ID recovered.
Updated
Oct 20 at 02:38am PDT
Web application and Public Content recovered.
Updated
Oct 20 at 01:40am PDT
Web application and Public Content went down.
Updated
Oct 20 at 12:25am PDT
Create Bulk Assets Download Job, Bulk Asset Retrieval, Inference txt2img-ip-adapter, and 6 other resources went down.
Updated
Oct 20 at 12:12am PDT
Inference inpaint-ip-adapter went down.
Updated
Oct 19 at 11:59pm PDT
Get Model by ID went down.
Updated
Oct 19 at 11:56pm PDT
Inference inpaint went down.
Updated
Oct 19 at 11:54pm PDT
Get Model Presets went down.
Updated
Oct 19 at 11:54pm PDT
Inference img2img-ip-adapter went down.
Updated
Oct 19 at 11:54pm PDT
Inference img2img and Get Model Classes went down.
Updated
Oct 19 at 11:54pm PDT
Generate an Inference, Inference controlnet-ip-adapter, Get Recommended Models, and 1 other resource went down.
Updated
Oct 19 at 11:53pm PDT
Get All Models went down.
Updated
Oct 19 at 11:52pm PDT
Inference controlnet-inpaint went down.
Updated
Oct 19 at 11:52pm PDT
Original Assets Distribution, Inference controlnet, and Inference controlnet-img2img went down.
Updated
Oct 19 at 11:51pm PDT
Get Asset by ID went down.
Updated
Oct 19 at 11:51pm PDT
Get Bulk Assets Download Job went down.
Updated
Oct 19 at 11:50pm PDT
Inference img2img-texture and Get Model Description went down.
Created
Oct 19 at 11:50pm PDT
Restyle and Get Model Examples went down.