On February 28, 2017, several companies reported an Amazon AWS S3 cloud storage outage. Within minutes, hundreds, even thousands, of Twitter posts were making the rounds across the globe, with users describing how their apps had gone down because of the outage.
Image Source: https://twitter.com/TechCrunch
The issue, which began around 9:44 a.m. Pacific Time (17:44 UTC) on February 28, 2017, was traced primarily to errors affecting S3 (Simple Storage Service) buckets hosted in the us-east-1 region, as tweeted by AWS.
Image Source: https://twitter.com/awscloud?lang=en
This AWS S3 outage caused major websites and services, such as Medium, Docker Registry Hub, Asana, Quora, Runkeeper, Trello, and Yahoo Mail, to fall offline, lose images, or run erratically.
According to some sources, this AWS S3 outage disrupted half the internet. That reach is plausible because, according to SimilarTech, Amazon S3 is used by approximately 148,213 websites and 121,761 unique domains.
Even though AWS fixed the problem after several hours of work, its impact was huge.
Image Source: https://status.aws.amazon.com/
What caused it?
Nick Kephart, senior director of product marketing at San Francisco-based network intelligence company ThousandEyes, monitored the situation closely. He said that during the outage, information could get into Amazon’s overall network, but attempts to establish a network connection with the S3 servers failed. As a result, all traffic was stopped dead in its tracks, and every site and app that hosted data, images, or other information on S3 was affected.
As a company, how can you avoid this situation or protect your system against such outages in the future? Here’s how:
Use Region Level DR Rather Than AZ Level DR
Teams that had opted for Availability Zone (AZ) level disaster recovery (DR) rather than Region level DR were affected, because the outage hit S3 across an entire region. With Region level DR, you copy data from one S3 Region to another and keep that data in sync between the two Regions. This ensures a backup is available elsewhere, helping you cope with such outages in the future.
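One way to keep S3 data in sync between Regions is to turn on replication from the source bucket to a bucket in a second Region. The following is a minimal sketch, not a definitive setup. It assumes you use boto3, that versioning is already enabled on both buckets, and that an IAM role with replication permissions exists. The function names, rule ID, bucket names, and role ARN are hypothetical.

```python
def build_replication_config(role_arn, dest_bucket_name, rule_id="dr-replicate-all"):
    """Build the ReplicationConfiguration payload for put_bucket_replication.

    The single rule below replicates every object (empty prefix filter)
    to the destination bucket, which is assumed to live in another Region.
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all keys
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket_name}"},
            }
        ],
    }


def enable_replication(s3_client, source_bucket, role_arn, dest_bucket_name):
    """Apply the replication rule to the source bucket and return the config used."""
    config = build_replication_config(role_arn, dest_bucket_name)
    s3_client.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=config,
    )
    return config


# Example usage (hypothetical names; requires boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3", region_name="us-east-1")
#   enable_replication(
#       s3,
#       "my-primary-bucket",
#       "arn:aws:iam::123456789012:role/my-replication-role",
#       "my-standby-bucket",
#   )
```

Replication of this kind only applies to objects written after the rule is enabled, so existing data would still need a one-time copy to the second Region.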
Opt for Cross-Region Replication
Cross-region replication makes it easier to copy your AMIs and snapshots to a second AWS region, so that you can always keep a system running there. In other words, you run a standby environment in another region.
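Copying an AMI or snapshot to a second region can be scripted. The sketch below assumes boto3 and an EC2 client created for the destination region, since the copy is requested from the region that receives it. The helper names and the IDs in the usage comment are hypothetical.

```python
def copy_ami_to_region(dest_ec2_client, source_region, source_image_id, name):
    """Copy an AMI into the region that dest_ec2_client is bound to.

    Returns the ID of the new AMI in the destination region.
    """
    response = dest_ec2_client.copy_image(
        Name=name,
        SourceImageId=source_image_id,
        SourceRegion=source_region,
    )
    return response["ImageId"]


def copy_snapshot_to_region(dest_ec2_client, source_region, source_snapshot_id,
                            description=""):
    """Copy an EBS snapshot into the region that dest_ec2_client is bound to.

    Returns the ID of the new snapshot in the destination region.
    """
    response = dest_ec2_client.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=source_snapshot_id,
        Description=description,
    )
    return response["SnapshotId"]


# Example usage (hypothetical IDs; requires boto3 and AWS credentials):
#   import boto3
#   ec2_west = boto3.client("ec2", region_name="us-west-2")
#   new_ami = copy_ami_to_region(ec2_west, "us-east-1", "ami-0123456789abcdef0",
#                                "standby-web-server")
```

Running a job like this on a schedule keeps the standby region's images reasonably fresh, at the cost of storing the copies twice.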
Go for the AWS CloudFormation Template
Use a CloudFormation template to create a cluster in another region, so that you do not have to wait long to get back to normal. With CloudFormation, it should not take more than 2 hours to bring the cluster and the whole environment back on track.
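Launching the same template in a standby region can also be automated. The following is a rough sketch, assuming boto3 and a CloudFormation client created for the recovery region. The helper names, stack name, and parameter keys are hypothetical, and the template itself is whatever you already use to describe your cluster.

```python
def to_cfn_parameters(params):
    """Convert a plain dict into the Parameters list that create_stack expects."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(params.items())
    ]


def launch_standby_stack(cfn_client, stack_name, template_body, params):
    """Create a stack from template_body in the region cfn_client is bound to.

    Returns the StackId reported by CloudFormation. Stack creation is
    asynchronous, so the resources are not ready when this returns.
    """
    response = cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=to_cfn_parameters(params),
    )
    return response["StackId"]


# Example usage (hypothetical names; requires boto3 and AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="us-west-2")
#   with open("cluster-template.json") as f:
#       stack_id = launch_standby_stack(cfn, "standby-cluster", f.read(),
#                                       {"InstanceCount": 3})
#   # Block until the stack is fully created:
#   cfn.get_waiter("stack_create_complete").wait(StackName="standby-cluster")
```

Keeping the template region-agnostic (for example, passing AMI IDs in as parameters rather than hard-coding them) is what makes this kind of relaunch practical.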
Bottom line: No Technology is Perfect
Any technology might fail at some point. Having a Plan B in place is the best way forward.
Even though large swaths of the internet went down due to this Amazon AWS S3 cloud storage outage, several sites and apps were not affected by it. The reason: they had their data spread across multiple regions.
If you know of any other way to mitigate such outages, do share it with us in the comment section below or tweet us @BotmetricHQ.