5-Point Guide to AWS DR Automation

Disaster recovery (DR) is the procedure for recovering technology infrastructure and systems after a disaster. There are two types of disasters:
Natural – Natural calamities such as floods, tornadoes, and earthquakes.
Man-Made – Disasters caused by human negligence or error, such as infrastructure failure, software bugs, and cyber-terrorism.
In either case, having backups is not enough: the backups should also be copied across multiple regions and multiple accounts.

Here is a 5-point guide for AWS DR automation:

Types of Backups

There are three major levels of recovery that an organization should consider when designing its recovery solution:

File Level Recovery – from files stored in S3.

Volume Level Recovery – from snapshots.

Database Level Recovery – from DB Snapshots.
In any AWS infrastructure, many kinds of resources need to be backed up for DR purposes:
      EC2 Instance Backups (EC2 AMIs)
      EBS Volume Backups (Snapshots)
      RDS DB Backups (DBSnapshots)
      ElastiCache DB Cluster Backups (ElastiCache Snapshots)
      Redshift DB Cluster Backups (Redshift Snapshots)
      Route53 Hosted Zone Backups (S3 Copy Hosted Zone Files)
      CloudFormation Template Backups (CloudFormation Template)

Critical vs Less Critical vs Non-Critical

Depending on the systems and their potential impact on the business, backup strategies can be classified into three types:
     Most Critical System – Frequency: 1 hour. Retention: 1 year.
     Less Critical System – Frequency: 1 day. Retention: 180 days.
     Non-Critical System – Frequency: 1 week. Retention: 4 weeks. Alternatively, back up manually if required.
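
To get a feel for what these tiers mean in practice, the sketch below computes how many backups each tier holds per resource once retention is full. The class and field names are illustrative only, and one year is assumed to be 365 days.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class BackupTier:
    """One criticality tier: how often to back up and how long to keep backups."""
    name: str
    frequency: timedelta
    retention: timedelta

    def backups_retained(self) -> int:
        # Steady-state number of backups kept for a single resource.
        return self.retention // self.frequency


# Values taken from the three tiers listed above (1 year assumed to be 365 days).
TIERS = [
    BackupTier("most-critical", timedelta(hours=1), timedelta(days=365)),
    BackupTier("less-critical", timedelta(days=1), timedelta(days=180)),
    BackupTier("non-critical", timedelta(weeks=1), timedelta(weeks=4)),
]

for tier in TIERS:
    print(f"{tier.name}: {tier.backups_retained()} backups per resource")
```

The hourly tier keeps thousands of backups per resource, which is why the choice of tier has a direct effect on cost.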

Automated vs Manual Backups

In a dynamic cloud environment with a wide range of services, it is extremely difficult to manage resources and keep up with the continuous changes to them.
For example, if an organization has hundreds of instances of different types playing different roles, it becomes impractical to create and monitor backups manually.
With automation, you only need to tag every instance with its role. Backup policies can then be defined per role.
Let’s say you have the following instances:
Tag                Instance Count    Backup Policy

ENV/DEVELOPMENT    30                Once a week
ENV/MONITORING     5                 Once a month
ENV/PRODUCTION     60                Every 4 hours
ENV/OTHERS         5                 Not required (manual)

With 100 instances spread across four different policies, automation is the clear winner over manual backups.
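
A tag-to-policy lookup of this kind could be sketched as follows. This is only an illustration: it assumes the tags above are stored with key ENV and the listed name as the value, and it approximates "once a month" as 30 days. The names POLICY_BY_ENV and backup_interval are hypothetical.

```python
from datetime import timedelta
from typing import Dict, Optional

# Backup interval per ENV tag value, following the example table.
# None means no automated job (manual backups only).
# "Once a month" is approximated as 30 days, which is an assumption.
POLICY_BY_ENV: Dict[str, Optional[timedelta]] = {
    "PRODUCTION": timedelta(hours=4),
    "DEVELOPMENT": timedelta(weeks=1),
    "MONITORING": timedelta(days=30),
    "OTHERS": None,
}


def backup_interval(tags: Dict[str, str]) -> Optional[timedelta]:
    """Return the backup interval for an instance's tags, or None if manual.

    Instances without an ENV tag, or with an unknown value, fall back to
    manual backups.
    """
    return POLICY_BY_ENV.get(tags.get("ENV", "OTHERS"))


print(backup_interval({"ENV": "PRODUCTION", "Name": "web-01"}))
print(backup_interval({"Name": "untagged-box"}))
```

Adding a new role then means adding one entry to the mapping rather than touching each instance's backup schedule.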

Cost-Optimized Backups

Organizations should have a strategy for cleaning up old backups that are no longer required. This can significantly reduce AWS infrastructure cost.
Also, AWS limits the number of backups that can be created in an account. For example, the EBS snapshot limit is 10,000.

A cost-optimized DR strategy is therefore required to keep the number of backups bounded.
In Botmetric backup jobs, the Snapshots to retain parameter sets the number of snapshots kept per volume.
Similarly, AMIs to retain sets the number of AMIs kept per instance.
For example, if Snapshots to retain is 180 and the job runs once a day, snapshots from the past 180 days (about 6 months) are kept.

If Snapshots to retain is 360 and the job runs twice a day, the backups still go back 180 days (about 6 months), but there are 2 snapshots per volume for each of those days.

Note: As a safety margin, we try to keep Snapshots to retain + 1 snapshots.
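
The retention arithmetic above can be written out as a short sketch. The function names are hypothetical and this is not Botmetric's implementation; it only illustrates how the retention count, the job frequency, and the extra safety snapshot relate.

```python
from datetime import datetime
from typing import List


def days_covered(snapshots_to_retain: int, runs_per_day: int) -> float:
    """How many days of history a retention count covers at a given job frequency."""
    return snapshots_to_retain / runs_per_day


def snapshots_to_delete(created_at: List[datetime],
                        snapshots_to_retain: int) -> List[datetime]:
    """Return creation times of snapshots that fall outside the retention window.

    The newest (snapshots_to_retain + 1) snapshots are kept, mirroring the
    safety margin described in the note above.
    """
    keep = snapshots_to_retain + 1
    newest_first = sorted(created_at, reverse=True)
    return newest_first[keep:]


print(days_covered(180, 1))  # once-a-day job
print(days_covered(360, 2))  # twice-a-day job, same number of days
```

Both calls cover the same number of days; doubling the retention count only pays for the doubled job frequency.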

DR Automation for Various AWS Resources

Depending on the AWS infrastructure and DR strategy, backups can be taken across regions or across accounts.
In Botmetric, we have a wide variety of jobs for various services:

          Create EC2 AMI based on EC2 instance tags
          Copy EC2 AMI based on EC2 instance tags across regions
          Copy EC2 AMI based on EC2 AMI tags across regions
          Copy EC2 AMI based on EC2 instance tags across accounts
          Create EBS snapshot based on EBS volume tags
          Create EBS snapshot based on EC2 instance tags
          Create EBS snapshot based on EC2 instance IDs
          Copy EBS snapshot based on EBS volume tags across regions
          Copy EBS snapshot based on EC2 instance tags across regions
          Copy EBS snapshot based on EBS volume tags across accounts

          Create RDS snapshot based on DB instance tags
          Copy RDS snapshot based on DB instance tags across regions

          Create Redshift snapshot based on Redshift cluster tags

          Create Route53 Hosted Zone backups

In addition, to clean up old backups, we have the De-register Old EC2 AMIs and Delete Old EBS Snapshots jobs.
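
As a rough idea of what a tag-based job such as "Create EBS snapshot based on EBS volume tags" might do, here is a minimal sketch. It is not Botmetric's code. The function name is hypothetical, and the ec2 argument is assumed to behave like a boto3 EC2 client, using its describe_volumes and create_snapshot calls. Pagination, error handling, and cross-region copies are left out.

```python
from typing import Any, List


def snapshot_volumes_by_tag(ec2: Any, tag_key: str, tag_value: str) -> List[str]:
    """Create one snapshot per EBS volume carrying the given tag.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Only the first page of describe_volumes results is handled here.
    Returns the IDs of the snapshots that were started.
    """
    response = ec2.describe_volumes(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )
    snapshot_ids = []
    for volume in response["Volumes"]:
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"DR backup of {volume['VolumeId']}",
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids
```

Passing the client in as a parameter keeps the function easy to try out against a stub before pointing it at a real account, for example with snapshot_volumes_by_tag(boto3.client("ec2"), "ENV", "PRODUCTION").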


In today’s ever-changing cloud environment, the drive for continuous availability, robustness, scalability, and dynamism has spurred the rise of ‘Backup as a Service’ (BaaS). With AWS DR automation and smart strategies, you can make your business ‘disaster-free’. Read about the do’s and don’ts of DR automation strategy.

Botmetric is an intelligent cloud management platform designed to make the cloud easy for engineers. Sign up now to see how Botmetric can help you with your disaster recovery planning.