10 Design Principles for AWS Cloud Architecture
Cloud computing is one of the boons of technology, making the storage and access of documents easier and more efficient. For it to be dependable, the cloud architecture needs to be reliable, secure, high performing and cost efficient. A good cloud architecture design should take advantage of the inherent strengths of cloud computing, such as elasticity and the ability to automate infrastructure management. The design needs to be well thought out because it forms the backbone of a vast network; it cannot be arbitrary.
There are certain principles that you need to follow to make the most of the tremendous capabilities of the cloud. Here are ten design principles that you must consider while architecting for the AWS Cloud.
Think Adaptive and Elastic
The architecture should support growth in users, traffic, or data size with no drop in performance. It should also scale linearly: when an additional resource is added, the system should be able to serve proportionally more load. Whether the architecture uses vertical scaling, horizontal scaling or both is up to the designer and depends on the type of application or data to be stored. Either way, your design should be equipped to take maximum advantage of the virtually unlimited on-demand capacity of cloud computing.
If your architecture is being built for a short-term purpose, vertical scaling may be enough. Otherwise, you will need to scale horizontally, distributing your workload across multiple resources to build internet-scale applications. In both cases, your architecture should be elastic enough to adapt to changing demand.
Knowing when to use stateless applications, stateful applications, stateless components and distributed processing also makes your architecture more effective at scaling.
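To make the idea of horizontal scaling concrete, here is a minimal sketch of how the number of identical instances could be derived from the incoming load. The function name, the per-instance capacity figure and the minimum and maximum bounds are all assumptions made for illustration; they are not AWS defaults.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      min_instances: int = 2,
                      max_instances: int = 20) -> int:
    """Return how many identical instances are needed for the given load.

    All numbers are hypothetical; a real system would measure the
    per-instance capacity rather than assume it.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so that the fleet never has less capacity than the load.
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp to the configured bounds of the fleet.
    return max(min_instances, min(max_instances, needed))
```

Because the instances are assumed to be stateless and interchangeable, adding one more instance adds roughly one more unit of capacity, which is the linear scalability described above.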
Treat servers as disposable resources
One of the biggest advantages of cloud computing is that you can treat your servers as disposable resources instead of fixed components. However, resources should always be consistent and tested. One way to enable this is to implement the immutable infrastructure pattern, which enables you to replace the server with one that has the latest configuration instead of updating the old server.
It is important to keep configuration and coding an automated and repeatable process, whether you are deploying resources to new environments or increasing the capacity of an existing system to cope with extra load. Bootstrapping, golden images, or a hybrid of the two will help you keep the process automated, repeatable and free of human error.
With bootstrapping, you launch an AWS resource with its default configuration and then run automated scripts to bring it to the desired state. If the details that differ between environments are passed in as parameters, the same scripts can be reused without modification.
In comparison, the golden image approach results in faster start times and removes dependencies on configuration services or third-party repositories. Certain AWS resource types, such as Amazon EC2 instances, Amazon RDS DB instances and Amazon Elastic Block Store (Amazon EBS) volumes, can be launched from a golden image.
When suitable, use a combination of the two approaches, where some parts of the configuration get captured in a golden image, while others are configured dynamically through a bootstrapping action.
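The hybrid approach can be sketched as a launch request that combines a golden image with a small bootstrapping script. This is only an illustration: the keys are modeled on the parameters of the EC2 RunInstances request (ImageId, InstanceType, MinCount, MaxCount, UserData), while the function name, the image ID, the instance type and the script contents are placeholders chosen for this example.

```python
def build_launch_request(golden_image_id: str, environment: str) -> dict:
    """Build the parameters for launching one instance (hypothetical helper).

    The golden image carries the slow-changing parts of the configuration;
    the user data script applies the environment-specific part at boot.
    """
    # Bootstrapping part: a short script executed on first boot.
    user_data = "\n".join([
        "#!/bin/bash",
        f"echo 'APP_ENV={environment}' >> /etc/environment",
    ])
    return {
        "ImageId": golden_image_id,   # golden image, placeholder ID
        "InstanceType": "t3.micro",   # assumed instance type
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,
    }


request = build_launch_request("ami-EXAMPLE", "staging")
```

The same function serves every environment because only the parameter changes, which is what keeps the process repeatable.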
Not to be limited to the individual resource level, you can apply techniques, practices, and tools from software development to make your whole infrastructure reusable, maintainable, extensible, and testable.
Automate Automate Automate
Unlike traditional IT infrastructure, the cloud enables automation of a number of events, improving both your system’s stability and the efficiency of your organization. Some of the AWS services and features you can use for automation are:
- AWS Elastic Beanstalk: This service is the fastest and simplest way to get an application up and running on AWS. You simply upload your application code and the service automatically handles all the details, such as resource provisioning, load balancing, auto scaling, and monitoring.
- Amazon EC2 Auto recovery: You can create an Amazon CloudWatch alarm that monitors an Amazon EC2 instance and automatically recovers it if it becomes impaired. A word of caution: during instance recovery, the instance is migrated through an instance reboot, and any data that is in memory is lost.
- Auto Scaling: With Auto Scaling, you can maintain application availability and scale your Amazon EC2 capacity up or down automatically according to conditions you define.
- Amazon CloudWatch Alarms: You can create a CloudWatch alarm that sends an Amazon Simple Notification Service (Amazon SNS) message when a particular metric goes beyond a specified threshold for a specified number of periods.
- Amazon CloudWatch Events: The CloudWatch service delivers a near real-time stream of system events that describe changes in AWS resources. Using simple rules that you can set up in a couple of minutes, you can easily route each type of event to one or more targets: AWS Lambda functions, Amazon Kinesis streams, Amazon SNS topics, etc.
- AWS OpsWorks Lifecycle events: AWS OpsWorks supports continuous configuration through lifecycle events that automatically update your instances’ configuration to adapt to environment changes. These events can be used to trigger Chef recipes on each instance to perform specific configuration tasks.
- AWS Lambda Scheduled events: These events allow you to create a Lambda function and direct AWS Lambda to execute it on a regular schedule.
For an architect working on the AWS Cloud, these automation services are a great advantage.
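The alarm behaviour described in the list, where an action fires once a metric stays beyond a threshold for a specified number of periods, can be modeled in a few lines. This is a simplified sketch of the idea, not the CloudWatch implementation; the function name is hypothetical, and only the three state names are borrowed from CloudWatch.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Evaluate a simplified threshold alarm over a list of metric values.

    Returns "ALARM" only if every one of the most recent
    `evaluation_periods` datapoints is above `threshold`.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        # Not enough history yet to make a decision.
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"
```

Requiring several consecutive breaches keeps a single short spike from triggering a notification or a scaling action.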
Implement loose coupling
IT systems should ideally be designed in a way that reduces inter-dependencies. Your components need to be loosely coupled so that a change or failure in one component does not affect the others.
Your infrastructure also needs well-defined interfaces, so that the various components interact with each other only through specific, technology-agnostic interfaces. It should then be possible to modify the underlying implementation of one component without affecting the others.
In addition, by implementing service discovery, smaller services can be consumed without prior knowledge of their network topology details. This way, new resources can be launched or terminated at any point in time.
Loose coupling between services can also be done through asynchronous integration. It involves one component that generates events and another that consumes them. The two components do not integrate through direct point-to-point interaction, but usually through an intermediate durable storage layer. This approach decouples the two components and introduces additional resiliency. So, for example, if a process that is reading messages from the queue fails, messages can still be added to the queue to be processed when the system recovers.
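The queue example above can be sketched with Python's standard library. The in-process `queue.Queue` used here is not durable and only stands in for a managed queue such as Amazon SQS; the message shape and the `process_available` helper are assumptions made for illustration.

```python
import queue


def process_available(q, handler):
    """Drain every message currently in the queue and return the results."""
    processed = []
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            break
        processed.append(handler(message))
        q.task_done()
    return processed


work_queue = queue.Queue()

# The producer keeps adding messages even while the consumer is down.
for order_id in (1, 2, 3):
    work_queue.put({"order_id": order_id})

# When the consumer recovers, it picks up everything that accumulated.
results = process_available(work_queue, lambda m: m["order_id"] * 10)
```

The producer never calls the consumer directly, so the consumer can fail, restart or be replaced without the producer noticing.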
Lastly, building applications in such a way that they handle component failure in a graceful manner helps you reduce impact on the end users and increase your ability to make progress on your offline procedures.
Focus on services, not servers
A wide variety of underlying technology components are required to develop, manage, and operate applications. Your architecture should leverage a broad set of compute, storage, database, analytics, application, and deployment services. On AWS, there are two ways to do that. The first is through managed services that include databases, machine learning, analytics, queuing, search, email, notifications, and more. For example, with Amazon Simple Queue Service (Amazon SQS) you can offload the administrative burden of operating and scaling a highly available messaging cluster, while paying a low price for only what you use. Amazon SQS is also inherently scalable.
The second way is to reduce the operational complexity of running applications through serverless architectures. It is possible to build both event-driven and synchronous services for mobile, web, analytics, and the Internet of Things (IoT) without managing any server infrastructure.
Database is the base of it all
With traditional IT infrastructure, licensing costs and the ability to support diverse database engines constrained the choice of database. On AWS, managed database services help remove these constraints. Keep in mind that access to the information stored in these databases is central to what you build in the cloud.
There are three different categories of databases to keep in mind while architecting:
- Relational databases – Data is normalized into tables, and you get a powerful query language, flexible indexing capabilities, strong integrity controls, and the ability to combine data from multiple tables in a fast and efficient manner. They can be scaled vertically and can be made highly available through failover (designed for graceful failure).
- NoSQL databases – These databases trade some of the query and transaction capabilities of relational databases for a more flexible data model that seamlessly scales horizontally. NoSQL databases utilize a variety of data models, including graphs, key-value pairs, and JSON documents. They are widely recognized for ease of development, scalable performance, high availability, and resilience.
- Data warehouse – A specialized type of relational database, optimized for analysis and reporting of large amounts of data. It can be used to combine transactional data from disparate sources, making it available for analysis and decision-making.
Be sure to remove single points of failure
A system is highly available when it can withstand the failure of an individual or multiple components (e.g., hard disks, servers, network links etc.). You can think about ways to automate recovery and reduce disruption at every layer of your architecture. This can be done with the following processes:
- Introduce redundancy to remove single points of failure, by having multiple resources for the same task. Redundancy can be implemented in either standby mode (when a resource fails, functionality is recovered on a secondary resource through failover, and the resource remains unavailable until the failover completes) or active mode (requests are distributed to multiple redundant compute resources, and when one of them fails, the rest simply absorb a larger share of the workload).
- Detection of and reaction to failure should both be automated as much as possible.
- It is crucial to have durable data storage that protects both data availability and integrity. Redundant copies of data can be introduced through synchronous, asynchronous or quorum-based replication.
- Automated multi-data center resilience is achieved by spreading resources across Availability Zones, which are located in separate data centers, to reduce the impact of failures.
Fault isolation can be improved beyond traditional horizontal scaling through sharding: instances are grouped into shards, and each user's traffic goes to a single shard instead of being spread across every node as in the traditional IT structure, so a failure affects only the users of that shard.
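As a rough illustration of sharding, the sketch below assigns each user to one shard by hashing the user ID. The function name and the choice of SHA-256 are assumptions made for this example; any stable hash would serve the same purpose.

```python
import hashlib


def shard_for(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index in the range [0, shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # A cryptographic hash is used only because it is stable across
    # processes and machines, unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

Since a given user always lands on the same shard, a failure in one shard is confined to the users mapped to it, while users on the other shards are unaffected.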
Optimize for cost
At the end of the day, it often boils down to cost. Your cloud architecture should be designed for cost optimization by keeping in mind the following principles:
- You can reduce cost by selecting the right instance types, configurations and storage solutions to suit your needs.
- Implementing Auto Scaling, so that you scale out horizontally when required and scale back in when demand drops, can be done without any extra cost.
- Taking advantage of the variety of purchasing options (Reserved Instances and Spot Instances) when buying EC2 capacity will help reduce the cost of compute.
Applying data caching to multiple layers of an IT architecture can improve both the performance and the cost efficiency of an application. There are two types of caching:
- Application data caching – Information can be stored in and retrieved from fast, managed, in-memory caches in the application, which decreases load on the database and decreases latency for end users.
- Edge caching – Content is served by infrastructure that is closer to the viewers, lowering latency and giving you the high, sustained data transfer rates needed to deliver large popular objects to end users at scale.
Amazon CloudFront, a content delivery network consisting of multiple edge locations around the world, is the edge caching service, whereas Amazon ElastiCache makes it easy to deploy, operate and scale an in-memory cache in the cloud.
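Application data caching is often implemented with a cache-aside pattern, sketched below. A plain dictionary stands in for an in-memory cache such as Amazon ElastiCache, and the class name and loader function are hypothetical; expiry and invalidation are left out to keep the example short.

```python
class CacheAside:
    """Look in the cache first and fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # function that reads from the database
        self._cache = {}            # stand-in for an in-memory cache service
        self.db_reads = 0           # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value


database = {"user:1": "Ada"}
cache = CacheAside(database.__getitem__)
first = cache.get("user:1")    # miss, goes to the database
second = cache.get("user:1")   # hit, served from memory
```

Only the first read reaches the database, which is how caching reduces database load and response time for repeated requests.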
Security is everything!
Most of the security tools and techniques used in the traditional IT infrastructure can be used in the cloud as well. AWS is a platform that allows you to formalize the design of security controls in the platform itself. It simplifies system use for administrators and those running IT, and makes your environment much easier to audit in a continuous manner. Some ways to improve security in AWS are:
- Utilize AWS features for Defense in depth – Starting at the network level, you can build a VPC topology that isolates parts of the infrastructure through the use of subnets, security groups, and routing controls.
- AWS operates under a shared security responsibility model, where AWS is responsible for the security of the underlying cloud infrastructure and you are responsible for securing the workloads you deploy in AWS.
- Reduce privileged access to programmable resources and servers to avoid security breaches. Overusing access to guest operating systems and service accounts increases the risk of a breach.
- Create an AWS CloudFormation template that captures your security policy and reliably deploys it, allowing you to perform security testing as part of your release cycle and automatically discover application gaps and drift from your security policy.
- Testing and auditing your environment is key to moving fast while staying safe. On AWS, it is possible to implement continuous monitoring and automation of controls to minimize exposure to security risks. Services like AWS Config, Amazon Inspector, and AWS Trusted Advisor continually monitor for compliance or vulnerabilities, giving you a clear overview of which IT resources are in compliance and which are not.
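Reducing privileged access usually comes down to granting only the actions a component needs. The sketch below builds an IAM policy document that allows read-only access to the objects of a single S3 bucket. The document structure follows the IAM JSON policy format; the function name and the bucket name are placeholders for illustration.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> str:
    """Return an IAM policy document (as JSON) allowing object reads only."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only the single action the workload needs.
                "Action": ["s3:GetObject"],
                # Only the objects of one bucket (placeholder name).
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


policy_json = read_only_bucket_policy("example-bucket")
```

Keeping such policies in code makes them reviewable and testable alongside the rest of the infrastructure.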
Now that you know the guidelines and principles to keep in mind while architecting for the AWS Cloud, start designing!
Latest posts by Rajeev Kumar (see all)
- 10 Design Principles for AWS Cloud Architecture - June 27, 2017