The Big Fat Blog about AWS Big Data Analytics

According to Gartner, “Big Data is high volume, high velocity, and high variety information assets that require new forms of processing to enable enhanced decision making, insight, discovery, and process optimization”.

Big Data consists of vast volumes of data. It plays a far-reaching role in a business on a daily basis. One of its main uses is analyzing data for insights that pave the way for effective decision-making.

Big Data might be a new name, but it has a long-standing history. Gathering and storing immense aggregates of information for eventual analysis is an age-old practice. The concept gained popularity at the turn of the 21st century, when industry analyst Doug Laney introduced the now-standard definition of Big Data as the three V’s: VOLUME, VELOCITY, and VARIETY.


Industry experts have since gone a step further and added two more V’s to the definition of Big Data: VARIABILITY and VALUE.

The five V’s of Big Data make it complex to work with. “Big Data” envelops a vast ocean of data generated from diverse sources such as mobile devices, digital repositories, and enterprise applications. It can range from terabytes to even exabytes. The data can be structured or unstructured. Just consider the following figures to get a tiny glimpse of the giant nature of Big Data:

  • Facebook gets 10 million new photos uploaded every hour.
  • Google processes over 24 petabytes of data every day.
  • Twitter users post ~400 million tweets per day.

Big Data’s significance doesn’t revolve around how much data you have, but what you DO with it. Data from any source can be analyzed to find solutions that enable: cost and time reductions; new product development and offering optimization; and intelligent decision making.

When you combine Big Data with powerful analytics, the possibilities are endless.

Big Data + powerful analytics

  • Root cause evaluation of failures, issues and defects in near-real time.
  • Coupon generation at the point of sale based on customer buying habits.
  • Recalculation of entire risk portfolio in minutes.
  • Fraudulent behavior detection before the damage is done.

PAIN POINTS of Big Data in a Traditional Data Center Setup

A traditional data center setup leaves a lot of gaps when it comes to Big Data. Traditional database systems are built around structured data, which is stored in a fixed format. Examples include relational database management systems (RDBMS) and spreadsheets. These can only answer questions about what happened, so the problem is not even close to being solved. Semi-structured and unstructured data fill that gap: they enhance an organization’s ability to gain deeper insights into its data (including metadata). The role of Big Data is crystal clear: it uses semi-structured and unstructured data to broaden the variety of data gathered from disparate sources such as customers, audiences, and subscribers. The collected data is then transformed into knowledge-based information. Meanwhile, data is being generated at lightning speed, and traditional database systems are failing to keep up with the heavy data load.

The Solution? AWS CLOUD

The Big Data gaps of a traditional data center setup are readily addressed by a cloud-based service such as AWS. You can effortlessly build, secure, and deploy Big Data applications using the broad spectrum of AWS services. Forget procuring hardware and maintaining infrastructure. Instead, focus on uncovering new insights, solving new problems, and building new products. New features are released constantly, so you can keep leveraging the latest technologies without worrying about long-term investment commitments.

The FIVE PILLARS of AWS for Big Data

The five pillars of cost, performance, availability, elasticity, and security are the foundation that lets Big Data analytics on AWS stand tall.

Cost – Most AWS services have a pay-as-you-go model. This means that you can use Big Data services cost-effectively without draining your financial resources. AWS Big Data services such as Kinesis Streams, Lambda, EMR, DynamoDB, Amazon ES, and Amazon ML are just a few examples that let you pay only for what you use. Another major benefit of the pay-as-you-go model is the freedom to perform feasibility studies and experiment with multiple algorithms without emptying your pockets!

Performance – By now you should know that performance is everything. AWS Big Data services such as Lambda and Amazon ML are built for high-speed performance. In Amazon ML, most real-time prediction requests return a response within 100 ms. The performance of other Big Data services depends on various parameters, such as shard throughput capacity in Kinesis Streams and the type of EC2 instances used to run clusters in EMR.
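To make the shard throughput point concrete, here is a rough sizing sketch in Python. It assumes the per-shard limits documented for Kinesis Streams (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads); the function name and its parameters are illustrative only, so check the current limits before relying on the result.

```python
import math

# Per-shard limits as documented for Kinesis Streams (assumed here;
# verify against the current service limits).
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Return the smallest shard count that covers write and read throughput."""
    write_mb = records_per_sec * avg_record_kb / 1024
    read_mb = write_mb * consumers  # each consumer reads the full stream
    return max(
        1,
        math.ceil(write_mb / WRITE_MB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb / READ_MB_PER_SHARD),
    )
```

For example, `estimate_shards(5000, 2)` asks how many shards a stream needs to absorb 5,000 records per second of roughly 2 KB each. Whichever limit is hit first (bytes written, records written, or bytes read) determines the answer.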

Availability – Durability and availability are key factors in the optimal functioning of AWS Big Data services. Here’s a brief look into a few services providing these crucial functionalities – Kinesis Streams provides high availability and data durability by synchronously replicating data across multiple AWS Availability Zones. Lambda uses replication and redundancy to provide high availability. Amazon Redshift automatically detects and replaces a failed node in your data warehouse cluster. Amazon ES domains can be configured for high availability by enabling the Zone Awareness option.

Elasticity – With scalability and elasticity, you hold the reins to alter the capacity of services according to your needs. The freedom to scale up and down is especially essential in handling the humongous nature of Big Data. In Kinesis Streams, you can increase or decrease the capacity of the stream at any time. Lambda is designed to scale automatically on your behalf. With Amazon EMR, it is easy to resize a running cluster. DynamoDB is both highly scalable and elastic; there is no advertised limit to the amount of data you can store in a DynamoDB table. In Redshift, you can change the number or type of nodes with a few clicks in the console. With Amazon ES, you can add or remove instances and easily modify Amazon EBS volumes to provide room for data growth. Last but not least, Auto Scaling lets you automatically scale EC2 capacity up or down.

Security – BIG Data needs BIG security. Don’t you agree? AWS Big Data services come packaged with building blocks to help organizations structure a secure, flexible, and cost-effective data lake. In Amazon Redshift, you can enable database encryption for your clusters, and Amazon EBS volumes can be encrypted as well, to help protect data at rest. Moreover, AWS services meet industry-specific compliance standards such as PCI DSS and HIPAA. So you can rest assured.


Now that you know a bit (or a lot!) about Big Data, it’s time to look at the individual services that AWS offers for Big Data analytics. Here’s the gist of each service and how it can help you in the quest to solve Big Data problems.

Amazon Kinesis Streams (AKS) – The custom app building service to process and analyze data in real time

  • No need for slowing your roll with AKS. Continuously capture and store data from countless sources.
  • Use Kinesis Client Library (KCL) to build Kinesis applications.
  • Stream data to fuel real-time dashboards, create alerts, and execute dynamic pricing and advertising.
  • Accelerate data intake by pushing data directly into a Kinesis stream as soon as it is produced.
  • Get real time data analytics, metrics and reporting.
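As a rough illustration of pushing data straight into a stream, the sketch below batches events for the Kinesis `PutRecords` call, which accepts at most 500 records per request. The helper names, the `user_id` partition key field, and the stream name in the usage note are hypothetical; `send` expects a boto3 Kinesis client to be passed in, so nothing here contacts AWS on its own.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_kinesis_records(events, key_field):
    """Convert dicts into the record shape that PutRecords expects."""
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[key_field]),
        }
        for event in events
    ]


def chunked(records, size=MAX_RECORDS_PER_CALL):
    """Yield successive slices no larger than one PutRecords request."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send(client, stream_name, events, key_field="user_id"):
    """Send events through a boto3 Kinesis client; return the failed-record count."""
    failed = 0
    for batch in chunked(to_kinesis_records(events, key_field)):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With a real client this would be called as `send(boto3.client("kinesis"), "clicks", events)`, where `"clicks"` is a made-up stream name. A non-zero return value means some records were rejected and should be retried.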

AWS Lambda – Effortlessly run code without provisioning or managing servers.

  • Why pay when your code is not running? With Lambda, you don’t have to.
  • Set up code to trigger automatically from other AWS services in response to events (changes in data, shifts in system state, or actions by users), or call it directly from any web or mobile app.
  • Extract, transform, and load data with ease. Use AKS and Lambda for real-time stream processing for clickstream analysis, log filtering, and social media analysis.
  • Running cron on an EC2 instance can be a costly affair. The better solution is to use schedule expressions to run a Lambda function at regular intervals.
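The log-filtering use case can be sketched as a small Python Lambda handler. Kinesis delivers each record to Lambda base64-encoded under `record["kinesis"]["data"]`; the JSON log format and the `level` field are assumptions made for this example, not something the services prescribe.

```python
import base64
import json


def handler(event, context):
    """Lambda entry point: decode Kinesis records and keep only error logs."""
    records = event.get("Records", [])
    errors = []
    for record in records:
        # Kinesis payloads arrive base64-encoded inside the event.
        payload = base64.b64decode(record["kinesis"]["data"])
        entry = json.loads(payload)
        if entry.get("level") == "ERROR":  # "level" is a hypothetical log field
            errors.append(entry)
    return {"processed": len(records), "errors": errors}
```

For the cron replacement mentioned above, a function like this would instead be attached to a scheduled rule with an expression such as `rate(5 minutes)` or `cron(0 12 * * ? *)`.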

Amazon Elastic MapReduce (EMR) – Distributed computing framework for processing and storing data

  • Divide and conquer: Break large processing problems and data sets into smaller jobs, then distribute them across many compute nodes in a Hadoop cluster. This opens up a wide range of use cases, such as:
    • Log processing and analytics
    • Large extract, transform, and load (ETL) data movement  
    • Risk modeling and threat analytics  
    • Ad targeting and click stream analytics
    • Genomics
    • Predictive analytics
    • Ad hoc data mining and analytics
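The divide-and-conquer idea can be illustrated with a toy map/shuffle/reduce pipeline in plain Python. This is only a single-process sketch of the pattern, not how EMR or Hadoop is actually invoked; the log format (status code as the last field) and all function names are assumptions for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit (status_code, 1) for each log line in one chunk."""
    for line in chunk:
        status = line.split()[-1]  # assumes the status code is the last field
        yield status, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines, chunk_size=2):
    """Split the input into chunks, map each chunk, then reduce the results."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))
```

On a real cluster each chunk would be mapped on a different node in parallel; here the chunks are processed one after another, which is enough to show why the work splits cleanly.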

Amazon Machine Learning – Predictive analytics and machine-learning technology made easy.

  • Master the process of creating ML models without learning complex ML algorithms and technology.
  • Discover patterns in your data and use them to create ML models that generate predictions for new data points.
  • Suspicious transactions? Enable applications to flag them.
  • Forecast product demand by inputting historical order information to predict future order quantities.
  • Personalize and Predict. With ML, you can personalize application content and predict user activity.  
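To give a feel for the demand-forecasting bullet, here is a hand-rolled toy in Python: it fits a straight-line trend to historical order quantities and extrapolates one period ahead. This is not the Amazon ML API, and the service builds and trains its own models for you; the function names and the simple least-squares approach are assumptions chosen purely for illustration.

```python
def fit_trend(quantities):
    """Least-squares line through (period index, quantity) points."""
    n = len(quantities)
    if n < 2:
        raise ValueError("need at least two periods of history")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(quantities) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, quantities))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def forecast_next(quantities):
    """Predict the order quantity for the period after the last one given."""
    slope, intercept = fit_trend(quantities)
    return intercept + slope * len(quantities)
```

Feeding in order history that grows by a steady amount each period yields a forecast that continues the same trend. Real demand data is noisier, which is where a managed ML service earns its keep.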

Amazon DynamoDB – NoSQL database service that stores and retrieves any amount of data and serves any level of request traffic.

  • Need a flexible NoSQL database with low read and write latencies for existing or new applications? DynamoDB to the rescue!
  • Scale storage and throughput up or down as needed, without code changes or downtime.
  • Wide range of use cases, including mobile apps, gaming, digital ad serving, live voting, audience interaction for live events, sensor networks, and log ingestion.
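When throughput is provisioned explicitly, scaling it up or down comes down to simple arithmetic. The Python sketch below assumes DynamoDB’s documented capacity unit definitions: one read unit covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one write unit covers one write per second of an item up to 1 KB. The function names are illustrative.

```python
import math


def read_capacity_units(reads_per_sec, item_kb, strongly_consistent=True):
    """Read units needed; items are rounded up to the next 4 KB."""
    units = reads_per_sec * math.ceil(item_kb / 4)
    if not strongly_consistent:
        units = units / 2  # eventually consistent reads cost half
    return math.ceil(units)


def write_capacity_units(writes_per_sec, item_kb):
    """Write units needed; items are rounded up to the next 1 KB."""
    return writes_per_sec * math.ceil(item_kb)
```

For instance, 100 strongly consistent reads per second of 6 KB items need `read_capacity_units(100, 6)` units, because each 6 KB item rounds up to two 4 KB blocks.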

Amazon Redshift – Petabyte-scale data warehouse service that lets you analyze data using existing business intelligence tools.

  • Analyze global sales data for numerous products; ad impressions; social trends; and gaming data.
  • The perfect place to store historical stock trade data.
  • Take health care to the next level. Measure clinical quality, operational efficiency, and financial performance.

Amazon Elasticsearch Service – Deploys, operates, and scales Elasticsearch in the AWS Cloud.

  • A strong fit for querying and searching large amounts of data.
  • Analyze activity logs and CloudWatch logs.
  • Analyze product usage data coming from various services and systems.
  • Analyze social media sentiments and CRM data.
  • Give your customers a rich search and navigation experience.
  • Monitor mobile app usage.

Amazon QuickSight – Build visualizations, perform ad hoc analysis, and get business insights from your data.

  • A little bit of SPICE (Super-fast, Parallel, In-memory Calculation Engine) is all you need to perform advanced calculations and render visualizations rapidly.
  • Automatically integrate with AWS data services to scale to thousands of users, and deliver fast and responsive query performance via SPICE’s query engine.
  • Deliver affordable BI functionality to everyone in your organization.

Amazon EC2 – The platform for self-managed Big Data analytics applications on AWS.

  • Run virtually any software that can be installed in a Linux or Windows virtualized environment.
  • You can pay for what you use and nothing more!
  • The sky’s the limit when it comes to self-managed Big Data analytics options. Take your pick: a NoSQL offering, a data warehouse, or a Hadoop cluster.
  • Get flexibility and scalability to meet computing needs when running applications.

Big Data has limitless possibilities. Its enormous capabilities are magnified through AWS services. So go ahead: Explore services relevant to your organization and conquer the waters of Big Data.