Why Botmetric Chose InfluxDB — A Real-Time Metrics Data Store That Works

Are you an engineer or an architect looking for a real-time data store that is simple to operate? If so, Botmetric recommends InfluxDB, a time series metrics data store. After years of trying other data stores, Botmetric zeroed in on InfluxDB. Read on to learn why Botmetric chose InfluxDB and which criteria led the team to pick it over other data storage systems.

The Backdrop: Why Botmetric Chose the InfluxDB Metrics Data Store

Botmetric, the intelligent cloud management platform built for the modern DevOps world, helps cloud customers reduce overall cost, improve their security posture, and automate day-to-day operations.

One differentiator of the Botmetric platform compared with other SaaS tools is its powerful automation framework, with which DevOps teams can perform automated actions based on either real-time events or scheduled workflows.

To this end, Botmetric executes thousands of jobs every day to handle its customers’ tasks. This is expected to reach millions of jobs as the customer base grows. Further, the metadata around all Botmetric automations must be tracked continually to notify end users and provide visibility into what has been done and what has not.

Essentially, Botmetric delivers intelligent operations using the metadata from various operational sources like cloud providers, monitoring tools, logs, etc. It then applies concepts of Algorithmic IT Operations (AIOps) to provide smart insights and adaptive automation, so that the customers can make quick decisions. To that end, Botmetric collects a lot of time series data from different sources and is always in need of an efficient database solution.

In early 2014, Botmetric was using OpenTSDB as its time series database. While the Botmetric team liked the scalability of OpenTSDB, they faced several challenges in operating it along with Hadoop, HBase, and ZooKeeper. After six months, the team realized that OpenTSDB was not the right fit for a small and nimble team. Another major issue with OpenTSDB was data aggregation, which was slowing down Botmetric’s development speed. Further, the lack of reliable failover in HBase in 2014 caused data availability issues.

In late 2014, the Botmetric team decided to move away from OpenTSDB. Cassandra and KairosDB were shortlisted as the alternative for storing the time series data. The team liked the quick setup and the lower operational burden in production compared with OpenTSDB. Plus, Cassandra offered mature client libraries for easier integration.

Even though Cassandra worked well for Botmetric until early 2016, the team had its share of challenges as the number of customers with large data sets grew exponentially and data aggregation became a complex task. The Cassandra clusters had to be scaled vertically with high-end machines and horizontally with more nodes. Moreover, hundreds of millions of records had to be written into this data store every day while the team was still doing application-level data aggregation on top of Cassandra using CQL. This was a time-consuming exercise for most of the engineers on the team.
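To illustrate what application-level aggregation involves, here is a minimal Python sketch. It is not Botmetric’s code: the sample rows, the 5-minute window, and the function name bucket_means are assumptions made for illustration. The point is that when the store only returns raw rows, the application itself has to group them into time windows and compute the averages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bucket_means(points: Iterable[Tuple[int, float]], window_s: int = 300) -> Dict[int, float]:
    """Average (unix_timestamp, value) points per fixed-size time window."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        bucket = ts - (ts % window_s)
        sums[bucket] += value
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sorted(sums)}


# Hypothetical raw rows, as they might come back from a plain SELECT.
rows = [(0, 10.0), (60, 20.0), (299, 30.0), (300, 40.0), (600, 5.0), (650, 15.0)]
print(bucket_means(rows))  # {0: 20.0, 300: 40.0, 600: 10.0}
```

Every new rollup (sum, max, a different window, a grouping by tag) means more code of this kind, plus moving the raw rows over the network first.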

Further, from late 2015, Botmetric started moving away from metadata around cloud infrastructure, billing and usage records, etc. for easier and faster querying of data. The complete platform was decoupled into a microservices-based architecture. As a result, the team needed to stream data from its microservices, component usage, monitoring metrics, and so on. Even after running Cassandra and KairosDB in production for over a year, Botmetric’s search for a reliable real-time time series data store was not over.

After several deliberations, in early 2016, Botmetric zeroed in on the InfluxDB metrics data store. Botmetric deployed the InfluxData TICK stack with Grafana to monitor all of its microservices events. The Botmetric team loved InfluxDB’s simplicity, ease of use, support for various client libraries, strong aggregation capability in queries, and lack of operational overhead.

With InfluxDB, the Botmetric team was able to query and aggregate data easily, unlike with Cassandra CQL. Above all, it offered auto-expiry support for selected datasets. With this feature, Botmetric no longer has to spend DevOps effort on cleaning up old data with separate utilities.
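As a rough sketch of these two capabilities, the Python helpers below build InfluxQL statements (InfluxDB 1.x syntax): one that averages a field per time window on the server side, and one that creates a retention policy so that old data expires automatically. The measurement, field, database, and policy names are hypothetical and do not come from Botmetric’s deployment, and the helpers only format strings without escaping their inputs.

```python
def mean_by_window(measurement: str, field: str, window: str, lookback: str) -> str:
    """Build an InfluxQL query that averages a field per time window."""
    return (
        f'SELECT MEAN("{field}") FROM "{measurement}" '
        f"WHERE time > now() - {lookback} GROUP BY time({window})"
    )


def retention_policy(name: str, database: str, duration: str) -> str:
    """Build an InfluxQL statement that expires data older than `duration`."""
    return (
        f'CREATE RETENTION POLICY "{name}" ON "{database}" '
        f"DURATION {duration} REPLICATION 1"
    )


# Hypothetical names, used only for illustration.
print(mean_by_window("job_duration", "seconds", "5m", "1h"))
print(retention_policy("thirty_days", "botmetric_metrics", "30d"))
```

The resulting statements could be run from the influx shell or passed to a client library; the aggregation and the cleanup then happen inside the database rather than in application code.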

In the words of Botmetric CEO Vijay Rayapati as cited in one of his blog posts, “InfluxDB is a savior. Its simplicity is amazing and will certainly speed-up your application development time. The simple operational management of InfluxDB will be very helpful if it’s a critical data store for you. You don’t need to break your head during any production debugging. Plus, their active community support is very helpful. We just loved what we saw with the TICK stack deployment for our SaaS platform metrics collection and events monitoring.”

Vijay further adds, “We’ve now retired our entire KairosDB+Cassandra cluster and replaced it with an InfluxDB, Elasticsearch deployment. Today, InfluxDB and TICK stack are central components in the Botmetric technology landscape. We will continue to adopt it as our core data store as we build new real-time use cases that are event driven in nature.”

The Wrap up

Today, Botmetric refers to InfluxDB as a good choice for a “High Velocity Real-Time Metrics Data Store.” If you are an engineer or an architect looking for a real-time data store that is simple to operate, your search should end at InfluxDB. If this case study interests you, you can read the detailed story here.

Editor’s Note: This blog post is an adaptation of Vijay Rayapati’s blog post, “Choosing a Real-Time Metrics DataStore That Works – Botmetric Journey.”