Assuage the Alert Fatigue Mess with DevOps Intelligence

Alert fatigue is widely considered the #1 pain point for both traditional IT teams and modern DevOps engineers, especially for those who provide operational support for their applications and production infrastructure.

With the increased adoption of cloud and the emergence of microservices architecture for building new-generation systems, we are quadrupling the number of metrics monitored: server metrics, container metrics, app/web/DB server metrics, and application metrics. Why? Because of monitoring hell, the need to monitor far more things than we did in the traditional world. And this problem of alerts hell is only going to get worse.

Undoubtedly, DevOps is maturing, and there is a plethora of alert email management tools available. However, these tools do not solve alert overload, especially for non-critical events or events that require no action. As a result, engineers are increasingly becoming numb to alerts. In other words, the "crying wolf" syndrome sets in: they start ignoring even critical warnings, assuming they are meaningless. The alert emails then fail at their very purpose.

To this end, Botmetric analyzed what DevOps and operations engineers want in place of these alert emails. A few interesting facts emerged:

  • Ability to understand signal over noise
  • Need for scope aware alerting, to reduce the alerts flood
  • Pressing need for alerts intelligence and event diagnostics over emails
  • Requirement of alert event remediation with workflow handlers
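
As a rough illustration of the first two points, here is a minimal sketch of scope-aware filtering with duplicate suppression. It is not Botmetric's implementation; the `Alert` record, its field names, the severity levels, and the `filter_alerts` helper are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass

# Hypothetical alert record; the field names are assumptions for illustration.
@dataclass(frozen=True)
class Alert:
    source: str       # host or service that raised the alert
    check: str        # what was measured, e.g. "cpu" or "disk"
    severity: str     # "info", "warning", or "critical"
    environment: str  # e.g. "prod" or "staging"

# Assumed severity ordering, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}

def filter_alerts(alerts, scope_envs=frozenset({"prod"}), min_severity="warning"):
    """Keep in-scope alerts at or above min_severity, dropping repeats."""
    threshold = SEVERITY_RANK[min_severity]
    seen = set()
    kept = []
    for alert in alerts:
        if alert.environment not in scope_envs:
            continue  # out of scope: not an environment we care about
        if SEVERITY_RANK[alert.severity] < threshold:
            continue  # noise: below the severity we act on
        key = (alert.source, alert.check)
        if key in seen:
            continue  # repeat of an alert already surfaced
        seen.add(key)
        kept.append(alert)
    return kept

alerts = [
    Alert("web-1", "cpu", "critical", "prod"),
    Alert("web-1", "cpu", "critical", "prod"),    # duplicate
    Alert("web-2", "disk", "info", "prod"),       # below threshold
    Alert("db-1", "disk", "warning", "staging"),  # out of scope
    Alert("db-2", "disk", "warning", "prod"),
]
actionable = filter_alerts(alerts)
```

In this sketch, five raw alerts are reduced to the two that are in scope, severe enough to act on, and not repeats. A real system would likely also apply time windows and route the surviving alerts to remediation handlers.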

To learn more, read this post by Botmetric CEO Vijay Rayapati. In it, Vijay sheds light on what DevOps and operations engineers seek in place of these alert emails, and how DevOps intelligence can be used to fix alerts hell in the cloud world.