When it comes to monitoring metrics, quality is key
Humans are hard-wired to amass more things: money, physical goods, friends, etc. Many people focus on the quantity of the things they own and the relationships they build rather than the quality – but more isn’t always better. The same principle applies to IT monitoring.
Monitoring isn’t a game of who has the most points – who can monitor the most, who can quantify the most alerts, etc. Rather, successful IT teams focus on the quality of the metrics. If someone says their solution has more than 600 metrics that can monitor a particular application or component, what does that actually mean? How many of those metrics are relevant, and frankly, who cares about them? Does an IT administrator really care about seeing the total CPU seconds used by a server since it was last rebooted? Probably not – but they would care about a rate, such as the number of CPU milliseconds used over the past minute, or a percentage.
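To make the cumulative-counter point concrete, here is a minimal sketch (with hypothetical numbers) of turning a raw "total CPU seconds since boot" counter into the percentage an administrator actually cares about, by sampling it twice:

```python
def cpu_utilization_pct(prev_cpu_seconds, curr_cpu_seconds,
                        interval_seconds, num_cores=1):
    """Convert two samples of a cumulative CPU-time counter
    into a utilization percentage over the sampling window."""
    busy = curr_cpu_seconds - prev_cpu_seconds   # CPU time consumed in the window
    capacity = interval_seconds * num_cores      # CPU time available in the window
    return 100.0 * busy / capacity

# e.g. a single-core server whose counter grew by 45 CPU-seconds
# over a 60-second window is 75% utilized:
print(cpu_utilization_pct(1_234_500, 1_234_545, 60))  # 75.0
```

The raw counter by itself tells you almost nothing; the delta over a known interval is what becomes actionable.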
Many monitoring solutions collect, graph, alert on, and report meaningless metrics that don’t offer value, just for the sake of adding more metrics to the benefits list. Wouldn’t you rather have a product that tracks and reports only the metrics that provide crucial data to IT administrators, managers and the business, than a solution with a myriad of metrics that you have to wade through to find the value (if any exists)? Understanding why a metric is important, and knowing what action should take place if the metric exceeds a certain threshold, is what matters. Without this purpose, there is no sense in collecting or reporting on metrics.
Metrics may appear basic to a domain expert, but in many cases a Level 1 or 2 IT operations staff member is the first person to view an alert. If he or she understands the details of the metric, and the potential actions to take to address the underlying cause, it will speed up the assessment and help determine whether to escalate the issue. For example, knowing that a virtual machine’s (VM) CPU utilization is high is both important and actionable in many cases, but taking it a step further and identifying the top 10 VMs by CPU utilization is even more valuable and powerful.
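A "top 10 VMs" view is conceptually just a ranked selection over per-VM samples. The following sketch (with hypothetical VM names and a hypothetical `cpu_pct` field) shows the idea:

```python
import heapq

def top_vms_by_metric(vm_metrics, metric="cpu_pct", n=10):
    """Return the n VMs with the highest value for the given metric."""
    return heapq.nlargest(n, vm_metrics, key=lambda vm: vm[metric])

# Hypothetical per-VM samples from a cluster:
vms = [
    {"name": "vm-a", "cpu_pct": 91.0},
    {"name": "vm-b", "cpu_pct": 12.5},
    {"name": "vm-c", "cpu_pct": 64.0},
]
for vm in top_vms_by_metric(vms, n=2):
    print(vm["name"], vm["cpu_pct"])  # vm-a 91.0, then vm-c 64.0
```

The same pattern generalizes to any metric – swap `cpu_pct` for IO latency or memory usage and the ranking logic is unchanged.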
We have debuted a SCOM Management Pack (MP) for Nutanix that provides a dashboard view. In this view, you can select a cluster and see the top VMs by CPU utilization, IO latency, memory and a variety of other metrics. The MP includes additional dashboards that offer similar visibility, allowing users to see information about the cluster, storage pools, etc. and the associated metrics so they can make higher-quality, more informed decisions.
Monitoring isn’t a numbers game won by racking up meaningless metrics; it’s about using crucial data to protect your entire stack.
Do you have five nines availability (99.999%, or roughly five minutes of downtime per year) for your mission-critical apps? Find out how our new MP for Nutanix can help.