Who Tests the Watchers?

Back in February, in his blog titled “Monitoring your service platform – What and how much to monitor and alert?“,  Prakash discussed the monitoring of a service running in the cloud, the multitude of alerts that could be set up, and how testers need to trigger and test the alerts.  Toward the end he said:

Once it’s successful to simulate [the alerts], these tests results will provide confidence to the operations engineering team that they will be able to handle and manage them quickly and effectively.

When I read this part of his posting, one thing jumped into my Tester’s mind: Who is verifying that there is a working process in place to deal with the alerts once they start coming through from Production?  If you have an Operations team, do they already have a system in place that you can just ‘plug’ your alerts into?  If you don’t have a system in place, are they responsible for developing the system?

If you, the tester, has any doubt about the alert handling process, here are a few questions you might want to ask.  Think of it as a test plan for the process.

  • Is the process documented someplace accessible to everyone it affects?
  • Is there a CRM (Customer Relationship Management) or other tracking system for the alerts?
  • Is a system like SCOM  (System Center Operations Manager) being used that can automatically generate a ticket for the alert, and handle some of the alerts without human intervention?
  • Is there an SLA (Service Level Agreement) or OLA (Operations level Agreement) detailing turn-around time for each level of alert?
  • Who is on the hook for the first level of investigation?
  • What is the escalation path if the Operations team can’t debug the issue?  Is it the original product team?  Is there a separate sustainment team?
  • Who codes, tests, and deploys any code fix?

You might want to take an alert from the test system and inject it into the proposed production system, appropriately marked as a test, of course.  Does the system work as expected?

More information on monitoring and alerts can be found in the Microsoft Operations Framework.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: