• Twitter

    Error: Twitter did not respond. Please wait a few minutes and refresh this page.

  • Enter your email address to follow this blog and receive notifications of new posts by email.

    Join 371 other subscribers
  • Blog with Us!

    If you test at Microsoft, Apple, or Amazon, and are interested in contributing to the Expert Testers blog, please email us.
  • Legal Mumbo Jumbo

    The opinions expressed in this blog are those of the author and do not necessarily state or reflect those of Microsoft. Any reblogging without the expressed written consent of Major League Baseball is just fine with us.
  • Authors

Who Tests the Watchers?

Back in February, in his blog titled “Monitoring your service platform – What and how much to monitor and alert?“,  Prakash discussed the monitoring of a service running in the cloud, the multitude of alerts that could be set up, and how testers need to trigger and test the alerts.  Toward the end he said:

Once it’s successful to simulate [the alerts], these tests results will provide confidence to the operations engineering team that they will be able to handle and manage them quickly and effectively.

When I read this part of his posting, one thing jumped into my Tester’s mind: Who is verifying that there is a working process in place to deal with the alerts once they start coming through from Production?  If you have an Operations team, do they already have a system in place that you can just ‘plug’ your alerts into?  If you don’t have a system in place, are they responsible for developing the system?

If you, the tester, has any doubt about the alert handling process, here are a few questions you might want to ask.  Think of it as a test plan for the process.

  • Is the process documented someplace accessible to everyone it affects?
  • Is there a CRM (Customer Relationship Management) or other tracking system for the alerts?
  • Is a system like SCOM  (System Center Operations Manager) being used that can automatically generate a ticket for the alert, and handle some of the alerts without human intervention?
  • Is there an SLA (Service Level Agreement) or OLA (Operations level Agreement) detailing turn-around time for each level of alert?
  • Who is on the hook for the first level of investigation?
  • What is the escalation path if the Operations team can’t debug the issue?  Is it the original product team?  Is there a separate sustainment team?
  • Who codes, tests, and deploys any code fix?

You might want to take an alert from the test system and inject it into the proposed production system, appropriately marked as a test, of course.  Does the system work as expected?

More information on monitoring and alerts can be found in the Microsoft Operations Framework.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: