When Can Testing End?

Last year a program manager sent me an email I still think about. The product we were working on had just been released to our dogfood environment, where it would be used by employees before being released to customers.

The next day we were still completing the analysis of our automated test failures. Our initial investigation determined that these failures shouldn’t hold up the release; they either didn’t meet the bug bar, or we thought they were caused by configuration issues in our test environment. But as I continued investigating, I discovered one of the failures actually was caused by a product bug that met the bar. Later that day, I found a second product bug. Shortly after logging the second bug I received this email.

When is testing supposed to end?

My instinct was that testing should never end. At the time, our product had only been released to dogfood, not production. There was no reason to stop testing since we could still fix critical bugs before they affected customers. And even if a bug was found after shipping to customers, it’s often possible to release an update to fix the problem.

But the more I thought about it, the more I realized it’s a perfectly legitimate question. Is there ever a time when testing can stop? If you’re shipping a “shrink-wrapped” product, then the answer is “yes”. If your product is a service, the answer is more complicated.

The minimum conditions required to stop testing are:

  1. Development for the feature is complete (no more code churn).
  2. The release criteria for the feature have been met.
  3. All failing tests and code coverage gaps have been investigated.

The first condition is that development has stopped and the code base is stable. As long as product code is changing, there’s a possibility new bugs are being introduced. The new code changes should be tested and regression tests executed.

Once the product code is stable, the release criteria must be met. The release criteria are typically agreed upon early in the test planning stage by both the Test and Dev teams. They can include conditions such as:

  • No open P1 (severe) bugs
  • 100% of tests executed
  • 95% test pass rate
  • 90% code coverage
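
As a rough illustration, criteria like these can be checked mechanically at the end of a test pass. The sketch below is only an assumption about how a team might encode them; the `RunSummary` fields and the `unmet_release_criteria` function are hypothetical names, and the thresholds are simply the example values from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunSummary:
    """Hypothetical summary of one test pass."""
    open_p1_bugs: int
    tests_total: int
    tests_executed: int
    tests_passed: int
    coverage_pct: float


def unmet_release_criteria(run: RunSummary) -> List[str]:
    """Return the criteria that are not yet met; an empty list means all are met."""
    unmet = []
    if run.open_p1_bugs > 0:
        unmet.append("open P1 bugs")
    if run.tests_executed < run.tests_total:
        unmet.append("not all tests executed")
    # Pass rate is measured against executed tests; no executed tests counts as 0%.
    pass_rate = 100.0 * run.tests_passed / run.tests_executed if run.tests_executed else 0.0
    if pass_rate < 95.0:
        unmet.append("pass rate below 95%")
    if run.coverage_pct < 90.0:
        unmet.append("coverage below 90%")
    return unmet


# Example: 192 of 200 tests passing is a 96% pass rate, so every criterion is met.
summary = RunSummary(open_p1_bugs=0, tests_total=200, tests_executed=200,
                     tests_passed=192, coverage_pct=91.5)
print(unmet_release_criteria(summary))
```

Note that this run meets the criteria while still leaving eight failing tests unexplained.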

You might think that once development is complete and your release criteria are met, it’s safe to stop testing. This is not the case. In the example above, the release criteria included a 95% test pass rate. If your actual pass rate is anything less than 100%, however, it’s imperative that you investigate every failure. If any failure was caused by a bug that meets the bar, the fix will violate the first condition, that the code base remains stable, so testing must continue.

The same holds true for code coverage goals. Unless you have perfect code coverage, you need to address any gaps. This can be done either by creating new test cases, or by accepting the risk that these specific areas of code remain untested.
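
The three conditions can be sketched as a single decision. This is a minimal illustration, not a real tool: the `Triage` categories, the `Failure` record, and `reasons_to_keep_testing` are all hypothetical names, and a "reviewed" coverage gap is assumed to mean one that either has a new test or has been explicitly accepted as a risk.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Triage(Enum):
    UNINVESTIGATED = "uninvestigated"
    TEST_OR_ENVIRONMENT_ISSUE = "test or environment issue"
    PRODUCT_BUG_BELOW_BAR = "product bug below the bar"
    PRODUCT_BUG_MEETS_BAR = "product bug that meets the bar"


@dataclass
class Failure:
    test_name: str
    triage: Triage


def reasons_to_keep_testing(code_is_stable: bool,
                            criteria_met: bool,
                            failures: List[Failure],
                            unreviewed_coverage_gaps: int) -> List[str]:
    """Return every reason testing cannot stop yet; an empty list means it can."""
    reasons = []
    if not code_is_stable:
        reasons.append("code is still changing")
    if not criteria_met:
        reasons.append("release criteria not met")
    for failure in failures:
        if failure.triage is Triage.UNINVESTIGATED:
            reasons.append(f"{failure.test_name}: failure not investigated")
        elif failure.triage is Triage.PRODUCT_BUG_MEETS_BAR:
            # Fixing this bug changes the code, which breaks the first condition.
            reasons.append(f"{failure.test_name}: bug fix will change the code")
    if unreviewed_coverage_gaps > 0:
        reasons.append("coverage gaps not yet tested or accepted as risk")
    return reasons


# Example: criteria are met, but one failure turns out to be a bug that meets the bar.
failures = [Failure("login_test", Triage.TEST_OR_ENVIRONMENT_ISSUE),
            Failure("sync_test", Triage.PRODUCT_BUG_MEETS_BAR)]
print(reasons_to_keep_testing(True, True, failures, 0))
```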

Once our three conditions are met, you can stop testing, provided your product will be shrink-wrapped and shipped on a DVD. This doesn’t mean your product is bug-free; all relatively complex software has some bugs. But that’s why you created your release criteria: to manage the risk inherent in shipping software, not to eliminate it. If your minimum test pass rate or code coverage rate is anything less than 100%, you’re choosing to accept some reasonable amount of risk.

If your product is a service then your job isn’t done yet. You don’t need to continue running existing automation in test environments because the results will always be the same. But even though your product code isn’t changing, there are other factors that can affect your service. A password can expire, or a wrong value could be entered in a configuration file. There could be hardware failures or hacking attempts. The load may be more than you planned for. So even though traditional testing may have ended, you should now shift your efforts to monitoring the production environment and testing in production.
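
To make that shift concrete, here is a small sketch of the kind of check a production monitor might run for a few of the factors above. Everything in it is an assumption for illustration: the `production_warnings` function, its parameters, and the 14-day warning window are hypothetical, and a real service would pull these values from its own secret store, configuration, and telemetry.

```python
from datetime import date
from typing import Dict, List


def production_warnings(today: date,
                        password_expires: date,
                        config: Dict[str, str],
                        required_keys: List[str],
                        load_rps: float,
                        planned_rps: float,
                        warn_days: int = 14) -> List[str]:
    """Return warnings about conditions that can break a service without a code change."""
    warnings = []

    # An expiring or expired service password.
    days_left = (password_expires - today).days
    if days_left < 0:
        warnings.append("service password has expired")
    elif days_left <= warn_days:
        warnings.append(f"service password expires in {days_left} days")

    # A missing or empty value in the configuration.
    for key in required_keys:
        if config.get(key) in (None, ""):
            warnings.append(f"config value missing or empty: {key}")

    # More load than was planned for.
    if load_rps > planned_rps:
        warnings.append("load exceeds planned capacity")

    return warnings


# Example with made-up values.
print(production_warnings(today=date(2024, 1, 1),
                          password_expires=date(2024, 1, 8),
                          config={"db_host": ""},
                          required_keys=["db_host"],
                          load_rps=1200.0,
                          planned_rps=1000.0))
```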

Secrets of SDET Success

This morning, I received an email from a tester in Israel. He buttered me up by telling me that he has followed my blog for a few years (he’s the one…), but then asked a great question about testing at Microsoft. As I was writing a response, I thought that it would be something the readers of this blog may find valuable as well.

[…I’ve appreciated], among other things, the way test engineers are valued in comparison to software engineers in Microsoft. I also witnessed this when I was interviewed at Microsoft, and when I interview testers coming from Microsoft-they are all proud at what they do and who they are.

I can’t relate this directly to results, or lack of them, but would love to have some of this team spirit in the newly test organization I am bringing up. So… what’s your secret?

The Secret

The real secret is that there is no secret. Some of this is shared in How We Test Software at Microsoft, but I’ll try to cover the highlights below.

Hiring

Microsoft is filled with incredibly smart people passionate about making quality software. We look for people like that when we interview and hire, so it’s not a surprise to me that many of our folks come across that way. Most of our testers don’t have any test experience when they come to Microsoft. We hire people who know how to write great code for test positions, but more importantly, we look for people who are great at solving difficult problems – and who know how to use computer programs to do it.

The Job

Our testers are proud of their jobs because (most of them) really, really enjoy their jobs. Testing at Microsoft is extremely challenging – and in turn, extremely rewarding. Here’s a quote I love from a colleague who moved from a very senior development role into test (yes, by choice!).

If you’re looking for really interesting development work then I think you’ll find that designing and writing code that can determine whether another piece of code is ready to ship is a far greater challenge than implementing yet another feature set. From my own experience I can say without a doubt that the most fascinating dev work I have ever done has been in test.

The Future

One big reason I’m still in test, and still enjoy it so much, is that Microsoft has career paths for test, for both managers and non-managers, that extend far beyond what many non-MS testers can imagine (we covered the current version of these career profiles in HWTSAM).

I’m currently working on a project to update the descriptions of the roles testers may play on teams at Microsoft, from entry level to executive level. I’m excited to say that even after hours of scrutinizing every word in the role descriptions (and after 17 years at the company), the story we’re putting together for testers at Microsoft is pretty cool.

As is typical for me, I gave the long answer first. The short answer is: hire great people, give them challenging work, and give them a vision for growth. Good luck with your new team!

-Alan

Who Tests the Watchers?

Back in February, in his blog post titled “Monitoring your service platform – What and how much to monitor and alert?”, Prakash discussed the monitoring of a service running in the cloud, the multitude of alerts that could be set up, and how testers need to trigger and test the alerts. Toward the end he said:

Once it’s successful to simulate [the alerts], these tests results will provide confidence to the operations engineering team that they will be able to handle and manage them quickly and effectively.

When I read this part of his posting, one thing jumped into my Tester’s mind: Who is verifying that there is a working process in place to deal with the alerts once they start coming through from Production?  If you have an Operations team, do they already have a system in place that you can just ‘plug’ your alerts into?  If you don’t have a system in place, are they responsible for developing the system?

If you, the tester, have any doubt about the alert handling process, here are a few questions you might want to ask. Think of it as a test plan for the process.

  • Is the process documented someplace accessible to everyone it affects?
  • Is there a CRM (Customer Relationship Management) or other tracking system for the alerts?
  • Is a system like SCOM (System Center Operations Manager) being used that can automatically generate a ticket for the alert, and handle some of the alerts without human intervention?
  • Is there an SLA (Service Level Agreement) or OLA (Operational Level Agreement) detailing turn-around time for each level of alert?
  • Who is on the hook for the first level of investigation?
  • What is the escalation path if the Operations team can’t debug the issue?  Is it the original product team?  Is there a separate sustainment team?
  • Who codes, tests, and deploys any code fix?

You might want to take an alert from the test system and inject it into the proposed production system, appropriately marked as a test, of course.  Does the system work as expected?
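
Here is a minimal sketch of what such a check could look like, using an in-memory stand-in for the alert-handling system. The `Alert`, `Ticket`, and `AlertPipeline` names are hypothetical, and the rule that test alerts get a ticket but do not page anyone is an assumption; your real system (SCOM or otherwise) will have its own way of marking and routing test alerts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    name: str
    severity: int
    is_test: bool = False  # marks an injected test alert


@dataclass
class Ticket:
    alert: Alert
    owner: str
    paged: bool


class AlertPipeline:
    """Hypothetical stand-in for the production alert-handling system."""

    def __init__(self, on_call_by_severity: Dict[int, str]):
        self.on_call_by_severity = on_call_by_severity
        self.tickets: List[Ticket] = []

    def handle(self, alert: Alert) -> Ticket:
        owner = self.on_call_by_severity.get(alert.severity, "unassigned")
        # Assumption: test alerts are ticketed like real ones but never page a person.
        ticket = Ticket(alert=alert, owner=owner, paged=not alert.is_test)
        self.tickets.append(ticket)
        return ticket


# Inject a clearly marked test alert and verify the process end to end.
pipeline = AlertPipeline({1: "ops-first-level"})
ticket = pipeline.handle(Alert("[TEST] disk full", severity=1, is_test=True))
assert ticket.owner == "ops-first-level"   # routed to the first level of investigation
assert not ticket.paged                    # nobody was woken up for a test
assert len(pipeline.tickets) == 1          # a ticket exists for tracking
```

The assertions at the end are the "test plan" in miniature: the alert was tracked, it reached the right owner, and it was handled as a test.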

More information on monitoring and alerts can be found in the Microsoft Operations Framework.
