
## Look to the Data

I was recently asked to investigate my team’s automated daily test cases. Our “daily” test run was taking more than 24 hours to execute. My job was to find out why the tests were taking so long to complete, and to speed them up where possible. In the process, an important lesson was reinforced: we should look to real, live customer data to guide our test planning.

I had several ideas to speed up our test passes. My highest priority was to find and address the test cases that took the longest to run. I sorted the tests by run-time, and discovered the slowest test case took over an hour to run. I tackled this first, as it would give me the biggest “bang for the buck”.

The test validated a function that took two input parameters. It iterated through each possible input combination, verifying the result. The code looked like this:

    for setting1 = 1 to 5
    {
        for setting2 = 0 to 5
        {
            validate(setting1, setting2);
        }
    }

The validate function took two minutes to execute. Since it was called thirty times, the test case took an hour to complete. I first tried to improve the performance of the validate function, but the best I could do was shave a few seconds off its run-time.
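To make the arithmetic concrete, here is a minimal Python sketch that enumerates the same input space and estimates the run-time. The constant names are my own for illustration; only the value ranges and the two-minute cost come from the test described above.

```python
from itertools import product

SETTING1_VALUES = range(1, 6)  # 1 to 5 inclusive, as in the original loop
SETTING2_VALUES = range(0, 6)  # 0 to 5 inclusive, as in the original loop
MINUTES_PER_VALIDATE = 2       # approximate cost of one validate() call

# Every (setting1, setting2) pair the exhaustive test walks through.
all_combinations = list(product(SETTING1_VALUES, SETTING2_VALUES))
estimated_minutes = len(all_combinations) * MINUTES_PER_VALIDATE

print(len(all_combinations), estimated_minutes)  # prints: 30 60
```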

My next thought was whether we really needed to test all 30 input combinations. I requested access to the live production data for these fields. I found just two setting combinations accounted for 97% of the two million production scenarios; four combinations accounted for almost 99%. Many of the combinations we were testing never occurred at all “in the real world.”

#### Most Common Data Combinations

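| setting1 | setting2 | % of Production Data |
|----------|----------|----------------------|
| 5        | 5        | 73%                  |
| 1        | 5        | 24%                  |
| 1        | 0        | 0.9%                 |
| 2        | 1        | 0.8%                 |
| 2        | 5        | 0.7%                 |
| 3        | 5        | 0.21%                |
| 1        | 1        | 0.2%                 |
| 3        | 0        | 0.1%                 |
| 4        | 0        | 0.09%                |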
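The coverage figures quoted earlier can be checked with a few lines of Python. This is only a sketch: the dictionary is transcribed from the table above, and `coverage_of_top` is a hypothetical helper name.

```python
# (setting1, setting2) -> percent of production scenarios, from the table above
production_share = {
    (5, 5): 73.0, (1, 5): 24.0, (1, 0): 0.9, (2, 1): 0.8, (2, 5): 0.7,
    (3, 5): 0.21, (1, 1): 0.2, (3, 0): 0.1, (4, 0): 0.09,
}

def coverage_of_top(n):
    """Percent of production scenarios covered by the n most common combinations."""
    ranked = sorted(production_share.values(), reverse=True)
    return round(sum(ranked[:n]), 2)

for n in (2, 4, 9):
    print(n, coverage_of_top(n))
```

The top two combinations come to 97%, the top four to 98.7%, and all nine to 100%.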

I reasoned that whatever change I made to this test case, the two most common combinations must always be tested. Executing just these two would test almost all the production scenarios, and the run-time would be cut to only four minutes. But notice that the top two combinations both have a setting2 value of 5. Validating just these two combinations would leave a huge hole in our test coverage.

I considered testing only the nine combinations found in production. This would guarantee we tested all two million production scenarios, and our run-time would be cut from one hour to 18 minutes. The problem with this solution is that if a new combination occurred in production in the future, we wouldn’t have a test case for it; and if it had a bug, we wouldn’t know about it until it was too late.

Another possible strategy would be to split this test into multiple test cases, and use priorities to run the more common scenarios more often. For example, the two most common scenarios could be classified as P1, the next seven P2, and the rest P3. We would then configure our test passes to execute the P1s daily, the P2s weekly, and the P3s monthly.
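A rough sketch of how that priority split might look in Python follows. The `priority` function and the bucket names are illustrative assumptions, not code from our test suite; the ranking is taken from the production table.

```python
from itertools import product

ALL_COMBINATIONS = list(product(range(1, 6), range(0, 6)))
# Combinations seen in production, most common first (from the table).
PRODUCTION_RANKED = [(5, 5), (1, 5), (1, 0), (2, 1), (2, 5),
                     (3, 5), (1, 1), (3, 0), (4, 0)]

def priority(combination):
    """Hypothetical rule: top two are P1, other production combos P2, the rest P3."""
    if combination in PRODUCTION_RANKED[:2]:
        return "P1"
    if combination in PRODUCTION_RANKED:
        return "P2"
    return "P3"

buckets = {"P1": [], "P2": [], "P3": []}
for combo in ALL_COMBINATIONS:
    buckets[priority(combo)].append(combo)

print({name: len(combos) for name, combos in buckets.items()})
# prints: {'P1': 2, 'P2': 7, 'P3': 21}
```

At two minutes per combination, that works out to roughly 4 minutes for the P1s, 14 minutes for the P2s, and 42 minutes for the P3s.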

The solution I decided on, however, was to keep it as one test case and always validate the two most common combinations as well as three other randomly generated combinations. This solution guaranteed we test at least 97% of the live production scenarios each night. Each of the thirty combinations would be tested, on average, about once every 10 days, which is more often than under the “priority” strategy I had considered. The solution reduced the test’s run-time from over an hour to about 10 minutes. The entire project was a success as well; we reduced the test pass run-time from 27 hours to less than 8 hours.
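In Python, the approach might be sketched like this. The function names are hypothetical, `validate` is passed in as a stand-in for the real function, and I am assuming here that the three random picks are drawn without repeats from the 28 combinations that are not already fixed.

```python
import random
from itertools import product

ALL_COMBINATIONS = list(product(range(1, 6), range(0, 6)))
ALWAYS_TESTED = [(5, 5), (1, 5)]  # the two most common production combinations
RANDOM_PER_RUN = 3

def nightly_combinations(rng=random):
    """Return the two fixed combinations plus three distinct random others."""
    remaining = [c for c in ALL_COMBINATIONS if c not in ALWAYS_TESTED]
    return ALWAYS_TESTED + rng.sample(remaining, RANDOM_PER_RUN)

def run_test(validate, rng=random):
    """Call validate once for each of tonight's five combinations."""
    for setting1, setting2 in nightly_combinations(rng):
        validate(setting1, setting2)
```

Under that assumption, each of the 28 remaining combinations has a 3-in-28 chance of being picked on any given night, so it comes up about once every nine to ten nights on average.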

You may be thinking that I could have used pairwise testing to reduce the number of combinations I executed. Unfortunately, pairwise testing doesn’t help when you only have two input parameters. The strategy ensures that every pair of parameter values is validated at least once, and with only two parameters every combination is itself a distinct pair, so the total number of combinations tested would have remained at thirty.
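A quick way to see this is to compare the exhaustive count with the smallest suite pairwise testing could possibly produce. The sketch below uses a simple lower bound (the product of the two largest parameter sizes) and not a real pairwise generator, and the third parameter with four values is purely hypothetical, added only for contrast.

```python
from math import prod

def exhaustive_count(sizes):
    """Number of test cases when every combination is run."""
    return prod(sizes)

def pairwise_lower_bound(sizes):
    """Fewest cases any pairwise suite could need: the two largest
    parameters alone require every one of their value pairs to appear."""
    largest, second = sorted(sizes, reverse=True)[:2]
    return largest * second

# Our test: setting1 has 5 values, setting2 has 6.
print(exhaustive_count([5, 6]), pairwise_lower_bound([5, 6]))        # prints: 30 30
# Hypothetical third parameter with 4 values, for contrast only.
print(exhaustive_count([5, 6, 4]), pairwise_lower_bound([5, 6, 4]))  # prints: 120 30
```

With two parameters the bound equals the full count, so there is nothing to save; the savings only appear once a third parameter is involved.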

In hindsight, I think I could have been smarter about the random combinations I tested. For example, I could have ensured I generated at least one test case for each possible value of setting1 and setting2.
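One way this could be done is to shuffle both value lists and pair them up, cycling the shorter list. This is only a sketch, `covering_combinations` is a hypothetical helper, and it is not something I actually ran against our tests.

```python
import random
from itertools import cycle

def covering_combinations(setting1_values, setting2_values, rng=random):
    """Return random pairs in which every value of each setting appears at least once."""
    first = list(setting1_values)
    second = list(setting2_values)
    rng.shuffle(first)
    rng.shuffle(second)
    # zip stops at the longer (finite) list; the shorter one wraps around.
    if len(first) >= len(second):
        return list(zip(first, cycle(second)))
    return list(zip(cycle(first), second))

combos = covering_combinations(range(1, 6), range(0, 6))
print(len(combos))  # prints: 6
```

The catch is that setting2 has six values, so covering every value takes at least six combinations, or about 12 minutes of validate calls, compared with the three random ones I settled on.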

I also could have associated a “weight” with each combination based on how often it occurred in production. Settings would still be randomly chosen, but the most common ones would have a better chance of being generated. I would just have to be careful to assign some weight to those combinations that never appear in production; this would make sure all combinations are eventually tested. I think I’ll use this strategy the next time I run into a similar situation.
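Here is a minimal sketch of that weighted selection using Python's `random.choices`. The shares come from the production table; `FLOOR_WEIGHT` is an arbitrary value I made up for illustration, and it is added to every combination so that the ones never seen in production can still be drawn.

```python
import random
from itertools import product

ALL_COMBINATIONS = list(product(range(1, 6), range(0, 6)))
# Percent of production scenarios per combination, from the table.
PRODUCTION_SHARE = {
    (5, 5): 73.0, (1, 5): 24.0, (1, 0): 0.9, (2, 1): 0.8, (2, 5): 0.7,
    (3, 5): 0.21, (1, 1): 0.2, (3, 0): 0.1, (4, 0): 0.09,
}
FLOOR_WEIGHT = 0.05  # assumed value; keeps unseen combinations selectable

def weighted_combinations(count, rng=random):
    """Pick `count` distinct combinations, favoring common production ones."""
    weights = {c: PRODUCTION_SHARE.get(c, 0.0) + FLOOR_WEIGHT
               for c in ALL_COMBINATIONS}
    count = min(count, len(ALL_COMBINATIONS))
    chosen = []
    while len(chosen) < count:
        # Redraw from what is left so the same combination is never picked twice.
        candidates = [c for c in ALL_COMBINATIONS if c not in chosen]
        pick = rng.choices(candidates,
                           weights=[weights[c] for c in candidates], k=1)[0]
        chosen.append(pick)
    return chosen
```

With these weights the two most common combinations would be picked on most nights but not every night, so I would probably still run those two unconditionally and use the weighted draw only for the remaining slots.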