Look to the Data

I was recently asked to investigate my team’s automated daily test cases. It was taking more than 24 hours to execute our “daily” test run. My job was to find out why the tests were taking so long to complete, and to speed them up where possible. In the process, an important lesson was reinforced: we should look to real, live customer data to guide our test planning.

I had several ideas to speed up our test passes. My highest priority was to find and address the test cases that took the longest to run. I sorted the tests by run-time, and discovered the slowest test case took over an hour to run. I tackled this first, as it would give me the biggest “bang for the buck”.

The test validated a function that took two input parameters. It iterated through each possible input combination, verifying the result. The code looked like this:

for setting1 in range(1, 6):         # setting1 takes values 1 through 5
    for setting2 in range(0, 6):     # setting2 takes values 0 through 5
        validate(setting1, setting2)

The validate function took two minutes to execute. Since it was called thirty times, the test case took an hour to complete. I first tried to improve the performance of the validate function, but the best I could do was shave a few seconds off its run-time.

My next thought was whether we really needed to test all 30 input combinations. I requested access to the live production data for these fields. I found just two setting combinations accounted for 97% of the two million production scenarios; four combinations accounted for almost 99%. Many of the combinations we were testing never occurred at all “in the real world.”

Most Common Data Combinations

setting1   setting2   % of Production Data
   5          5             73%
   1          5             24%
   1          0              0.9%
   2          1              0.8%
   2          5              0.7%
   3          5              0.21%
   1          1              0.2%
   3          0              0.1%
   4          0              0.09%

I reasoned that whatever change I made to this test case, the two most common combinations must always be tested. Executing just these two would test almost all the production scenarios, and the run-time would be cut to only four minutes. But notice that the top two combinations both have a setting2 value of 5. Validating just these two combinations would leave a huge hole in our test coverage.

I considered testing only the nine combinations found in production. This would guarantee we tested all the two million production scenarios, and our run-time would be cut from one hour to 18 minutes. The problem with this solution is that if a new combination occurred in production in the future, we wouldn’t have a test case for it; and if there was a bug, we wouldn’t know about it until it was too late.

Another possible strategy would be to split this test into multiple test cases, and use priorities to run the more common scenarios more often. For example, the two most common scenarios could be classified as P1, the next seven P2, and the rest P3. We would then configure our test passes to execute the P1s daily, the P2s weekly, and the P3s monthly.

The solution I decided on, however, was to keep it as one test case and always validate the two most common combinations as well as three other, randomly generated, combinations. This solution guaranteed that we tested at least 97% of the live production scenarios each night. All thirty combinations would be tested, on average, every 10 days, more often than with the “priority” strategy I had considered. The solution reduced the test’s run-time from over an hour to about 10 minutes. The entire project was a success as well; we reduced the test pass run-time from 27 hours to less than 8 hours.
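The nightly selection can be sketched in a few lines. This is a minimal sketch, not the code we actually shipped; the validate function under test is assumed, and only the selection logic is shown.

```python
import random

def nightly_combinations():
    """Pick the five combinations to validate in one nightly run."""
    # The two combinations covering ~97% of production are always tested.
    fixed = [(5, 5), (1, 5)]
    # All 30 possible combinations: setting1 in 1..5, setting2 in 0..5.
    all_combos = [(s1, s2) for s1 in range(1, 6) for s2 in range(0, 6)]
    # Randomly pick three more from the remaining 28 combinations.
    remaining = [c for c in all_combos if c not in fixed]
    return fixed + random.sample(remaining, 3)

# Five combinations per night at two minutes each: about 10 minutes total.
combos = nightly_combinations()
```

With 3 random picks from the 28 non-fixed combinations, each one has a 3/28 chance per night of being chosen, so on average each is tested roughly every 9 to 10 nights.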

You may be thinking I could have used pairwise testing to reduce the number of combinations I executed. Unfortunately, pairwise doesn’t help when you have only two input parameters. The strategy ensures every pair of parameter values is validated, so the total number of combinations tested would have remained at thirty.

In hindsight, I think I could have been smarter about the random combinations I tested. For example, I could have ensured I generated at least one test case for each possible value of setting1 and setting2.

I also could have associated a “weight” with each combination based on how often it occurred in production. Settings would still be randomly chosen, but the most common ones would have a better chance of being generated. I would just have to be careful to assign some weight to those combinations that never appear in production; this would make sure all combinations are eventually tested. I think I’ll use this strategy the next time I run into a similar situation.

Death by a Thousand Little Bugs


Minor product defects that take only a few minutes to resolve are often never fixed; it seems there are always more important tasks to work on. If this sounds familiar, your test team may suffer from morale issues. And your product may suffer from “death by a thousand little bugs”. Fortunately, these problems can be fixed as easily as these bugs can.

Once testers get their hands on a feature, it doesn’t take long for low-priority defects to pile up in their bug-tracking database. These may include, for example, minor UI issues such as missing punctuation, inconsistent fonts, or grammar errors. These bugs tend to pile up because they are primarily cosmetic. Testers resolve the highest-priority bugs first–often rightly so. We should fix bugs that greatly affect functionality, performance, or security before fixing a spelling typo in the UI.

What can happen, however, is that we never fix many of these low-priority bugs. There are often more critical defects being discovered, so we continuously postpone the low-priority ones.

Unfortunately, some of the bugs left behind are those that were logged the earliest. There are few things I find more frustrating than reporting a simple bug that doesn’t get fixed. My typical complaint sounds something like this: “Why hasn’t this bug been fixed? I logged it weeks ago. It’s a one-line change that will take only two minutes to fix!”

A previous project I worked on provides a perfect example. Not long after I was given the first working build of the UI, I logged two minor bugs. One issue was logged because two buttons on the same page were not aligned properly. The other bug was simply that a sentence ended with an extra period. When the product was released more than four months later, the misaligned buttons and the extra period were still there.

Another problem is that even if these low-impact bugs don’t affect functionality, they can greatly affect the customer’s perception of the product. How can a customer fully trust a product, no matter how well it actually works, if there are mountains of minor defects? This is the “death by a thousand little bugs” syndrome.

Before I came to Microsoft, I ran an online store. One night I modified the shopping cart page, and the next day sales plummeted. When I reviewed the changes I had made, I realized that I misspelled two words and added a broken image link. I fixed these issues and sales quickly went back to normal.

The functionality of the page hadn’t changed at all. But potential customers saw the “minor” errors and assumed the entire shopping cart had poor quality. They certainly didn’t rationalize, “They must have spent all their effort making sure the functionality was solid. That’s why they postponed these obvious, but low-priority bugs.”

The “death by a thousand little bugs” syndrome exists because most teams evaluate each bug individually–and individually, each of these bugs is trivial; but in the aggregate, they are not. Collectively, they make users skeptical of your product.

The solution is that we shouldn’t always address high-priority bugs before low-priority ones. But when should we make the exceptions? Here are three strategies that I think could help solve these problems.

  1. Set aside one day each month for developers to address the low-priority, low-hanging-fruit bugs. This is a great way to fix a lot of bugs in a short amount of time. It can also prevent your product from suffering from “death by a thousand little bugs.”
  2. Put aside one day every month to fix the defects that have been in the bug database the longest–regardless of priority. This helps prevent testers from becoming demoralized because bugs they logged months ago still haven’t been fixed.
  3. Once a month, increase the priority of all bugs that are at least 30 days old. Developers can continue to pull bugs out of the queue in priority order, but the difference is that after one month, a bug that was logged as P4 (lowest priority) becomes a P3. After three months, it becomes a high-priority P1 bug. It may initially sound odd that low-priority defects, such as a misspelled word in a log file, will eventually be classified as highest priority. But doing so forces some action to be taken on the bug. As a P1, it now must either be fixed or closed by the Program Manager as “Won’t Fix”.
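The escalation rule in the third strategy is simple enough to sketch. This is a hypothetical helper, not any real bug-tracker API; the priority scale (P1 highest, P4 lowest) and the 30-day interval follow the description above.

```python
from datetime import date, timedelta

def escalate_priority(priority, logged_on, today=None):
    """Bump a bug one priority level (P4 -> P3 -> P2 -> P1) for each
    full 30 days it has been open, never going above P1."""
    today = today or date.today()
    months_open = (today - logged_on).days // 30
    return max(1, priority - months_open)
```

Under this rule, a P4 bug logged 95 days ago has been open for three full 30-day periods, so it is escalated three levels to P1.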

You may be thinking, “but I’m a tester, and these solutions have nothing to do with testers.” When I started in Test, that’s how I thought. I now realize that my primary responsibility is to make my product better, not just to log bugs. If these strategies would work well for your team, then you should lobby for them–they may even increase your own morale along the way.

Do you think any of these strategies work well for your team? What strategies have you tried in the past, and how have they worked? I’m very interested in hearing your comments.
