The Tester’s Dilemma: To Pad or Not to Pad

My first self-evaluation as a tester included statements like, “I added 50 new tests to this test suite” and “I automated 25 tests in that suite.” I thought the more tests I wrote, the more productive I was. I was wrong, and so are many testers who still feel that way. But it isn’t all our fault.

Early in my career I wrote a ton of tests, each validating one thing and one thing only. The main benefit of this strategy is that if a test failed, I knew exactly what failed with minimal investigation. An unexpected side effect was that it led me to write a lot of tests. And this, I thought, was an accurate indicator of how productive I was.

I later discovered this strategy also had undesirable side effects. I discussed these side effects in an earlier article to much fanfare, so I won’t go into the details again. But the four disadvantages I see are:

  • Test passes take too long to complete
  • Results take too long to investigate
  • Code takes too much effort to maintain
  • Above a certain threshold, additional tests can mask product bugs

After six years in Test, it’s now obvious to me that it’s not necessarily better to write a lot of tests. I would now rather write one test that finds bugs, than a hundred that don’t. I would rather write one really efficient test that validates a complete scenario, than ten crappy ones that each validate only part of a scenario.

Yet even if it’s better to write fewer, more effective tests, not all testers have the incentive to do so. Are you confident your manager will know you’re working hard and doing a good job if you only have a handful of tests to show for your effort? Not all testers are.

I’m at the point in my career where I’m happy to say I have this confidence because my managers are familiar with the quality of my work. Some less experienced testers, however, face a dilemma: It’s better for their product to have fewer, more efficient tests; but it might be better for their career to write more, less efficient ones.

To be fair, never in my career have I been told that I’m doing a good job because I wrote a lot of tests, or, conversely, doing a bad job because I wrote too few. But sometimes the pressure was less direct.

I worked on one project where at the end of the kick-off meeting I was asked how long it would take to design and automate all of my tests. It was the first day of the project and I had no idea how many tests would be needed, so I asked for time to analyze the functional specifications. I was told we needed to quickly make a schedule and I should give my estimate based on fifty tests.

I had two issues with this question. First, why fifty? I’ll assume it was because fifty sounded like a reasonable number of tests that would help put something in the schedule. The schedule might be changed later, but it would be a good estimate to start with. (In hindsight, it wasn’t a very good estimate, as we actually wrote twice that many tests.)

My bigger problem was that this was a loaded question. I was now under pressure, subtle as it might be, to come up with close to fifty tests. What if I had then analyzed the specs and found that I could test the feature with just five efficient tests? Considering I had given an estimate based on fifty, would this have been viewed as really efficient testing, or really superficial testing?

To solve the tester’s dilemma we need to remove any incentive to pad our test count. We can do this by making sure our teams don’t use “test count” as a quality metric. Our goal should be quality, not quantity; and test count is not a good metric for either test quality or tester quality.

Luckily I’ve never worked on a team that used “test count” as a metric, but I know of teams that do. I also know of teams that use a similar metric: “bug count”. One tester I know spent most of his time developing automation, and yet was “dinged” for not logging as many bugs as the manual testers on the team. Much like “test count”, the number of bugs logged is not as important as the quality of those bugs. We should look at any metric that ends in the word “count” with skepticism.

We also need to keep an eye out for the more subtle forms of pressure to pad our test count. For example, hearing any of the following makes me leery:

  • Automate fifty tests.
  • Design ten Build Verification Tests (BVTs).
  • Write a test plan four to five pages long.
  • 10% of your test cases should be classified as P1 (highest-priority).

All of these statements frame the number of tests we’re expected to create. While they’re fine as guidelines, they may also tempt you to add a few extra tests to reach fifty, or classify a couple of P2s as P1. And that can’t be good for your product or your customers.

The Case for Fewer Test Cases

Testers are often encouraged to automate more and more test cases. At first glance, the case for more test cases makes sense—the more tests you have, the better your product is tested. Who can argue with that? I can.

Creating too many test cases leads to the condition known as “test case bloat”. This occurs when you have so many test cases that you spend a disproportionate amount of time executing, investigating, and maintaining these tests. This leaves little time for more important tasks, such as actually finding and resolving product issues. Test case bloat causes the following four problems:

1. Test passes take a long time to complete.

The longer it takes for your test pass to complete, the longer you have to wait before you can begin investigating the failures. I worked on one project where there were so many test cases, the daily test pass took 27 hours to finish. It’s hard to run a test pass every day when it takes more than 24 hours to complete.

2. Failure investigations take a long time to complete.

The more tests you have, the more failures you have to investigate. If your test pass takes a day to complete, and you have a mountain of failures to investigate, it could be two days or longer before a build is validated. This turn-around time may be tolerable if you’re shipping your product on a DVD. But when your software is a service, you may need to validate product changes a lot faster.

For example, the product I’m working on is an email service. If a customer is without email, it’s unacceptable for my team to take this long to validate a bug fix. Executing just the highest-priority tests to validate a hot-fix may be a valid compromise. If you have a lot of test cases, however, even this can take too long.
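The priority-filtered hot-fix pass described above can be sketched in a few lines. The test names and priority labels here are hypothetical, purely for illustration:

```python
# A minimal sketch of filtering a suite down to its highest-priority
# tests for a hot-fix validation pass. Test names and priorities are
# hypothetical stand-ins, not from any real suite.
suite = [
    ("test_send_mail", "P1"),
    ("test_receive_mail", "P1"),
    ("test_signature_font", "P3"),
    ("test_folder_rename", "P2"),
]

# Run only the P1 tests when validating an urgent fix.
hotfix_pass = [name for name, priority in suite if priority == "P1"]
print(hotfix_pass)
```

Even this reduced pass grows with the suite: if a bloated suite has hundreds of P1s, the "fast" pass is no longer fast.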

3. Tests take too much effort to maintain.

When your automation suffers from test case bloat, even subtle changes in product functionality can cause massive ripples in your existing test cases, drastically increasing the amount of time you spend maintaining them. This leaves little time for other, more valuable tasks, such as testing new features. It’s also a morale killer. Most testers I know (the really good ones, at least) don’t want to continually maintain the same test cases. They want to test new features and write new code.

4. After a certain threshold, more test cases no longer uncover product bugs; they mask them.

Most test cases only provide new information the first time they’re run. If the test passes, we can assume the feature works. If the test fails, we file a bug, which is eventually fixed by development, and the test case begins to pass. If it’s written well, the test will continue to pass unless a regression occurs.

Let’s assume we have 25 test cases that happily pass every time they’re run. At 3:00 a.m. an overtired developer then checks in a bug causing three tests to fail. Our pass rate would drop from 100% to an alarming 88%. The failures would be quickly investigated, and the perpetrator would be caught. Perhaps we would playfully mock him and make him wear a silly hat.

But what if we had 50 test cases? Three failures out of 50 is a respectable 94% pass rate. What about a hundred or two hundred tests? With this many tests, it’s now very possible that there are some failures in every pass simply due to test code problems; timing issues are a common culprit. The same three failures in two hundred tests is nearly a 99% pass rate. But were these failures caused by expected timing issues, or a real product bug? If your team was pressed to get a hot-fix out the door to fix a live production issue, it may not investigate a 99% pass rate with as much vigor as an 88% pass rate.
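The arithmetic above can be checked in a few lines; the suite sizes and the three failures mirror the figures in the text:

```python
# How the same three failures look less alarming as the suite grows.
def pass_rate(total, failures):
    """Percentage of tests that passed."""
    return 100 * (total - failures) / total

for total in (25, 50, 100, 200):
    print(f"{total} tests, 3 failures -> {pass_rate(total, 3):.1f}% pass rate")
```

The absolute signal (three broken tests) never changes; only the denominator does, which is exactly why a percentage makes a poor alarm bell for large suites.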

Bloat Relief

If your automation suffers from test case bloat, you may be able to refactor your tests. But you can’t simply mash four or five tests with different validation points into a single test case. The more complicated a test, the more difficult it becomes to determine the cause and severity of failure.

You can, however, combine test cases when your validation points are similar, and the severity of a failure at each validation point is the same. For example, if you’re testing a UI dialog, you don’t need 50 different test cases to validate that 50 objects on the screen are all at their expected location. This can be done in one test.
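A minimal sketch of that consolidation might look like the following. The dialog, control names, and coordinates are hypothetical, and `get_actual_position` stands in for whatever UI query your framework provides; the point is one test covering many validation points of equal severity, and reporting every mismatch in one run:

```python
# Hypothetical map of control name -> expected (x, y) position.
EXPECTED_POSITIONS = {
    "ok_button": (400, 300),
    "cancel_button": (480, 300),
    "title_label": (10, 10),
    # ...imagine 47 more controls here
}

def get_actual_position(control):
    # Stand-in for a real UI query; here it simulates a correct layout.
    return EXPECTED_POSITIONS[control]

def test_dialog_layout():
    """One test validates every control, reporting all mismatches at once."""
    misplaced = {name: (get_actual_position(name), expected)
                 for name, expected in EXPECTED_POSITIONS.items()
                 if get_actual_position(name) != expected}
    assert not misplaced, f"Misplaced controls: {misplaced}"
    return len(EXPECTED_POSITIONS)  # number of validation points covered

print(f"{test_dialog_layout()} validation points checked in one test")
```

Because a misplaced button and a misplaced label fail at the same severity, nothing is lost by merging them, and the failure message still tells you exactly which controls moved.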

You can also combine tests when you’re checking a single validation point, such as a database field, with different input combinations. Don’t create 50 different test cases that check the same field for 50 different data combinations. Create a single test case that loops through all combinations, validating the results.

When my test pass was taking 27 hours to complete, one solution we discussed was splitting the pass based on priority, feature, or some other criteria. If we had split it into three separate test passes, each would have taken only nine hours to finish. But this would have required three times as many servers. That may not be an issue if your test pass runs on a single server or on virtual machines; however, I’ve worked on automation that required more than twenty physical servers, and tripling your server count is not always an option.

In addition to the techniques discussed above, pair-wise testing and equivalence class partitioning are tools that all testers should have in their arsenal. The ideal solution, however, is to prevent bloating before it even starts. When designing your test cases, it’s important to be aware of the number of tests you’re writing. If all else fails, I hear you can gain time by investigating your test failures while travelling at the speed of light.
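Equivalence class partitioning, for instance, can be sketched with a hypothetical age field that is valid from 18 to 65. Rather than testing every integer, pick one representative per class plus the boundary values; the rule and the numbers here are illustrative assumptions:

```python
def is_valid_age(age):
    return 18 <= age <= 65  # hypothetical validation rule under test

# Three equivalence classes: below range, in range, above range.
# Representatives include the boundary value on each side.
cases = [
    (17, False), (18, True),   # lower boundary
    (40, True),                # interior representative
    (65, True), (66, False),   # upper boundary
]

for age, expected in cases:
    assert is_valid_age(age) == expected, f"unexpected result for age {age}"
print(f"{len(cases)} representative cases cover the whole input range")
```

Five cases stand in for every possible age; pair-wise tools apply the same reduction across combinations of parameters rather than values of one.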
