The Tester’s Dilemma: To Pad or Not to Pad

My first self-evaluation as a tester included statements like, “I added 50 new tests to this test suite” and “I automated 25 tests in that suite.” I thought the more tests I wrote, the more productive I was. I was wrong, and so are many testers who still feel that way. But it isn’t all our fault.

Early in my career I wrote a ton of tests, each validating one thing and one thing only. The main benefit of this strategy is that if a test failed, I knew exactly what failed with minimal investigation. An unexpected side effect was that it led me to write a lot of tests. And this, I thought, was an accurate indicator of how productive I was.

I later discovered this strategy also had undesirable side effects. I discussed these side effects in an earlier article to much fanfare, so I won’t go into the details again. But the four disadvantages I see are:

  • Test passes take too long to complete
  • Results take too long to investigate
  • Code takes too much effort to maintain
  • Above a certain threshold, additional tests can mask product bugs

After six years in Test, it’s now obvious to me that writing a lot of tests isn’t necessarily better. I would now rather write one test that finds bugs than a hundred that don’t. I would rather write one really efficient test that validates a complete scenario than ten crappy ones that each validate only part of a scenario.
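
To make the idea of a scenario test concrete, here is a minimal sketch in Python's unittest. The ShoppingCart class and its methods are hypothetical, invented purely for illustration; they don't come from any real product. A single test walks through a complete add, update, and remove scenario, and each assertion carries a message so a failure still points at the step that broke.

```python
import unittest


class ShoppingCart:
    """Hypothetical class under test, invented for this illustration."""

    def __init__(self):
        # Maps item name -> (unit price, quantity)
        self.items = {}

    def add(self, name, price, qty=1):
        if price < 0 or qty < 1:
            raise ValueError("price must be >= 0 and qty must be >= 1")
        old_qty = self.items.get(name, (price, 0))[1]
        self.items[name] = (price, old_qty + qty)

    def remove(self, name):
        del self.items[name]

    def total(self):
        return sum(price * qty for price, qty in self.items.values())


class CartScenarioTest(unittest.TestCase):
    def test_add_update_remove_scenario(self):
        """One test covering a whole user scenario instead of one test per step."""
        cart = ShoppingCart()
        self.assertEqual(cart.total(), 0, "a new cart should total zero")

        cart.add("pen", 2, qty=3)
        self.assertEqual(cart.total(), 6, "total after adding three pens")

        cart.add("pen", 2)
        self.assertEqual(cart.items["pen"], (2, 4), "re-adding should merge quantities")

        cart.add("book", 10)
        self.assertEqual(cart.total(), 18, "total after adding a book")

        cart.remove("pen")
        self.assertEqual(cart.total(), 10, "total after removing the pens")

        # Invalid input should be rejected rather than silently accepted
        with self.assertRaises(ValueError):
            cart.add("bad", -1)
```

Written the old way, this would have been five or six separate tests, each repeating the same setup. The trade-off is that the first failing assertion stops the test, so later steps go unchecked until it is fixed.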

Yet even if it’s better to write fewer, more effective tests, not all testers have the incentive to do so. Are you confident your manager will know you’re working hard and doing a good job if you only have a handful of tests to show for your effort? Not all testers are.

I’m at the point in my career where I’m happy to say I have this confidence because my managers are familiar with the quality of my work. Some less experienced testers, however, face a dilemma: It’s better for their product to have fewer, more efficient tests; but it might be better for their career to write more, less efficient ones.

To be fair, never in my career have I been told that I’m doing a good job because I wrote a lot of tests, or, conversely, doing a bad job because I wrote too few. But sometimes the pressure was less direct.

I worked on one project where at the end of the kick-off meeting I was asked how long it would take to design and automate all of my tests. It was the first day of the project and I had no idea how many tests would be needed, so I asked for time to analyze the functional specifications. I was told we needed to quickly make a schedule and I should give my estimate based on fifty tests.

I had two issues with this question. First, why fifty? I’ll assume it was because fifty sounded like a reasonable number of tests that would help put something in the schedule. The schedule might be changed later, but it would be a good estimate to start with. (In hindsight, it wasn’t a very good estimate, as we actually wrote twice that many tests.)

My bigger problem was that this was a loaded question. I was now under pressure, subtle as it might be, to come up with close to fifty tests. What if I had then analyzed the specs and found that I could test the feature with just five efficient tests? Considering I had given an estimate based on fifty, would this have been viewed as really efficient testing, or really superficial testing?

To solve the tester’s dilemma we need to remove any incentive to pad our test count. We can do this by making sure our teams don’t use “test count” as a quality metric. Our goal should be quality, not quantity; and test count is not a good metric for either test quality or tester quality.

Luckily I’ve never worked on a team that used “test count” as a metric, but I know of teams that do. I also know of teams that use a similar metric: “bug count”. One tester I know spent most of his time developing automation, and yet was “dinged” for not logging as many bugs as the manual testers on the team. Much like “test count”, the number of bugs logged is not as important as the quality of those bugs. We should look at any metric that ends in the word “count” with skepticism.

We also need to keep an eye out for the more subtle forms of pressure to pad our test count. For example, hearing any of the following makes me leery:

  • Automate fifty tests.
  • Design ten Build Verification Tests (BVTs).
  • Write a test plan four to five pages long.
  • 10% of your test cases should be classified as P1 (highest-priority).

All of these statements frame the number of tests we’re expected to create. While they’re fine as guidelines, they may also tempt you to add a few extra tests to reach fifty, or classify a couple of P2s as P1. And that can’t be good for your product or your customers.

4 Responses

  1. So exactly what is the dilemma here? The “tester’s dilemma” is whether you should spend energy focused on test count, bug count, or a variety of other metrics, or whether you should spend your time and energy on actually testing. The answer is obvious, right? At least to testers, it is.

    But testers don’t work in a vacuum. There is also a “project manager’s dilemma”. A project manager is interested in questions like “How much will it cost?”, “When will it be done?”, “Is the team on track to make the plan?” A project manager will want to have plans, sizings, and targets, because “if you can’t measure it, you can’t manage it”. Not to denigrate the project managers’ profession, their job is to help make the success of the project predictable so that the business and ultimately your customers know they can count on the project. And predictability is more important than efficiency. So the project manager needs the testers to spend their time doing their work efficiently and effectively – but only after they have spent adequate energy on plans, schedules, and the appropriate metrics. Obviously, getting the right metrics is important. But they will never be perfect.

    So may I be so bold as to re-state the dilemma?

    Given a set of imperfect metrics (and they all are imperfect), would you rather spend your energy padding a little? Or would you rather spend your energy arguing the metrics and explaining why you couldn’t meet the goals?

  2. The belief is that once we have more automation, the engineers can think of other ways to test the product.

  3. I’ve actually felt pressure to NOT have a high test count. I have seen a place where the standard was to have one test case for an entire section of the requirements: one test case that tests several screens, fields and testing types. If one field fails, the whole test case fails. I don’t think that gives an accurate reporting count or testing status. The same could apply to having too many cases, and it could impact the entire team negatively. I think your cases should provide an accurate picture of testing at any given time so the best decisions for the project can be made.

  4. I replied to a discussion thread Andrew posted on LinkedIn before I read this blog. First, I would comment, Andrew, that you have wisdom beyond the 6 years of hands-on experience you mention.

    When I read the topic “Have you ever felt pressure to pad your test count or come up with a pre-defined number of tests?” my blood pressure rose and it got me going. I thought that possibly Andrew was caught in this very dilemma. Below is my reply from LinkedIn:

    I see no problem if QA engineers set their own goals at a certain level to improve themselves, just so long as they are not compromising their work AND as long as they are not pressured by management.

    If a company is using test count, defect count, time per test/defect, etc. per QA engineer as a rating metric, I would strongly recommend looking for another job. These kinds of metrics are shallow and show that senior management does not have a clue. I would also recommend that the QA manager be shown the door as well, since they let this happen, and as such it will result in the QA team doing a pitiful job.

    The reason I say this is that the testing environment is complex. It is a grave mistake to think that lower test case / defect counts are any less significant than high numbers for the same. For example, a complex test/defect could take 10 times as long but result in a successful product because major and difficult problems were ferreted out. Compare that with the quick production of (most likely meaningless) tests / defects that mainly find simple issues: it will not be very effective, and it is actually counter-productive, since it takes time away from more meaningful efforts.
