Equivalence Class Testing

When testing even a simple application, it’s often impossible, or at least impractical, to test with all possible inputs. This is true whether you’re testing manually or using automation; throwing computing power at the problem doesn’t usually solve it. The question is: how do you find the inputs that give the most coverage in the fewest tests?

Assume we have a UI with just two fields for creating a user account. User Name can be between 1 and 25 characters, but can’t contain a space. Sex can be either Male or Female.

To test all possible inputs, we would first test single-character names using all possible characters. Then test two-character names using all possible characters in the first position, along with all possible characters in the second position. This process would be repeated for all lengths between 1 and 25. We then need to repeat all these names for Males and Females. I’m no mathematician, but this works out to be close to a kagillion possible inputs.
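For the curious, a rough count backs this up. Assuming a hypothetical 95-character printable set for User Name (the exact alphabet doesn’t matter much), the total is:

```python
# Rough count of possible inputs, assuming a hypothetical
# 95-character printable set for User Name.
CHARSET = 95
SEXES = 2

# Names of length 1 through 25, times the two Sex values.
total = sum(CHARSET ** n for n in range(1, 26)) * SEXES

print(f"{total:.3e}")  # on the order of 5.6e49 -- a kagillion, give or take
```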

If this simple UI can have so many inputs, then it’s clear we’ll usually have to test with a very small subset of all possible data. Equivalence classes (ECs) give us the guidance we need to select these subsets.

ECs are subsets of input data, where testing any one value in the set should be the same as testing the others. Here we have two equivalence classes of valid input data.

Field          Description
1. User Name   Length >= 1 and <= 25, not containing a space
2. Sex         Male or Female

Since we have more than one EC of valid data, we can create a single test that randomly picks one value from each. After all, testing with A/Male should give us the same result as testing with AB/Female or JKHFGKJkjgjkhg/Male. In all three cases, a new user account should be created.
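A sketch of that random selection in Python (restricting names to ASCII letters is a simplifying assumption; any non-space character belongs to the same class):

```python
import random
import string

def random_valid_name():
    # EC #1: length 1..25, no spaces.
    length = random.randint(1, 25)
    return "".join(random.choice(string.ascii_letters) for _ in range(length))

def random_valid_sex():
    # EC #2: either valid value.
    return random.choice(["Male", "Female"])

name, sex = random_valid_name(), random_valid_sex()
print(name, sex)  # e.g. JKHFGKJkjgjkhg Male
```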

It’s also important to create classes of invalid input. If you have an EC of valid data in the range of 1-10, you should also create an EC of invalid data with values less than 1, and another with values greater than 10. We can create five equivalence classes of invalid input for our sample app.

Field          Description
3. User Name   Empty
4. User Name   Length > 25 (no spaces)
5. User Name   Length > 25 (containing a space)
6. User Name   Length > 1 and <= 25, containing a space
7. Sex         Neither Male nor Female

When designing your tests, it’s important to pick only one invalid value per test case; using more than one can lead to missed product bugs. Suppose you choose a test with both an invalid User Name and an invalid Sex. The product may throw an error when it detects the invalid User Name, and never even reach the logic that validates the Sex. If there are bugs in the Sex-validation logic, your test won’t catch them.

Using our ECs, we’d create the following tests.

  1. Valid name from #1 and valid Sex from #2
  2. Invalid name from #3 and valid Sex from #2
  3. Invalid name from #4 and valid Sex from #2
  4. Invalid name from #5 and valid Sex from #2
  5. Invalid name from #6 and valid Sex from #2
  6. Valid name from #1 and invalid Sex from #7
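These six cases can be generated rather than hardcoded. A sketch (the upper bound of 50 for over-long names is an arbitrary choice):

```python
import random
import string

def rand_name(length, with_space=False):
    name = "".join(random.choice(string.ascii_letters) for _ in range(length))
    if with_space:
        # Overwrite one character with a space, preserving the length.
        i = random.randrange(length)
        name = name[:i] + " " + name[i + 1:]
    return name

def valid_sex():
    return random.choice(["Male", "Female"])

# (user_name, sex, should_succeed) -- at most one invalid value per test.
tests = [
    (rand_name(random.randint(1, 25)),        valid_sex(), True),   # 1: all valid
    ("",                                      valid_sex(), False),  # 2: EC #3
    (rand_name(random.randint(26, 50)),       valid_sex(), False),  # 3: EC #4
    (rand_name(random.randint(26, 50), True), valid_sex(), False),  # 4: EC #5
    (rand_name(random.randint(2, 25), True),  valid_sex(), False),  # 5: EC #6
    (rand_name(random.randint(1, 25)),        "Blah",      False),  # 6: EC #7
]
```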

Equivalence classes have reduced our number of tests from a kagillion to six. As Adam Sandler might say, “Not too shabby.”

Not only does EC testing make sure you don’t have too many tests, it also ensures you don’t have too few. Testers not familiar with equivalence classes might test this UI with just four pairs of hardcoded values:

  1. John/Male
  2. Jane/Female
  3. John Smith/Male
  4. Joan/Blah

When these tests are automated and running in your test pass, they validate the same values day after day. Using equivalence class testing with random values, we validate different input every day, increasing our odds of finding bugs.

The day I wrote this article I found a UI bug that had been in the product for several months. The UI didn’t correctly handle Display Names of more than 64 characters. Sure enough, the UI tests didn’t use ECs, and tested with the same hardcoded values every day. If they had chosen a random value from an EC, the test would have eventually found this bug.

Another benefit of equivalence class testing is that it leads us to Boundary value testing. When an equivalence class is a range, we should create tests at the high and low edges of the range, and just outside the range. This helps us find bugs in cases where the developer used a > instead of >=.

Boundary tests, however, should not take the place of your EC tests. In our example, don’t hardcode your EC test to a User Name length of 25. Your EC tests should choose a random length between 1 and 25. You should have separate Boundary tests that validate length 25.
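For the 1-25 length rule, the boundary set might be captured like this (expected outcomes follow the rules above):

```python
# Boundary values around the 1..25 length rule: the edges of the valid
# range and the first values just outside it.
boundaries = {
    0:  False,  # just below the range -> should be rejected
    1:  True,   # low edge -> should be accepted
    25: True,   # high edge -> should be accepted
    26: False,  # just above the range -> should be rejected
}

for length, should_succeed in boundaries.items():
    name = "x" * length
    print(length, should_succeed)
```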

And don’t forget to create tests for values far outside the range. The size of your variable defines an equivalence class as well! Values far outside the range can expose bugs such as exceeding the max length of a string, or trying to shove a long number into an int.

When defining your ECs, the first place to look is the Feature Specification Document (or whatever your Agile equivalent may be). The other place is the product code. This, however, can be risky. If the source code has a statement like if x > 1 && x < 5, you would create your EC as 2, 3, 4, and your tests will pass. But how do you know the source code was correct? Maybe it was intended to be x >= 1 && x <= 5. That’s why you should always push for valid input rules to be defined in a spec.

Another technique for creating ECs is to break up the output into equivalence classes, then choose inputs that give you an output from each class. In our example, assume the user account can be stored in one of two databases depending on whether the User Name contains Chinese characters. In this case, we would need one EC of valid Chinese User Names and one of valid non-Chinese User Names.
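A quick way to split names into those two output classes (the check below uses only the basic CJK Unified Ideographs block, U+4E00 to U+9FFF, which is an approximation; real Chinese text can also use extension blocks):

```python
def contains_chinese(name: str) -> bool:
    # Basic CJK Unified Ideographs block only -- an approximation.
    return any("\u4e00" <= ch <= "\u9fff" for ch in name)

print(contains_chinese("李雷"))   # True  -> Chinese-name database
print(contains_chinese("John"))  # False -> non-Chinese database
```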

PS – Not only is this article a (hopefully) interesting lesson on equivalence classes, it’s also an interesting experiment for the Expert Testers blog. In this article, I used the word Sex 15 times. I’m curious if it gets more traffic than usual because of it. If it does, you can look forward to my next article, “The Naked Truth – Pairwise Testing Exposed!”

State-Transition Testing

One of our goals at Expert Testers is to discuss practical topics that can help every tester do their job better. To this end, my last two articles have been about Decision Table Testing and Being an Effective Spec Reviewer. Admittedly, neither of these topics breaks new ground. That doesn’t mean, however, that most testers have mastered these techniques. In fact, almost 50% of the respondents to our Decision Table poll said they’ve never used one.

Continuing the theme of discussing practical topics, let’s talk about State Transition Diagrams. State Transition Diagrams, or STDs as they’re affectionately called, are effective for documenting functionality and designing test cases. They should be in every tester’s bag of tricks, along with Decision Tables, Pair-Wise analysis, and acting annoyed at work to appear busy.

STDs show the state a system will move to, based on its current state and other inputs. These words, I understand, mean little until you’ve seen one in action, so let’s get to an example. Since I’m particularly busy (i.e., lazy) today, I’ll use a simple example I found on the web.

Below is a Hotel Reservation STD. Each rectangle, or node, represents the state of the reservation. Each arrow is a transition from one state to the next. The text above the line is the input–the event that caused the state to change. The text below the line is the output–the action the system performs in response to the event.

[Figure: Hotel Reservation State Transition Diagram]

Pasted from <http://users.csc.calpoly.edu/~jdalbey/SWE/Design/STDexamples.html>

One benefit of State Transition Diagrams is that they describe the behavior of the system in a complete, yet compact and easy-to-read, way. Imagine describing this functionality in sentence format; it would take pages of text to cover it fully. STDs are much simpler to read and understand. For this reason, they can reveal paths the PM or Developer missed, or paths the Tester forgot to test.

I learned this when I was testing Microsoft Forefront Protection for Exchange Server, a product that protects email customers from malware and spam. The product logic for determining when a message would be scanned was complicated; it depended on the server role, several Forefront settings, and whether the message was previously scanned.

The feature spec described this logic in sentence format, and was nearly impossible to follow. I took it upon myself to create a State Transition Diagram to model the logic. I printed it out and stuck it on my office (i.e., cubicle) wall. Not a week went by without a Dev, Tester, or PM stopping by to figure out why their mail wasn’t being scanned as they expected.

If you read my article on Decision Tables (DTs), and I’m sure you didn’t, you may be wondering when to use an STD and when to use a DT. If you’re working on a system where the order of events matters, use an STD; Decision Tables only work when the order of events doesn’t matter.

Another benefit of STDs is that we can use them to design our test cases. To test a system completely, you’d need to cover all possible paths in the STD. This is often either impractical or impossible.

In our simple example, there are only four paths from the start of the STD to the end, but in larger systems there can be too many to cover in a reasonable amount of time. For these systems, you can create multiple STDs for sub-systems rather than a single STD for the entire system. This makes the STDs easier to read, but does not lower the total number of paths. It’s also common to find loops in an STD, resulting in an infinite number of possible paths.

When covering all paths is impractical, one alternative is to ensure each state (node) is covered by at least one test. This, however, would result in weak coverage. For our hotel booking system, we could test all seven states while leaving some transitions and events completely untested.

Often, the best strategy is to create tests that cover all transitions (the arrows) at least once. This guarantees you will test every state, event, action, and transition. It gives you good coverage in a reasonable number of tests.
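Checking all-transitions coverage is easy to mechanize. Since the hotel diagram isn’t reproduced here, the states and events below are invented for illustration:

```python
# A toy STD: (current state, event) -> next state.
transitions = {
    ("Idle",    "submit"):  "Pending",
    ("Pending", "confirm"): "Booked",
    ("Pending", "cancel"):  "Idle",
    ("Booked",  "cancel"):  "Idle",
}

# Candidate test cases: a start state plus a sequence of events.
tests = [
    ("Idle", ["submit", "confirm", "cancel"]),
    ("Idle", ["submit", "cancel"]),
]

covered = set()
for state, events in tests:
    for event in events:
        covered.add((state, event))
        state = transitions[(state, event)]

missing = set(transitions) - covered
print("uncovered transitions:", missing)  # set() -> all-transitions coverage
```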

If you’re interested in learning more about STDs (it’s impossible to cover them fully in a short blog article), I highly recommend reading A Practitioner’s Guide to Software Test Design. It’s where I first learned about them.

The next time you’re having trouble describing a feature or designing your tests, give a State Transition Diagram or Decision Table a try. The DTs and STDs never felt so good!

 

Decision Table Testing

Decision tables are an effective tool for describing complex functional requirements and designing test cases. Yet, although they’re far from a new concept, I’ve only seen a handful of functional specs and test plans that included one. The best way to illustrate their effectiveness is by regaling you with a tale of when one wasn’t used.

The year was 2009. Barack Obama had recently been inaugurated as the 44th President of the United States, and the Black Eyed Peas’ hauntingly poetic “Boom Boom Pow” was topping the Billboard charts. I was the lead tester on a new security feature for our product.

Our software could be installed on six server types. Depending on the server type and whether the server was a domain controller, eleven variables would be set. The project PM attempted to describe these setting combinations in paragraph format. The result was a spec that was impossible to follow.

I tried designing my test cases from the document, but I wasn’t convinced all possible configurations were covered. You know how poor programming can have “code smell”? Well, this had “spec smell”. I decided to capture the logic in a decision table.

According to the always-reliable Wikipedia, decision tables have been around since ancient Babylon. However, my uncle, Johnny “Dumplings”, who is equally reliable, insists they’ve only been around since Thursday. I suspect the true answer lies somewhere in between.

If you’re not familiar with decision tables, let’s go through a simple example. Assume your local baseball squadron offers free tickets to kids and discounted tickets to senior citizens. One game a year, free hats are given to all fans.

To represent this logic in a decision table, create a spreadsheet and list all inputs and expected results down the left side. The inputs in this case are the fan’s age and sex. The expected results are ticket price and hat color.

Each row of a decision table should contain the different possible values for a single variable. Each column, then, represents a different combination of input values along with their expected results. In this example, the first column represents the expected results for boys under 5 — “Free Admission” and “Blue Hat”. The last column shows that female senior citizens get $10 tickets and a pink hat.

                    Rule 1  Rule 2  Rule 3  Rule 4  Rule 5  Rule 6
Inputs
  Age < 5             Y       Y
  5 <= Age < 65                       Y       Y
  Age >= 65                                           Y       Y
  Sex                 M       F       M       F       M       F
Results
  Free Admission      Y       Y
  $10 Admission                                       Y       Y
  $20 Admission                       Y       Y
  Blue Hat Giveaway   Y               Y               Y
  Pink Hat Giveaway           Y               Y               Y
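The table translates almost mechanically into code. A sketch, with each branch corresponding to a pair of columns:

```python
def ticket_and_hat(age: int, sex: str):
    # Each branch maps to columns of the decision table above.
    if age < 5:
        price = "Free Admission"   # Rules 1-2
    elif age < 65:
        price = "$20 Admission"    # Rules 3-4
    else:
        price = "$10 Admission"    # Rules 5-6
    hat = "Blue Hat" if sex == "M" else "Pink Hat"
    return price, hat

print(ticket_and_hat(4, "M"))   # ('Free Admission', 'Blue Hat') -- Rule 1
print(ticket_and_hat(70, "F"))  # ('$10 Admission', 'Pink Hat')  -- Rule 6
```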

There are three major advantages to decision tables.

  1. Decision tables define expected results for all input combinations in an easy-to-read format. When included in your functional spec, decision tables help developers keep bugs out of the product from the beginning.
  2. Decision tables help us design our test cases. Every column in a decision table should be converted into at least one test case. The first column in this table defines a test for boys under 5. If an input can be a range of values, however, such as “5 <= Age < 65”, then we should create tests at the high and low ends of the range to validate our boundary conditions.
  3. Decision tables are effective for reporting test results. They can be used to clearly show management exactly what scenarios are working and not working so informed decisions can be made.

It’s important to note that decision tables only work if the order in which the conditions are evaluated, and the order of the expected results, doesn’t matter. If order matters, use a state transition diagram instead. I’ll blabber about them in a future article.

Getting back to my story, I used the spec to fill in a decision table representing all possible server combinations and expected results. The question marks in the table clearly showed that results weren’t defined for several installation configurations. I brought this to the Program Manager’s attention, and we were able to fill in the blanks and lock down the requirements.

               Case 1  Case 2  Case 3  Case 4  Case 5  Case 6  Case 7  Case 8  Case 9  Case 10
Inputs
  Server Type  Type 1  Type 1  Type 2  Type 2  Type 3  Type 3  Type 4  Type 4  Type 5  Type 6
  DC           Y       N       Y       N       Y       N       Y       N       *       *
Results
  Setting1     Y       Y       Y       Y       Y       ?       N       N       N       N
  Setting2     Y       ?       Y       Y       Y       ?       N       N       N       N
  Setting3     Y       Y       Y       Y       Y       ?       N       N       N       N
  Setting4     Y       Y       Y       Y       Y       ?       N       ?       N       N
  Setting5     Y       Y       Y       Y       Y       ?       N       N       N       N
  Setting6     Y       Y       Y       Y       Y       ?       N       N       N       N
  Setting7     ?       ?       ?       ?       ?       ?       ?       ?       ?       ?
  Setting8     N       N       N       N       N       ?       Y       Y       N       N
  Setting9     ?       N       N       N       N       ?       Y       N       N       N
  Setting10    N       N       ?       Y       Y       ?       N       N       ?       N
  Setting11    N       N       Y       Y       Y       ?       N       N       N       N

This easy-to-understand table took the place of 2 full pages of spaghetti text in the functional spec. As a result, the developers had a comprehensive set of requirements to work off of, and I had designed a thorough set of test cases to validate their work. Boom Boom Pow, indeed!

Writing Good Test Plans and Writing Good Tests

This post is based on one of my test projects, “Testing SQL Express Setup integration with Visual Studio 2012”. Unlike typical testing strategies, we decided to do manual testing instead of automated testing. We spent a lot of effort thinking about the customer scenarios and assigning them different priorities. Depending on the scenario and priority, we used pairwise combination strategies to select a small set of tests from the huge testing matrix. Then our vendor team executed the selected test cases manually and signed off on them. We found a couple of UI bugs, a couple of usability issues, and some functional bugs as well. In the end, we delivered a better setup experience for Visual Studio users. The same test approach was used for Visual Studio 2012 testing as well.

Note:

  •  SSE means SQL Server Express
  •  MSI 4.5 means Windows Installer 4.5

Look to the Data

I was recently asked to investigate my team’s automated daily test cases. It was taking more than 24 hours to execute our “daily” test run. My job was to find why the tests were taking so long to complete, and speed them up when possible. In the process, an important lesson was reinforced: we should look to real, live customer data to guide our test planning.

I had several ideas to speed up our test passes. My highest priority was to find and address the test cases that took the longest to run. I sorted the tests by run-time, and discovered the slowest test case took over an hour to run. I tackled this first, as it would give me the biggest “bang for the buck”.

The test validated a function that took two input parameters. It iterated through each possible input combination, verifying the result. The code looked like this:

for setting1 in range(1, 6):      # setting1 = 1 to 5
    for setting2 in range(0, 6):  # setting2 = 0 to 5
        validate(setting1, setting2)

The validate function took two minutes to execute. Since it was called thirty times, the test case took an hour to complete. I first tried to improve the performance of the validate function, but the best I could do was shave a few seconds off its run-time.

My next thought was whether we really needed to test all 30 input combinations. I requested access to the live production data for these fields. I found just two setting combinations accounted for 97% of the two million production scenarios; four combinations accounted for almost 99%. Many of the combinations we were testing never occurred at all “in the real world.”

Most Common Data Combinations

setting1   setting2   % of Production Data
5          5          73%
1          5          24%
1          0          0.9%
2          1          0.8%
2          5          0.7%
3          5          0.21%
1          1          0.2%
3          0          0.1%
4          0          0.09%

I reasoned that whatever change I made to this test case, the two most common combinations must always be tested. Executing just these two would test almost all the production scenarios, and the run-time would be cut to only four minutes. But notice that the top two combinations both have a setting2 value of 5. Validating just these two combinations would leave a huge hole in our test coverage.

I considered testing only the nine combinations found in production. This would guarantee we tested all the two million production scenarios, and our run-time would be cut from one hour to 18 minutes. The problem with this solution is that if a new combination occurred in production in the future, we wouldn’t have a test case for it; and if there was a bug, we wouldn’t know about it until it was too late.

Another possible strategy would be to split this test into multiple test cases, and use priorities to run the more common scenarios more often. For example, the two most common scenarios could be classified as P1, the next seven P2, and the rest P3. We would then configure our test passes to execute the P1s daily, the P2s weekly, and the P3s monthly.

The solution I decided on, however, was to keep it as one test case and always validate the two most common combinations as well as three other, randomly generated, combinations. This solution guaranteed we test at least 97% of the live production scenarios each night. All thirty combinations would be tested, on average, every 10 days–more often than the “priority” strategy I had considered. The solution reduced the test’s run-time from over an hour to about 10 minutes. The entire project was a success as well; we reduced the test pass run-time from 27 hours to less than 8 hours.
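A sketch of that nightly selection logic (the combination values follow the table above):

```python
import random

# All 30 possible combinations: setting1 in 1..5, setting2 in 0..5.
all_combos = [(s1, s2) for s1 in range(1, 6) for s2 in range(0, 6)]

# The two combinations covering ~97% of production data, always tested.
must_test = [(5, 5), (1, 5)]

# Three more picked at random from the rest each night.
rest = [c for c in all_combos if c not in must_test]
tonight = must_test + random.sample(rest, 3)

print(tonight)
```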

You may be thinking I could have used pairwise testing to reduce the number of combinations I executed. Unfortunately, pairwise doesn’t help when you only have two input parameters. The strategy ensures every pair of parameter values is validated, so with only two parameters the total number of combinations tested would have remained at thirty.

In hindsight, I think I could have been smarter about the random combinations I tested. For example, I could have ensured I generated at least one test case for each possible value of setting1 and setting2.

I also could have associated a “weight” with each combination based on how often it occurred in production. Settings would still be randomly chosen, but the most common ones would have a better chance of being generated. I would just have to be careful to assign some weight to those combinations that never appear in production; this would make sure all combinations are eventually tested. I think I’ll use this strategy the next time I run into a similar situation.