Equivalence Class Testing

When testing even a simple application, it’s often impossible, or at least impractical, to test with all possible inputs. This is true whether you’re testing manually or using automation; throwing computing power at the problem doesn’t usually solve it. The question is: how do you find the inputs that give the most coverage in the fewest tests?

Assume we have a UI with just two fields for creating a user account. User Name can be between 1 and 25 characters, but can’t contain a space. Sex can be either Male or Female.

To test all possible inputs, we would first test single-character names using all possible characters. Then test two-character names using all possible characters in the first position, along with all possible characters in the second position. This process would be repeated for all lengths between 1 and 25. We then need to repeat all these names for Males and Females. I’m no mathematician, but this works out to be close to a kagillion possible inputs.
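
To put a rough number on "a kagillion," here is a back-of-the-envelope sketch. It assumes, purely for illustration, an alphabet of 94 allowed characters (printable ASCII minus the space); the real character set isn't stated here and could be far larger.

```python
def count_inputs(alphabet_size: int, min_len: int, max_len: int, sex_values: int) -> int:
    """Count every (User Name, Sex) pair for names of min_len..max_len characters."""
    names = sum(alphabet_size ** length for length in range(min_len, max_len + 1))
    return names * sex_values


# Assumption: 94 usable characters (printable ASCII without the space).
total = count_inputs(alphabet_size=94, min_len=1, max_len=25, sex_values=2)
print(f"{total:.3e}")
```

Under that assumption the total lands somewhere between 10^49 and 10^50 inputs, which no amount of automation will get through.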

If this simple UI can have so many inputs, then it’s clear we’ll usually have to test with a very small subset of all possible data. Equivalence classes (ECs) give us the guidance we need to select these subsets.

ECs are subsets of input data, where testing any one value in the set should be the same as testing the others. Here we have two equivalence classes of valid input data.

#   Field      Description
1.  User Name  Length >= 1 and <= 25, not containing a space
2.  Sex        Male or Female

Since we have more than one EC of valid data, we can create a single test that randomly picks one value from each. After all, testing with A/Male should give us the same result as testing with AB/Female or JKHFGKJkjgjkhg/Male. In all three cases, a new user account should be created.
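
As a minimal sketch of that idea, the generators below pick one random value from each valid class. The character set (ASCII letters and digits) and the function names are assumptions for illustration; the seed is printed so that a failing run can be reproduced.

```python
import random
import string

# Assumption: names are drawn from ASCII letters and digits; the real product
# may allow a wider character set.
NAME_CHARS = string.ascii_letters + string.digits


def random_valid_name(rng: random.Random) -> str:
    """EC #1: length 1..25, no spaces."""
    length = rng.randint(1, 25)
    return "".join(rng.choice(NAME_CHARS) for _ in range(length))


def random_valid_sex(rng: random.Random) -> str:
    """EC #2: Male or Female."""
    return rng.choice(["Male", "Female"])


# Log the seed so any failure found with random data can be replayed.
seed = random.randrange(2**32)
rng = random.Random(seed)
print(f"seed={seed} name={random_valid_name(rng)!r} sex={random_valid_sex(rng)}")
```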

It’s also important to create classes of invalid input. If you have an EC of valid data in the range of 1-10, you should also have an EC of invalid values less than 1 and an EC of invalid values greater than 10. We can create five equivalence classes of invalid input for our sample app.

#   Field      Description
3.  User Name  Empty
4.  User Name  Length > 25 (no spaces)
5.  User Name  Length > 25 (containing a space)
6.  User Name  Length >= 1 and <= 25, containing a space
7.  Sex        Neither Male nor Female

When designing your tests, it’s important to pick only one invalid value per test case; using more than one can lead to missed product bugs. Suppose you choose a test with both an invalid User Name and an invalid Sex. The product may throw an error when it detects the invalid User Name, and never even get to the logic that validates the Sex. If there are bugs in the Sex-validation logic, your test won’t catch them.

Using our ECs, we’d create the following tests.

  1. Valid name from #1 and valid Sex from #2
  2. Invalid name from #3 and valid Sex from #2
  3. Invalid name from #4 and valid Sex from #2
  4. Invalid name from #5 and valid Sex from #2
  5. Invalid name from #6 and valid Sex from #2
  6. Valid name from #1 and invalid Sex from #7

Equivalence classes have reduced our number of tests from a kagillion to six. As Adam Sandler might say, “Not too shabby.”
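
The six tests can be sketched as a small data-driven table, with at most one invalid value per row. Everything here is illustrative: `is_valid_account` is a hypothetical stand-in for the product's validation logic, and the upper length of 100 for over-long names is an arbitrary assumption.

```python
import random
import string


def name_of(rng, length, with_space=False):
    """Build a random name of the given length, optionally containing a space."""
    chars = [rng.choice(string.ascii_letters) for _ in range(length)]
    if with_space:
        chars[rng.randrange(length)] = " "
    return "".join(chars)


# One generator per equivalence class, keyed by the class numbers in the text.
# The upper length of 100 for classes 4 and 5 is an arbitrary assumption.
EC = {
    1: lambda rng: name_of(rng, rng.randint(1, 25)),
    2: lambda rng: rng.choice(["Male", "Female"]),
    3: lambda rng: "",
    4: lambda rng: name_of(rng, rng.randint(26, 100)),
    5: lambda rng: name_of(rng, rng.randint(26, 100), with_space=True),
    6: lambda rng: name_of(rng, rng.randint(2, 25), with_space=True),
    7: lambda rng: "Blah",
}

# (name class, sex class, should the account be created?)
TESTS = [
    (1, 2, True),
    (3, 2, False),
    (4, 2, False),
    (5, 2, False),
    (6, 2, False),
    (1, 7, False),
]


def is_valid_account(name, sex):
    """Hypothetical stand-in for the product's validation logic."""
    return 1 <= len(name) <= 25 and " " not in name and sex in ("Male", "Female")


def run_tests(seed):
    """Return one boolean per test: did the product behave as expected?"""
    rng = random.Random(seed)
    results = []
    for name_ec, sex_ec, expected in TESTS:
        name, sex = EC[name_ec](rng), EC[sex_ec](rng)
        results.append(is_valid_account(name, sex) == expected)
    return results


seed = random.randrange(2**32)
print(f"seed={seed} results={run_tests(seed)}")
```

Keeping the classes in a table makes it easy to see that no row combines two invalid classes.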

Not only does EC testing make sure you don’t have too many tests, it also ensures you don’t have too few. Testers not familiar with equivalence classes might test this UI with just four pairs of hardcoded values:

  1. John/Male
  2. Jane/Female
  3. John Smith/Male
  4. Joan/Blah

When these tests are automated and running in your test pass, they validate the same values day after day. Using equivalence class testing with random values, we validate different input every day, increasing our odds of finding bugs.

The day I wrote this article I found a UI bug that had been in the product for several months. The UI didn’t correctly handle Display Names of more than 64 characters. Sure enough, the UI tests didn’t use ECs, and tested with the same hardcoded values every day. If they had chosen a random value from an EC, the test would have eventually found this bug.

Another benefit of equivalence class testing is that it leads us to boundary value testing. When an equivalence class is a range, we should create tests at the high and low edges of the range, and just outside the range. This helps us find bugs in cases where the developer may have used a > instead of >=.

Boundary tests, however, should not take the place of your EC tests. In our example, don’t hardcode your EC test to have a User Name length of 25. Your EC tests should choose a random length between 1 and 25. You should have separate boundary tests that validate length 25.
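
Here is a small sketch of boundary tests for the User Name length. The validator is hypothetical and contains a deliberately planted bug (a > where >= was intended), to show the kind of defect a boundary test catches on every run while a random EC test would hit it only occasionally.

```python
def boundary_lengths(low: int, high: int) -> dict:
    """Lengths at each edge of a valid range and just outside it."""
    return {"valid": [low, high], "invalid": [low - 1, high + 1]}


def is_valid_length(name: str) -> bool:
    """Hypothetical validator with a planted bug: '>' where '>=' was intended."""
    return len(name) > 1 and len(name) <= 25


def failing_lengths(validator, low, high):
    """Return the boundary lengths for which the validator gives the wrong answer."""
    cases = boundary_lengths(low, high)
    failures = []
    for expected, key in ((True, "valid"), (False, "invalid")):
        for length in cases[key]:
            if validator("a" * length) != expected:
                failures.append(length)
    return failures


print(failing_lengths(is_valid_length, 1, 25))  # prints [1]
```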

And don’t forget to create tests for values far outside the range. The size of your variable defines a type of EC as well! Testing it could expose other bugs, such as exceeding the max length of a string or trying to shove a long number into an int.

When defining your ECs, the first place to look is the Feature Specification Document (or whatever your Agile equivalent may be). The other place is the product code. This, however, can be risky. If the source code has a statement like if (x > 1 && x < 5), you would create your EC as 2, 3, 4, and your tests will pass. But how do you know the source code is correct? Maybe it was intended to be x >= 1 && x <= 5. That’s why you should always push for valid input rules to be defined in a spec.

Another technique for creating ECs is to break up the output into equivalence classes. Then choose inputs that give you an output from each class. In our example, assume the user account can be stored in one of two databases depending on whether the User Name contains Chinese characters. In this case, we would need one EC of valid Chinese User Names and one of valid non-Chinese User Names.

PS – Not only is this article a (hopefully) interesting lesson on equivalence classes, it’s also an interesting experiment for the Expert Testers blog. In this article, I used the word Sex 15 times. I’m curious if it gets more traffic than usual because of it. If it does, you can look forward to my next article, “The Naked Truth – Pairwise Testing Exposed!”

There’s a Smart Monkey in my toolbelt

This is a follow-on to Andrew’s article “State-Transition Testing.”  If your software can be described using states, you can use monkey automation to test your product.  While smart monkeys can take a little time to implement, even using third-party automation software, their payback can be enhanced if you release multiple versions, follow a rapid release cadence, or are short on testing resources.

Let me start by noting that Smart Monkey testing differs from Netflix’s Simian Army.  The names are similar, and both are code testing code.  Other than that, they are different.

In general, monkey testing is automated testing with no specific purpose in mind other than to test the product or service.  It’s also known as “stochastic testing” and is a tool in your black box tester tool belt.  Monkey testing comes in two flavors:

Dumb Monkey testing is when the automation randomly sends keystrokes or commands to the System Under Test (SUT).  The automation system or tester monitors the SUT for crashes or other incorrect reactions.  One example would be feeding random numbers into a dialog that accepts a dollar value.  Another example comes from when I was a tester in Office.  One feature I was testing was the ability to open web pages in Microsoft Word.  I wrote a simple macro in Visual Basic for Applications (VBA) to grab a random web page from the web browser, log the web address to a log file, and try to open the web page in Word.  If Word crashed, my macro would stop, and I would know that I had hit a severity one (i.e., crashing) bug.  I could watch my computer test for me all day…or just run it at night.  Free bugs!
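
A dumb monkey for the dollar-value example might look something like this sketch.  `parse_dollars` is a hypothetical stand-in for the system under test, with a planted bug so the monkey has something to find; a graceful rejection (`ValueError`) is treated as correct behavior, and any other exception counts as a crash.

```python
import random
import string


def parse_dollars(text: str) -> float:
    """Hypothetical SUT with a planted bug: indexing fails on empty input."""
    if text[0] == "$":
        text = text[1:]
    return float(text)  # raises ValueError for anything that isn't a number


def dumb_monkey(sut, rng, iterations):
    """Throw random strings at the SUT and collect anything that crashes it."""
    crashes = []
    for _ in range(iterations):
        length = rng.randint(0, 8)
        text = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            sut(text)
        except ValueError:
            pass  # a graceful rejection of bad input is not a bug
        except Exception as exc:
            crashes.append((text, repr(exc)))
    return crashes


# Log the seed so a crashing run can be replayed.
seed = random.randrange(2**32)
crashes = dumb_monkey(parse_dollars, random.Random(seed), iterations=1000)
print(f"seed={seed} crashes={len(crashes)}")
```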

Funny Aside: I had a little code in the macro to filter out, um, inappropriate web sites.  Occasionally one got through the code, an alarm would go off in our IT department, and my manager would receive an email that I was viewing sites against HR’s policy.  It happened every few months.  I would show that it was the automation, not me, doing the offensive act.  We would laugh about it, put a letter of reprimand in the monkey’s employee file, and move on.

Smart Monkey testing is, well, smarter.  Instead of doing random things, the automation system knows something about your program:

  • Where it is
  • What it can do there
  • Where it can go and how to get there
  • Where it’s been
  • If what it’s seeing is correct

This is where State Transition comes into play.

Let’s look at a State Transition diagram for a Learning Management System (LMS) and how the enrollment status of a student could change.

LMS State Diagram

You would define for the smart monkey automation the different states, which state changes are possible, and how to get to those.  For the diagram above, it might look something like this:

Where it is: Registered

What it can do there #1: Learner can Cancel their registration

What it can do there #2: Learner can Attend the class

Where it is: Canceled

What it can do there #1: Learner can Register for the class

What it can do there #2: Learner can be added to the Waitlist

Where it is: Waitlisted

What it can do there #1: Learner can Cancel their registration

What it can do there #2: The learner will be automatically registered by the system if a spot opens for them

You get the general idea.  This still looks like Andrew’s State-Transition Testing.  What is different is the automation knows this information.  When you start the automation, it will start randomly traversing the states.  Sometimes it will follow an expected path:

Register | Attend

The next time, it might try a path you didn’t expect (the learner ignores their manager and attends anyways):

Request | Request Declined | Walk-in

It might also do things like request and cancel the same class fifty times.
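
Here is a minimal sketch of that random traversal, using only the three states spelled out above.  The "Attended" end state and the action names are assumptions; the full diagram has more states than this.  A real monkey would drive the product at each step and compare its actual state to the expected one.

```python
import random

# State model transcribed from the list above. "Attended" as a terminal state
# is an assumption; the full diagram has more states than are listed here.
MODEL = {
    "Registered": {"Cancel": "Canceled", "Attend": "Attended"},
    "Canceled": {"Register": "Registered", "Join waitlist": "Waitlisted"},
    "Waitlisted": {"Cancel": "Canceled", "Spot opens": "Registered"},
    "Attended": {},
}


def smart_monkey(model, start, rng, max_steps):
    """Randomly walk the state model, recording (state, action, expected state)."""
    state = start
    path = []
    for _ in range(max_steps):
        actions = model[state]
        if not actions:
            break  # terminal state: nothing left to do
        action = rng.choice(sorted(actions))
        expected = actions[action]
        # A real monkey would perform `action` against the product here and
        # verify that the product actually ends up in `expected`.
        path.append((state, action, expected))
        state = expected
    return path


seed = random.randrange(2**32)
for step in smart_monkey(MODEL, "Registered", random.Random(seed), max_steps=20):
    print(seed, step)
```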

Don’t have time to define all of the states?  Some third-party software will randomly explore your software and create the state model, and how to traverse it, for you.  You can then double-check the model it creates, make any corrections or additions, and you are on your way!

How can you improve this system?

  1. Have the monkey access an XML or other data file with specific cases you would like hit more than the random ones.  For example, you could use the PICT tool to create a list of the most interesting combinations of inputs to hit.
  2. You can also make this system smarter by assigning probabilities to each state change.  Find out how often the user actually cancels a registration.  Now feed that data into the smart monkey.  Now your monkey will test the state changes at the same frequency as the real user.
  3. The next step?  Tie your monkey into live customer data: Customer data-driven quality (CDDQ).  For example, let’s say all of a sudden your customers start canceling a lot of class registrations due to an upcoming holiday.  Your smart monkey will automatically start testing the cancel registration state change more often.
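
The second idea, probabilities on each state change, can be sketched with weighted random choice.  The weights below are made up for illustration; in practice they would come from your usage data.

```python
import random
from collections import Counter

# Hypothetical usage data: (action, next state, relative frequency).
WEIGHTED_MODEL = {
    "Registered": [("Attend", "Attended", 90), ("Cancel", "Canceled", 10)],
    "Canceled": [("Register", "Registered", 70), ("Join waitlist", "Waitlisted", 30)],
    "Waitlisted": [("Spot opens", "Registered", 60), ("Cancel", "Canceled", 40)],
}


def pick_transition(state, rng):
    """Choose the next action in proportion to how often real users take it."""
    options = WEIGHTED_MODEL[state]
    weights = [weight for _, _, weight in options]
    action, next_state, _ = rng.choices(options, weights=weights, k=1)[0]
    return action, next_state


rng = random.Random(42)
tally = Counter(pick_transition("Registered", rng)[0] for _ in range(1000))
print(tally)
```

Refreshing the weights from live telemetry on a schedule would be one way to approach the third idea.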

The whole idea of smart monkey testing is that it will follow expected and unexpected paths.  You can run the monkey on spare machines, or on your test machine overnight, pretty much anytime, and it will give you free testing.  If your logging is good enough to track the state and the transition path the monkey followed, you will be able to reproduce any bugs it finds.  Watch your code coverage numbers go up…maybe.  But that’s fodder for another posting.

Long live the smart monkey!

State-Transition Testing

One of our goals at Expert Testers is to discuss practical topics that can help every tester do their job better. To this end, my last two articles have been about Decision Table Testing and Being an Effective Spec Reviewer. Admittedly, neither of these topics breaks new ground. That doesn’t mean, however, that most testers have mastered these techniques. In fact, almost 50% of the respondents to our Decision Table poll said they’ve never used one.

Continuing the theme of discussing practical topics, let’s talk about State Transition Diagrams. State Transition Diagrams, or STDs as they’re affectionately called, are effective for documenting functionality and designing test cases. They should be in every tester’s bag of tricks, along with Decision Tables, Pair-Wise analysis, and acting annoyed at work to appear busy.

STDs show the state a system will move to, based on its current state and other inputs. These words, I understand, mean little until you’ve seen one in action, so let’s get to an example. Since I’m particularly busy (i.e., lazy) today, I’ll use a simple example I found on the web.

Below is a Hotel Reservation STD. Each rectangle, or node, represents the state of the reservation. Each arrow is a transition from one state to the next. The text above the line is the input–the event that caused the state to change. The text below the line is the output–the action the system performs in response to the event.

[Figure: Hotel Reservation State Transition Diagram]

Pasted from <http://users.csc.calpoly.edu/~jdalbey/SWE/Design/STDexamples.html>

One benefit of State Transition Diagrams is that they describe the behavior of the system in a complete, yet easy-to-read and compact, way. Imagine describing this functionality in sentence format; it would take pages of text to describe it fully. STDs are much simpler to read and understand. For this reason, they can show paths that were missed by PM or Developer, or paths the Tester forgot to test.

I learned this when I was testing Microsoft Forefront Protection for Exchange Server, a product that protects email customers from malware and spam. The product logic for determining when a message would be scanned was complicated; it depended on the server role, several Forefront settings, and whether the message was previously scanned.

The feature spec described this logic in sentence format, and was nearly impossible to follow. I took it upon myself to create a State Transition Diagram to model the logic. I printed it out and stuck it on my office (i.e., cubicle) wall. Not a week went by without a Dev, Tester, or PM stopping by to figure out why their mail wasn’t being scanned as they expected.

If you read my article on Decision Tables (DTs), and I’m sure you didn’t, you may be wondering when to use an STD and when to use a DT. If you’re working on a system where the order of events matters, then use an STD; Decision Tables only work if the order of events doesn’t matter.

Another benefit of STDs is that we can use them to design our test cases. To test a system completely, you’d need to cover all possible paths in the STD. This is often either impractical or impossible.

In our simple example, there are only four paths from start of the STD to the end, but in larger systems there can be too many to cover in a reasonable amount of time. For these systems, you can use multiple STDs for sub-systems rather than trying to create a single STD for the entire system. This will make the STDs easier to read, but will not lower the total number of paths. It’s also common to find loops in an STD, resulting in an infinite number of possible paths.

When covering all paths is impractical, one alternative is to ensure each state (node) is covered by at least one test. This, however, would result in weak coverage. For our hotel booking system, we could test all seven states while leaving some transitions and events completely untested.

Often, the best strategy is to create tests that cover all transitions (the arrows) at least once. This guarantees you will test every state, event, action, and transition. It gives you good coverage in a reasonable number of tests.
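
One simple way to derive such tests is sketched below. The state model is a made-up reservation flow for illustration, not the hotel diagram above. The approach is to find the shortest event sequence that reaches each state, then add one test per outgoing arrow. It covers every transition at least once, though it is not necessarily the smallest possible set of tests.

```python
from collections import deque

# Hypothetical reservation model: state -> {event: next state}.
STD = {
    "Requested": {"pay": "Paid", "cancel": "Cancelled"},
    "Paid": {"check_in": "CheckedIn", "cancel": "Cancelled"},
    "CheckedIn": {"check_out": "CheckedOut"},
    "Cancelled": {},
    "CheckedOut": {},
}


def transition_tests(std, start):
    """Return event sequences that together exercise every reachable transition."""
    # Breadth-first search: shortest event sequence from start to each state.
    route = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for event, target in std[state].items():
            if target not in route:
                route[target] = route[state] + [event]
                queue.append(target)
    # One test per transition: reach the source state, then fire the event.
    return [route[state] + [event] for state in route for event in std[state]]


for test in transition_tests(STD, "Requested"):
    print(" -> ".join(test))
```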

If you’re interested in learning more about STDs (it’s impossible to cover them fully in a short blog article) I highly recommend reading A Practitioner’s Guide to Software Test Design. It’s where I first learned about them.

The next time you’re having trouble describing a feature or designing your tests, give a State Transition Diagram or Decision Table a try. The DTs and STDs never felt so good!

 

Test In Production – What is it all about?

I would like to share my thoughts about test-in-production (a.k.a. TiP). This term has become a buzzword in the testers’ wonderland as the industry moves toward providing solutions in the cloud. I plan to address it through four easy questions.

I like to explain this with an analogy: the boxed product of the olden days vs. the cloud services of today. In earlier days, when software was shipped as a boxed product or a downloadable executable, testing was much simpler, in a way. Boxed products had well-defined system requirements such as operating system (type, version), supported locale, disk space, RAM, yada yada yada. So when testers defined the test plan, it was self-contained within the boundaries defined by the product. When end users bought the boxed product, it was their own decision which hardware to install it on. It was a balanced equation, I guess: what was tested matched what was installed, and the product worked as expected as long as the end user chose hardware that met the system requirements.

With the evolution of today’s cloud-oriented solutions, customers want solutions that optimize cost (which is one of the reasons the cloud is evolving, in my opinion). The company providing the software service decides on the hardware to suit its scale and performance needs. In reality, not all software is custom-made for the hardware it runs on, so many hardware-related variables come into play when testing software services in the cloud. For example, when you host a solution used by hundreds of thousands of users, you can expect hundreds or even thousands of servers in the data center.

The small piece of software once tested on one machine or a handful of machines (depending on the software architecture you are testing) now becomes a huge network tied to Service Level Agreements (SLAs) covering performance, latency, scale, fault tolerance, security, blah blah blah. Although it is very much possible to simulate a data-center-like setup within your corporate environment, there will likely be many differences from the actual setup in the data center. These may include, but are not limited to, load balancers, Active Directory credentials, different security policies applied on the hosts, domain controller configurations specific to your hosting setup, and storage access credentials; and these are just the tip of the iceberg.

What

So what is TiP? My definition of TiP is an end-to-end customer test scenario that you author with the required input parameters and run continually, at a predefined interval, against the software endpoints of the hosted service. It validates functionality and component integration, and provides a binary result: Pass or Fail. There are at least two types of TiP tests you can author: Outside-In (OI) and Inside-Out (IO).

Outside-In (OI): These tests run from outside your production environment, targeting your software endpoint.

Inside-Out (IO): These tests run from within your data center, targeting the different roles you may have to ensure they are all functioning properly.
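
As a rough sketch of the shape such a test could take: a scenario is any callable that raises on failure, the runner reduces it to Pass or Fail, and a loop repeats it at a fixed interval. All names here are hypothetical, and the sample scenario is only a placeholder; a real Outside-In test would call the live service's endpoint.

```python
import time
from typing import Callable, Optional


def run_tip_test(scenario: Callable[[], None]) -> str:
    """Run one end-to-end scenario and reduce it to a binary result."""
    try:
        scenario()  # expected to raise on any functional or integration failure
    except Exception:
        return "Fail"
    return "Pass"


def run_on_interval(
    scenario: Callable[[], None],
    interval_seconds: float,
    report: Callable[[str], None],
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run the scenario repeatedly; max_runs=None means run until stopped."""
    runs = 0
    while max_runs is None or runs < max_runs:
        report(run_tip_test(scenario))  # in production, feed this to monitoring
        runs += 1
        sleep(interval_seconds)


def sign_in_scenario() -> None:
    """Hypothetical scenario; a real one would call the hosted endpoint."""
    response_status = 200  # placeholder for an actual request to the live service
    assert response_status == 200


run_on_interval(sign_in_scenario, interval_seconds=0, report=print, max_runs=1)
```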

Why

TiP enables you to proactively find issues before you hear about them from a customer. Since the tests run against your live site, the architecture is expected to have appropriate monitoring built in, so that failures from these critical tests are escalated and appropriate action is taken. TiP is a valuable asset for validating your deployment and any plumbing between the different software roles* in your architecture. TiP also plays a critical role during service deployment or upgrade, as it runs end-to-end tests on the production systems before they go live to take real-world traffic. Automated TiP scenario tests can save testers a lot of manual validation in the production system.

When

TiP tests should run all the time, for as long as you keep your software service alive.

How

I’m not going to go into design details here; rather, this is a high-level thought. Identify from your test plan a few critical test paths that cover both happy-path and negative test cases. Give priority to the test cases that cover the most code paths and components. For example, if your service has replication, SQL transactions, a flush policy, etc., encapsulate all of these in a single test case and try to automate the complex path. This will help ensure that the whole pipeline in your architecture is working as expected. There are no right or wrong tools for this. From batch files and shell scripts to C# and Ruby on Rails, it’s up to you to find the tool set and language appropriate for the task.

*role – An installation or instance of the operating system serving a specific capability. For example, an authentication system could be one instance of the OS in your deployment whose functionality is just to authenticate all the traffic to access your service.

Being an Effective Spec Reviewer

The first time testers can impact a feature is often during functional and design reviews. This is also when we make our first impression on our co-workers. If you want to make a great initial impact on both your product and peers, you have to be an effective reviewer.

In my seven years in testing, I’ve noticed that many testers don’t take advantage of this opportunity. Most of us fall into one of four categories:

  1. Testers who pay attention during reviews without providing feedback. This used to be me. I learned the feature, which is one of the goals of a review meeting. A more important goal, however, is to give feedback that exposes defects as early as possible.
  2. Testers who push-back (argue) at seemingly every minor point. Their goal is to increase their visibility and prove their worth as much as it is to improve the product. They learn the feature and can give valuable feedback. However, while they think they’re impressing their teammates, they’re actually frustrating them.
  3. Testers who attend reviews with their laptops open, not paying attention. If this is you, please stop; no one’s impressed with how busy you’re pretending to be.
  4. Testers who pay attention and learn the feature, while also providing constructive feedback. Not only do they understand and improve the feature, but they look good doing it. This can be you!

How do you do this, you ask? With this simple recipe that only took me four years to learn, I answer.

1. Read the Spec

Before attending any functional or design review, make sure you read the documentation. This is common sense, but most of us are so busy we don’t have time to read the specs. Instead, we use the review itself to learn the feature.

This was always a problem for me because although I learned the feature during the review, I didn’t have enough time to absorb the details and give valuable feedback during the meeting. It was only afterwards when I understood the changes we needed to make. By then it was too late–decisions had already been made and it was hard to change people’s minds. Or coding had begun, and accepting changes meant wasted development time.

A great idea Bruce Cronquist suggested is to block out the half hour before the review meeting to read the spec. Put this time on your calendar to make sure you don’t get interrupted.

2. Commit to Contributing

Come to every review with the goal of contributing at least one idea. Once I committed to this, I immediately made a bigger impact on both my product and peers. This strategy works for two reasons.

First, it forces you to pay closer attention than you normally might have. If you know you’ll be speaking during the meeting, you will pay closer attention.

Second, it forces you to speak up about ideas you might otherwise have kept to yourself. I used to keep quiet in reviews if I wasn’t 100% sure I was right. Even if I was almost positive, I would still investigate further after the meeting. The result was that someone else would often mention the idea first.

It took four years for me to realize this is an effective tool.

3. Have an Agenda

It’s easy to say you’ll give a good idea during every review, but how can you make sure you’ll always have a good idea to give? For me, the answer was a simple checklist.

The first review checklist I made was to make sure features are testable. Not only are testers uniquely qualified to enforce testability, but if we don’t do it no one will. Bringing up testability concerns as early as possible will also make your job of testing the feature later on much easier. My worksheet listed the key tenets of testability, had a checklist of items for each tenet, and room for notes.

At the time, I thought the concept of a review checklist was revolutionary. So much so, in fact, that I emailed Alan Page about it no less than five times. I’m now sure Alan must have thought I was some kind of stalker or mental patient. However, he was very encouraging and was kind enough to give the checklist a nice review on Toolbox–a Microsoft internal engineering website. If you work at Microsoft, you can download my testability checklist here.

I now know that not only are checklists the exact opposite of revolutionary, but there are plenty of other qualities to look for than just testability.

Test is the one discipline that knows about most (or all) of the product features.  It’s easy for us to find and identify inconsistencies between specs, such as when one PM says the product should do X, while another PM says it should do Y. It’s also our job to be a customer advocate. And we need to enforce software qualities such as performance, security, and usability. So I decided to expand my checklist.

My new checklist includes 50 attributes to look for in functional and design reviews. It’s in Excel format, so you can easily filter the items based on Review Type (Feature, Design, or Test) and Subtype (Testability, Usability, Performance, Security, etc.)

Review Checklist

Click this image to download the Review Checklist.

If there are any other items you would like added to the checklist, please list them in the comments section below. Enjoy!
