When testing a UI, it's important to not only validate each input field, but to do so using interesting data. There are plenty of techniques for doing so, such as boundary value analysis, decision tables, state-transition diagrams, and combinatorial testing. Since you're reading a testing blog, you're probably already familiar with these. Still, it's nice to have a short, bulleted checklist of all the tools at your disposal. When I recently tested a new web-based UI, I took the opportunity to create one.
One of the more interesting tests I ran was a successful HTML injection attack. In an input field that accepted a string, I entered: <input type="button" onclick="alert('hi')" value="click me">. When I navigated to the web page that should have displayed this string, I instead saw a button labeled "click me". Clicking on it produced a pop-up with the message "hi". The web page was rendering all HTML and JavaScript I entered. Although my pop-up was fairly harmless, a malicious user could have used this same technique to be, well, malicious.
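If the page had escaped the input instead of rendering it, the attack would have fizzled. Here's a minimal Python sketch of that idea; render_display_name is a hypothetical stand-in for whatever the page does with the string:

```python
import html

# The payload from the test above, as a plain Python string.
payload = '<input type="button" onclick="alert(\'hi\')" value="click me">'

def render_display_name(name: str) -> str:
    # What the page *should* do: escape user input before embedding it in HTML.
    return "<p>Hello, {}</p>".format(html.escape(name, quote=True))

rendered = render_display_name(payload)

# The escaped output contains no live markup, so the browser shows the literal
# text instead of rendering a clickable button.
assert "<input" not in rendered
print(rendered)
```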
Another interesting test was running the UI in non-English languages. Individually, each screen looked fine. But when I compared similar functionality on different screens, I noticed some dates were formatted mm/dd/yyyy and others dd/mm/yyyy. In fact, the most common type of bug I found was inconsistency between screens. The headings on some pages were title-cased, while others were lower-cased. Some headings were butted against the left side of the screen, and others had a small margin. Different fonts were used for similar purposes.
Let’s get back to boundary value analysis for a minute. Assume you’re testing an input field that accepts a value from 1 to 100. The obvious boundary tests are 0, 1, 100, and 101. However, there’s another, less obvious, boundary test. Since this value may be stored internally as an integer, a good boundary test is a number too large to be stored as an int.
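For the 1-to-100 field, a boundary-value list might look like the sketch below. The 32-bit overflow values are an assumption about how the field is stored, and submit_value is a hypothetical helper for whatever drives the UI:

```python
# Boundary values for a field that accepts 1..100. The last two entries assume
# the value is parsed into a 32-bit signed int; if so, they overflow it.
boundary_values = [
    0, 1,            # just below and at the lower bound
    100, 101,        # at and just above the upper bound
    2**31,           # one past the largest 32-bit signed int (2,147,483,648)
    -(2**31) - 1,    # one below the smallest 32-bit signed int
]

for value in boundary_values:
    expected = 1 <= value <= 100
    # assert submit_value(value) == expected   # submit_value is hypothetical
    print(value, expected)
```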
My UI Testing Checklist has all these ideas, plus plenty more: accented characters, GB18030 characters, different date formats, leap days, etc. It's by no means complete, so please leave a comment with anything you would like added. I can (almost) guarantee it'll lead you to uncover at least one new bug in your project.
Click this image to download the UI Testing Checklist
When testing even a simple application, it's often impossible, or at least impractical, to test with all possible inputs. This is true whether you're testing manually or using automation; throwing computing power at the problem doesn't usually solve it. The question is: how do you find the inputs that give the most coverage in the fewest tests?
Assume we have a UI with just two fields for creating a user account. User Name can be between 1 and 25 characters, but can’t contain a space. Sex can be either Male or Female.
To test all possible inputs, we would first test single-character names using all possible characters. Then test two-character names using all possible characters in the first position, along with all possible characters in the second position. This process would be repeated for all lengths between 1 and 25. We then need to repeat all these names for Males and Females. I’m no mathematician, but this works out to be close to a kagillion possible inputs.
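If you want to put a rough number on that kagillion, here's a back-of-the-envelope sketch; the 94-character alphabet (printable ASCII minus the space) is an assumption, and the real character set is far larger:

```python
# Rough arithmetic behind the "kagillion": assume 94 allowed characters
# (printable ASCII minus the space).
allowed_chars = 94
name_count = sum(allowed_chars ** length for length in range(1, 26))
total_inputs = name_count * 2   # times two for Male / Female

print(f"{total_inputs:.3e}")    # roughly 4e49 possible inputs
```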
If this simple UI can have so many inputs, then it’s clear we’ll usually have to test with a very small subset of all possible data. Equivalence classes (ECs) give us the guidance we need to select these subsets.
ECs are subsets of input data, where testing any one value in the set should be the same as testing the others. Here we have two equivalence classes of valid input data.
1. User Name: length >= 1 and <= 25, not containing a space
2. Sex: Male or Female
Since we have more than one EC of valid data, we can create a single test that randomly picks one value from each. After all, testing with A/Male should give us the same result as testing with AB/Female or JKHFGKJkjgjkhg/Male. In all three cases, a new user account should be created.
It's also important to create classes of invalid input. If you have an EC of valid data in the range of 1-10, you should also have an EC of invalid data less than 1 and an EC of values greater than 10. We can create five equivalence classes of invalid input for our sample app.
3. User Name: empty
4. User Name: length > 25, not containing a space
5. User Name: length > 25, containing a space
6. User Name: length > 1 and <= 25, containing a space
7. Sex: neither Male nor Female
When designing your tests, it's important to pick only one invalid value per test case; using more than one can lead to missed product bugs. Suppose you choose a test with both an invalid User Name and an invalid Sex. The product may throw an error when it detects the invalid User Name, and never even get to the logic that validates the Sex. If there are bugs in the Sex-validation logic, your test won't catch them.
Using our ECs, we’d create the following tests.
Valid name from #1 and valid Sex from #2
Invalid name from #3 and valid Sex from #2
Invalid name from #4 and valid Sex from #2
Invalid name from #5 and valid Sex from #2
Invalid name from #6 and valid Sex from #2
Valid name from #1 and invalid Sex from #7
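Here's a minimal sketch of those six tests, picking a random value from the appropriate class on every run. The allowed character set and the create_account helper are assumptions standing in for the real product and its input rules:

```python
import random
import string

NAME_CHARS = string.ascii_letters + string.digits   # assumed allowed characters

def name(length: int, with_space: bool = False) -> str:
    """Build a random name of the given length, optionally forcing a space into it."""
    chars = [random.choice(NAME_CHARS) for _ in range(length)]
    if with_space:
        chars[random.randrange(length)] = " "
    return "".join(chars)

def sex() -> str:
    """EC #2: a valid Sex."""
    return random.choice(["Male", "Female"])

# (user_name, sex, account_should_be_created), one row per test above.
test_cases = [
    (name(random.randint(1, 25)),                    sex(),  True),   # #1 + #2
    ("",                                             sex(),  False),  # #3 + #2
    (name(random.randint(26, 100)),                  sex(),  False),  # #4 + #2
    (name(random.randint(26, 100), with_space=True), sex(),  False),  # #5 + #2
    (name(random.randint(2, 25), with_space=True),   sex(),  False),  # #6 + #2
    (name(random.randint(1, 25)),                    "Blah", False),  # #1 + #7
]

for user_name, chosen_sex, expected in test_cases:
    # create_account() is a hypothetical wrapper around the UI or API under test.
    # assert create_account(user_name, chosen_sex) == expected
    print(repr(user_name), chosen_sex, expected)
```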
Equivalence classes have reduced our number of tests from a kagillion to six. As Adam Sandler might say, “Not too shabby.”
Not only does EC testing make sure you don’t have too many tests, it also ensures you don’t have too few. Testers not familiar with equivalence classes might test this UI with just four pairs of hardcoded values:
John/Male
Jane/Female
John Smith/Male
Joan/Blah
When these tests are automated and running in your test pass, they validate the same values day after day. Using equivalence class testing with random values, we validate different input every day, increasing our odds of finding bugs.
The day I wrote this article I found a UI bug that had been in the product for several months. The UI didn’t correctly handle Display Names of more than 64 characters. Sure enough, the UI tests didn’t use ECs, and tested with the same hardcoded values every day. If they had chosen a random value from an EC, the test would have eventually found this bug.
Another benefit of equivalence class testing is that it leads us to boundary value testing. When an equivalence class is a range, we should create tests at the high and low edges of the range, and just outside the range. This helps us find bugs in cases where the developer may have used a > instead of a >=.
Boundary tests, however, should not take the place of your EC tests. In our example, don't hardcode your EC test to have a User Name length of 25. Your EC tests should choose a random length between 1 and 25. You should have a separate boundary test that validates length 25.
And don't forget to create tests for values far outside the range. The size of your variable defines an equivalence class as well! Values far outside the range can lead to other bugs, such as exceeding the maximum length of a string, or trying to shove a long number into an int.
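Continuing the example, the boundary tests for User Name length might look like the sketch below. The 65,536-character case is an assumption about where "far outside the range" might trip over an internal limit, and create_account is again a hypothetical stand-in:

```python
# Boundary tests for the User Name length, kept separate from the random EC tests.
boundary_lengths = [0, 1, 25, 26, 2**16]

for length in boundary_lengths:
    user_name = "a" * length
    expected = 1 <= length <= 25
    # assert create_account(user_name, "Male") == expected   # create_account is hypothetical
    print(length, expected)
```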
When defining your ECs, the first place to look is the Feature Specification Document (or whatever your Agile equivalent may be). The other place is the product code. This, however, can be risky. If the source code has a statement like if x > 1 && x < 5, you would create your EC as 2, 3, 4, and your tests will pass. But how do you know the source code was correct? Maybe it was intended to be x >= 1 && x <= 5. That's why you should always push for valid input rules to be defined in a spec.
Another technique for creating ECs is to break up the output into equivalence classes, then choose inputs that give you an output from each class. In our example, assume the user account can be stored in one of two databases, depending on whether the User Name contains Chinese characters. In this case, we would need one EC of valid Chinese User Names and one of valid non-Chinese User Names.
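A quick way to cover both output classes is to pick one User Name per class. The routing rule below is an assumption based on the CJK Unified Ideographs block; the product's real rule may differ:

```python
def contains_chinese(name: str) -> bool:
    # Rough check for CJK Unified Ideographs; the product's real routing rule may differ.
    return any("\u4e00" <= ch <= "\u9fff" for ch in name)

# One valid User Name per output class: one routed to the Chinese-character
# database, one routed to the default database.
for user_name in ("测试用户", "TestUser"):
    print(user_name, contains_chinese(user_name))
```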
PS – Not only is this article a (hopefully) interesting lesson on equivalence classes, it’s also an interesting experiment for the Expert Testers blog. In this article, I used the word Sex 15 times. I’m curious if it gets more traffic than usual because of it. If it does, you can look forward to my next article, “The Naked Truth – Pairwise Testing Exposed!”
Testers are often encouraged to automate more and more test cases. At first glance, the case for more test cases makes sense—the more tests you have, the better your product is tested. Who can argue with that? I can.
Creating too many test cases leads to the condition known as “test case bloat”. This occurs when you have so many test cases that you spend a disproportionate amount of time executing, investigating, and maintaining these tests. This leaves little time for more important tasks, such as actually finding and resolving product issues. Test case bloat causes the following four problems:
1. Test passes take a long time to complete.
The longer it takes for your test pass to complete, the longer you have to wait before you can begin investigating the failures. I worked on one project where there were so many test cases, the daily test pass took 27 hours to finish. It’s hard to run a test pass every day when it takes more than 24 hours to complete.
2. Failure investigations take a long time to complete.
The more tests you have, the more failures you have to investigate. If your test pass takes a day to complete, and you have a mountain of failures to investigate, it could be two days or longer before a build is validated. This turn-around time may be tolerable if you’re shipping your product on a DVD. But when your software is a service, you may need to validate product changes a lot faster.
For example, the product I’m working on is an email service. If a customer is without email, it’s unacceptable for my team to take this long to validate a bug fix. Executing just the highest-priority tests to validate a hot-fix may be a valid compromise. If you have a lot of test cases, however, even this can take too long.
3. Tests take too much effort to maintain.
When your automation suffers from test case bloat, even subtle changes in product functionality can cause massive ripples in your existing test cases, drastically increasing the amount of time you spend maintaining them. This leaves little time for other, more valuable tasks, such as testing new features. It’s also a morale killer. Most testers I know— the really good ones, at least— don’t want to continually maintain the same test cases. They want to test new features and write new code.
4. After a certain threshold, more test cases no longer uncover product bugs—they mask them.
Most test cases only provide new information the first time they’re run. If the test passes, we can assume the feature works. If the test fails, we file a bug, which is eventually fixed by development, and the test case begins to pass. If it’s written well, the test will continue to pass unless a regression occurs.
Let's assume we have 25 test cases that happily pass every time they're run. Then, at 3:00 a.m., an overtired developer checks in a bug that causes three tests to fail. Our pass rate would drop from 100% to an alarming 88%. The failures would be quickly investigated, and the perpetrator would be caught. Perhaps we would playfully mock him and make him wear a silly hat.
But what if we had 50 test cases? Three failures out of 50 test cases is a respectable 94% pass rate. What about one hundred or two hundred tests? With this many tests, it's now very possible that there are some failures in every pass simply due to test code problems; timing issues are a common culprit. The same three failures in two hundred tests is a 98.5% pass rate. But were these failures caused by expected timing issues, or a real product bug? If your team is pressed to get a hot-fix out the door to fix a live production issue, it may not investigate a 98.5% pass rate with as much vigor as an 88% pass rate.
Bloat Relief
If your automation suffers from test case bloat, you may be able to refactor your tests. But you can’t simply mash four or five tests with different validation points into a single test case. The more complicated a test, the more difficult it becomes to determine the cause and severity of failure.
You can, however, combine test cases when your validation points are similar, and the severity of a failure at each validation point is the same. For example, if you’re testing a UI dialog, you don’t need 50 different test cases to validate that 50 objects on the screen are all at their expected location. This can be done in one test.
You can also combine tests when you’re checking a single validation point, such as a database field, with different input combinations. Don’t create 50 different test cases that check the same field for 50 different data combinations. Create a single test case that loops through all combinations, validating the results.
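As a rough sketch, those 50 checks might collapse into one data-driven test like this; save_and_read_back is a hypothetical helper that pushes a combination through the product and returns the stored field:

```python
# One data-driven test in place of 50 near-identical ones.
combinations = [
    ("John", "Male"),
    ("Jane", "Female"),
    # ... the rest of the input combinations
]

def test_stored_display_name():
    failures = []
    for user_name, sex in combinations:
        # stored = save_and_read_back(user_name, sex)   # hypothetical product call
        stored = user_name                               # stand-in so the sketch runs
        if stored != user_name:
            failures.append((user_name, sex, stored))
    # One failure report that still identifies every bad combination.
    assert not failures, f"Field mismatches: {failures}"

test_stored_display_name()
```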
When my test pass was taking 27 hours to complete, one solution we discussed was splitting the pass based on priority, feature, or some other criteria. If we had split it into three separate test passes, each would have taken only nine hours to finish. But this would have required three times as many servers. That may not be an issue if your test pass runs on a single server or on virtual machines; however, I've worked on automation that required more than twenty physical servers, and tripling your server count is not always an option.
In addition to the techniques discussed above, pairwise testing and equivalence class partitioning are tools that all testers should have in their arsenal. The ideal solution, however, is to prevent bloat before it even starts. When designing your test cases, be aware of the number of tests you're writing. If all else fails, I hear you can gain time by investigating your test failures while travelling at the speed of light.