UI Testing Checklist

When testing a UI, it's important not only to validate each input field, but to do so using interesting data. There are plenty of techniques for doing so, such as boundary value analysis, decision tables, state-transition diagrams, and combinatorial testing. Since you're reading a testing blog, you're probably already familiar with these. Still, it's nice to have a short, bulleted checklist of all the tools at your disposal. When I recently tested a new web-based UI, I took the opportunity to create one.

One of the more interesting tests I ran was a successful HTML injection attack. In an input field that accepted a string, I entered: <input type="button" onclick="alert('hi')" value="click me">. When I navigated to the web page that should have displayed this string, I instead saw a button labeled "click me". Clicking on it produced a pop-up with the message "hi". The web page was rendering all HTML and JavaScript I entered. Although my pop-up was fairly harmless, a malicious user could have used this same technique to be, well, malicious.
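
The usual fix is to HTML-encode user-supplied text before writing it back into the page. Below is a minimal sketch of that idea using WebUtility.HtmlEncode from System.Net; I'm assuming a plain .NET console demo here, and the actual product may use a different framework or encoding helper.

  using System;
  using System.Net;   // WebUtility.HtmlEncode

  class InjectionDemo
  {
      static void Main()
      {
          // The string entered into the input field during the test.
          string attack = "<input type=\"button\" onclick=\"alert('hi')\" value=\"click me\">";

          // Encoding the text before it is written into the page makes the browser
          // display the markup literally instead of rendering a clickable button.
          string safe = WebUtility.HtmlEncode(attack);
          Console.WriteLine(safe);   // <, >, and quotes come out as &lt;, &gt;, &quot;, ...
      }
  }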

Another interesting test was running the UI in non-English languages. Individually, each screen looked fine. But when I compared similar functionality on different screens, I noticed some dates were formatted mm/dd/yyyy and others dd/mm/yyyy. In fact, the most common type of bug I found was inconsistency between screens. The headings on some pages were title-cased, while others were lower-cased. Some headings were butted against the left side of the screen, and others had a small margin. Different fonts were used for similar purposes.

Let’s get back to boundary value analysis for a minute. Assume you’re testing an input field that accepts a value from 1 to 100. The obvious boundary tests are 0, 1, 100, and 101. However, there’s another, less obvious, boundary test. Since this value may be stored internally as an integer, a good boundary test is a number too large to be stored as an int.
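
As a sketch of what that test data might look like, the boundary set can include both the obvious edges and values that overflow a 32-bit integer. The field and the EnterValueAndSubmit driver below are hypothetical stand-ins for whatever harness you use.

  // Hypothetical boundary inputs for a field that accepts 1..100. The strings are
  // what a tester would actually type; the last three are too large (or too small)
  // to fit in a 32-bit int, which often shakes out parsing and storage bugs.
  string[] boundaryInputs =
  {
      "0",                      // just below the minimum
      "1",                      // minimum
      "100",                    // maximum
      "101",                    // just above the maximum
      "2147483648",             // int.MaxValue + 1
      "-2147483649",            // int.MinValue - 1
      "99999999999999999999",   // far larger than any common integer type
  };

  foreach (string input in boundaryInputs)
  {
      // EnterValueAndSubmit is a stand-in for whatever drives the UI in your harness.
      EnterValueAndSubmit(input);
  }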

My UI Testing Checklist includes all these ideas, plus plenty more: accented characters, GB18030 characters, different date formats, leap days, and so on. It's by no means complete, so please leave a comment with anything you would like added. I can (almost) guarantee it will lead you to uncover at least one new bug in your project.

Click this image to download the UI Testing Checklist

Writing Good Test Plans and Writing Good Tests

This post is based on one of my test projects, "Testing SQL Express Setup integration with Visual Studio 2012".  Unlike our typical testing strategy, we decided to do manual testing instead of automated testing.  We spent a lot of effort thinking about the customer scenarios and giving them different priorities.  Depending on the scenario and priority, we used pairwise combination strategies to select a small set of tests from the huge testing matrix.  Then our vendor team executed the selected test cases manually and signed off on them.  We found a couple of UI bugs, a couple of usability issues, and some functional bugs as well. In the end, we delivered a better setup experience for Visual Studio users.  The same test approach was used for Visual Studio 2012 testing as well.
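
For readers who haven't used pairwise selection, here is a minimal sketch of the idea (a real pairwise tool does this far more cleverly): instead of running the full matrix, greedily keep combinations until every pair of parameter values appears in at least one selected combination. The parameters and values below are invented for illustration.

  using System;
  using System.Collections.Generic;
  using System.Linq;

  class PairwiseSketch
  {
      static void Main()
      {
          // Invented parameters and values; a real setup matrix would be much larger.
          var parameters = new Dictionary<string, string[]>
          {
              ["OS"]           = new[] { "Win7", "Win8", "Server2008R2" },
              ["Architecture"] = new[] { "x86", "x64" },
              ["SseState"]     = new[] { "NotInstalled", "OlderVersion", "SameVersion" },
          };
          string[] names = parameters.Keys.ToArray();

          // The full matrix: every combination of every value.
          var allCombos = new List<string[]> { new string[0] };
          foreach (string name in names)
              allCombos = allCombos
                  .SelectMany(c => parameters[name].Select(v => c.Concat(new[] { v }).ToArray()))
                  .ToList();

          // Every pair of (parameter, value) assignments that must appear somewhere.
          var uncovered = new HashSet<string>();
          for (int i = 0; i < names.Length; i++)
              for (int j = i + 1; j < names.Length; j++)
                  foreach (string a in parameters[names[i]])
                      foreach (string b in parameters[names[j]])
                          uncovered.Add($"{i}={a}|{j}={b}");

          // Greedily keep the combination that covers the most still-uncovered pairs.
          var suite = new List<string[]>();
          while (uncovered.Count > 0)
          {
              string[] best = allCombos.OrderByDescending(c => PairsOf(c).Count(uncovered.Contains)).First();
              suite.Add(best);
              foreach (string pair in PairsOf(best))
                  uncovered.Remove(pair);
          }

          Console.WriteLine($"Selected {suite.Count} of {allCombos.Count} combinations:");
          foreach (string[] combo in suite)
              Console.WriteLine("  " + string.Join(", ", combo));
      }

      // All (parameter, value) pairs contained in one full combination.
      static IEnumerable<string> PairsOf(string[] combo)
      {
          for (int i = 0; i < combo.Length; i++)
              for (int j = i + 1; j < combo.Length; j++)
                  yield return $"{i}={combo[i]}|{j}={combo[j]}";
      }
  }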

Note:

  •  SSE means SQL Server Express
  •  MSI 4.5 means Windows Installer 4.5

TestApi… A Forgotten Soldier in the Fight Against Bugs.

I was talking to my team today about doing globalization testing as part of our normal UI tests. In this conversation we discussed ways of generating random strings using different Unicode characters. I mentioned that there is a Microsoft-created library for use in many different areas of testing, one of which is string generation. However, not many testers, even at Microsoft, know about or use this handy library. I had previously used the string generation portion to do some fuzz testing on a different project, because it was much easier to use than most of the fuzz tools normally suggested for this purpose.

I decided it might be a good idea to disseminate this information so that it gets more use amongst our test warriors. The following is the basic information to get you started.

Here is the information on the TestApi library, which Microsoft created and which has many different uses:

§ Overview of TestApi

§ Part 1: Input Injection APIs

§ Part 2: Command-Line Parsing APIs

§ Part 3: Visual Verification APIs

§ Part 4: Combinatorial Variation Generation APIs

§ Part 5: Managed Code Fault Injection APIs

§ Part 6: Text String Generation APIs

§ Part 7: Memory Leak Detection APIs

§ Part 8: Object Comparison APIs

Here is an example from the String generation section that allows you to generate random strings for testing.

  //
  // Generate a Cyrillic string with a length between 10 and 30 characters.
  //
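  // NOTE: requires a reference to the TestApi assembly; if I recall correctly,
  // the string generation types used below live in the Microsoft.Test.Text namespace.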

  StringProperties properties = new StringProperties();
  properties.MinNumberOfCodePoints = 10;
  properties.MaxNumberOfCodePoints = 30;
  properties.UnicodeRanges.Add(new UnicodeRange(UnicodeChart.Cyrillic));

  string s = StringFactory.GenerateRandomString(properties, 1234);

  The generated string may look as follows:
  s: Ӥёӱіӱӎ҄ҤяѪӝӱѶҾүҕГ

Enjoy!

Look to the Data

I was recently asked to investigate my team's automated daily test cases. It was taking more than 24 hours to execute our "daily" test run. My job was to find out why the tests were taking so long to complete, and to speed them up when possible. In the process, an important lesson was reinforced: we should look to real, live customer data to guide our test planning.

I had several ideas to speed up our test passes. My highest priority was to find and address the test cases that took the longest to run. I sorted the tests by run-time, and discovered the slowest test case took over an hour to run. I tackled this first, as it would give me the biggest “bang for the buck”.

The test validated a function that took two input parameters. It iterated through each possible input combination, verifying the result. The code looked like this:

for (int setting1 = 1; setting1 <= 5; setting1++)
{
  for (int setting2 = 0; setting2 <= 5; setting2++)
  {
    validate(setting1, setting2);
  }
}

The validate function took two minutes to execute. Since it was called thirty times, the test case took an hour to complete. I first tried to improve the performance of the validate function, but the best I could do was shave a few seconds off its run-time.

My next thought was whether we really needed to test all 30 input combinations. I requested access to the live production data for these fields. I found just two setting combinations accounted for 97% of the two million production scenarios; four combinations accounted for almost 99%. Many of the combinations we were testing never occurred at all “in the real world.”
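
Getting those percentages was mostly a matter of grouping the production records by their setting pair and counting. A rough sketch of that kind of analysis is below; the small in-memory list stands in for the real two million production records.

  using System;
  using System.Collections.Generic;
  using System.Linq;

  class ProductionDataAnalysis
  {
      static void Main()
      {
          // Stand-in for the (setting1, setting2) pairs pulled from production.
          var settings = new List<(int Setting1, int Setting2)>
          {
              (5, 5), (5, 5), (5, 5), (1, 5), (1, 0), (2, 1), (5, 5), (1, 5),
          };

          // Group identical combinations and compute each one's share of the data.
          var frequencies = settings
              .GroupBy(s => s)
              .Select(g => new { Combo = g.Key, Percent = 100.0 * g.Count() / settings.Count })
              .OrderByDescending(x => x.Percent);

          foreach (var row in frequencies)
              Console.WriteLine($"{row.Combo.Setting1} {row.Combo.Setting2}  {row.Percent:F1}%");
      }
  }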

Most Common Data Combinations

setting1    setting2    % of Production Data
   5           5            73%
   1           5            24%
   1           0             0.9%
   2           1             0.8%
   2           5             0.7%
   3           5             0.21%
   1           1             0.2%
   3           0             0.1%
   4           0             0.09%

I reasoned that whatever change I made to this test case, the two most common combinations must always be tested. Executing just these two would test almost all the production scenarios, and the run-time would be cut to only four minutes. But notice that the top two combinations both have a setting2 value of 5. Validating just these two combinations would leave a huge hole in our test coverage.

I considered testing only the nine combinations found in production. This would guarantee we tested all the two million production scenarios, and our run-time would be cut from one hour to 18 minutes. The problem with this solution is that if a new combination occurred in production in the future, we wouldn’t have a test case for it; and if there was a bug, we wouldn’t know about it until it was too late.

Another possible strategy would be to split this test into multiple test cases, and use priorities to run the more common scenarios more often. For example, the two most common scenarios could be classified as P1, the next seven P2, and the rest P3. We would then configure our test passes to execute the P1s daily, the P2s weekly, and the P3s monthly.

The solution I decided on, however, was to keep it as one test case and always validate the two most common combinations as well as three other, randomly generated, combinations. This solution guaranteed we tested at least 97% of the live production scenarios each night. All thirty combinations would be tested, on average, every 10 days–more often than the "priority" strategy I had considered. The solution reduced the test's run-time from over an hour to about 10 minutes. The entire project was a success as well; we reduced the test pass run-time from 27 hours to less than 8 hours.
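
In sketch form, the selection logic looks something like the following, reusing the validate routine from the loop above; everything else is illustrative rather than the production test code.

  // The two combinations that cover ~97% of production are always validated.
  var combosToTest = new List<(int Setting1, int Setting2)> { (5, 5), (1, 5) };

  // Add three more combinations chosen at random from the remaining 28.
  var random = new Random();
  while (combosToTest.Count < 5)
  {
      var candidate = (random.Next(1, 6), random.Next(0, 6));   // setting1 in 1..5, setting2 in 0..5
      if (!combosToTest.Contains(candidate))
          combosToTest.Add(candidate);
  }

  foreach (var combo in combosToTest)
      validate(combo.Setting1, combo.Setting2);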

You may be thinking that I could have used pairwise testing to reduce the number of combinations I executed. Unfortunately, pairwise doesn't help when you only have two input parameters. The strategy ensures every pair of parameter values is covered, so with only two parameters the total number of combinations tested would have remained at thirty.

In hindsight, I think I could have been smarter about the random combinations I tested. For example, I could have ensured I generated at least one test case for each possible value of setting1 and setting2.

I also could have associated a “weight” with each combination based on how often it occurred in production. Settings would still be randomly chosen, but the most common ones would have a better chance of being generated. I would just have to be careful to assign some weight to those combinations that never appear in production; this would make sure all combinations are eventually tested. I think I’ll use this strategy the next time I run into a similar situation.
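
Here is a sketch of that weighting idea, using the production percentages from the table above plus a small floor weight so the combinations never seen in production are still chosen occasionally. As before, validate stands in for the real two-minute validation routine.

  // Weight each of the 30 combinations: its observed production share plus a small
  // floor so combinations that never occur in production still get chosen sometimes.
  var weights = new Dictionary<(int Setting1, int Setting2), double>();
  for (int s1 = 1; s1 <= 5; s1++)
      for (int s2 = 0; s2 <= 5; s2++)
          weights[(s1, s2)] = 0.1;                          // floor weight for "never seen"

  weights[(5, 5)] += 73;    weights[(1, 5)] += 24;          // observed percentages
  weights[(1, 0)] += 0.9;   weights[(2, 1)] += 0.8;   weights[(2, 5)] += 0.7;
  weights[(3, 5)] += 0.21;  weights[(1, 1)] += 0.2;   weights[(3, 0)] += 0.1;
  weights[(4, 0)] += 0.09;

  // Roulette-wheel selection: pick one combination with probability proportional to its weight.
  var random = new Random();
  double roll = random.NextDouble() * weights.Values.Sum();
  foreach (var entry in weights)
  {
      roll -= entry.Value;
      if (roll <= 0)
      {
          validate(entry.Key.Setting1, entry.Key.Setting2);
          break;
      }
  }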

The Case for Fewer Test Cases

Testers are often encouraged to automate more and more test cases. At first glance, the case for more test cases makes sense—the more tests you have, the better your product is tested. Who can argue with that? I can.

Creating too many test cases leads to the condition known as “test case bloat”. This occurs when you have so many test cases that you spend a disproportionate amount of time executing, investigating, and maintaining these tests. This leaves little time for more important tasks, such as actually finding and resolving product issues. Test case bloat causes the following four problems:

1. Test passes take a long time to complete.

The longer it takes for your test pass to complete, the longer you have to wait before you can begin investigating the failures. I worked on one project where there were so many test cases, the daily test pass took 27 hours to finish. It’s hard to run a test pass every day when it takes more than 24 hours to complete.

2. Failure investigations take a long time to complete.

The more tests you have, the more failures you have to investigate. If your test pass takes a day to complete, and you have a mountain of failures to investigate, it could be two days or longer before a build is validated. This turn-around time may be tolerable if you’re shipping your product on a DVD. But when your software is a service, you may need to validate product changes a lot faster.

For example, the product I’m working on is an email service. If a customer is without email, it’s unacceptable for my team to take this long to validate a bug fix. Executing just the highest-priority tests to validate a hot-fix may be a valid compromise. If you have a lot of test cases, however, even this can take too long.

3. Tests take too much effort to maintain.

When your automation suffers from test case bloat, even subtle changes in product functionality can cause massive ripples in your existing test cases, drastically increasing the amount of time you spend maintaining them. This leaves little time for other, more valuable tasks, such as testing new features. It's also a morale killer. Most testers I know—the really good ones, at least—don't want to continually maintain the same test cases. They want to test new features and write new code.

4. After a certain threshold, more test cases no longer uncover product bugs; they mask them.

Most test cases only provide new information the first time they’re run. If the test passes, we can assume the feature works. If the test fails, we file a bug, which is eventually fixed by development, and the test case begins to pass. If it’s written well, the test will continue to pass unless a regression occurs.

Let’s assume we have 25 test cases that happily pass every time they’re run. At 3:00 a.m. an overtired developer then checks in a bug causing three tests to fail. Our pass rate would drop from 100% to an alarming 88%. The failures would be quickly investigated, and the perpetrator would be caught. Perhaps we would playfully mock him and make him wear a silly hat.

But what if we had 50 test cases? Three failures out of 50 test cases is a respectable 94% pass rate. What about one hundred or two hundred tests? With that many tests, it's now very possible that there are some failures in every pass simply due to test code problems; timing issues are a common culprit. The same three failures in two hundred tests is nearly a 99% pass rate. But were those failures caused by expected timing issues, or by a real product bug? If your team was pressed to get a hot-fix out the door to fix a live production issue, it may not investigate a 99% pass rate with as much vigor as an 88% pass rate.

Bloat Relief

If your automation suffers from test case bloat, you may be able to refactor your tests. But you can’t simply mash four or five tests with different validation points into a single test case. The more complicated a test, the more difficult it becomes to determine the cause and severity of failure.

You can, however, combine test cases when your validation points are similar, and the severity of a failure at each validation point is the same. For example, if you’re testing a UI dialog, you don’t need 50 different test cases to validate that 50 objects on the screen are all at their expected location. This can be done in one test.
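
In sketch form, that single test is just a table of expected positions and one loop. GetControlPosition and Assert.IsTrue below are stand-ins for whatever UI automation layer and assertion library you use.

  // Expected top-left coordinates for the controls on the dialog (values are made up).
  var expectedPositions = new Dictionary<string, (int X, int Y)>
  {
      ["NameTextBox"]  = (120, 40),
      ["OkButton"]     = (410, 355),
      ["CancelButton"] = (500, 355),
      // ... one entry per control, 50 in all
  };

  var failures = new List<string>();
  foreach (var control in expectedPositions)
  {
      // GetControlPosition is a stand-in for your UI automation framework.
      (int X, int Y) actual = GetControlPosition(control.Key);
      if (!actual.Equals(control.Value))
          failures.Add($"{control.Key}: expected {control.Value}, found {actual}");
  }

  // One test case, one verdict, but every mismatch is reported.
  Assert.IsTrue(failures.Count == 0, string.Join("; ", failures));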

You can also combine tests when you’re checking a single validation point, such as a database field, with different input combinations. Don’t create 50 different test cases that check the same field for 50 different data combinations. Create a single test case that loops through all combinations, validating the results.
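
A sketch of the same idea for the data-driven case; the column names and the SubmitOrder and ReadTaxCodeFromDatabase helpers are made up for illustration.

  // Each row is one input combination and the value the database field should end up with.
  var cases = new List<(string Country, string Currency, string ExpectedTaxCode)>
  {
      ("US", "USD", "TAX-US"),
      ("DE", "EUR", "TAX-EU"),
      ("JP", "JPY", "TAX-JP"),
      // ... the remaining combinations
  };

  foreach (var c in cases)
  {
      SubmitOrder(c.Country, c.Currency);                  // hypothetical driver
      string actual = ReadTaxCodeFromDatabase(c.Country);  // hypothetical query helper
      Assert.AreEqual(c.ExpectedTaxCode, actual, $"Wrong value for {c.Country}/{c.Currency}");
  }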

When my test pass was taking 27 hours to complete, one solution we discussed was splitting the pass based on priority, feature, or some other criterion. If we had split it into three separate test passes, each would have taken only nine hours to finish. But this would have required three times as many servers. That may not be an issue if your test pass runs on a single server or on virtual machines. However, I've worked on automation that required more than twenty physical servers; tripling your server count is not always an option.

In addition to the techniques discussed above, pairwise testing and equivalence class partitioning are tools that all testers should have in their arsenal. The ideal solution, however, is to prevent bloat before it even starts. When designing your test cases, be aware of the number of tests you're writing. If all else fails, I hear you can gain time by investigating your test failures while traveling at the speed of light.