UI Testing Checklist

When testing a UI, it’s important to not only validate each input field, but to do so using interesting data. There are plenty of techniques for doing so, such as boundary value analysis, decision tables, state-transition diagrams, and combinatorial testing. Since you’re reading a testing blog, you’re probably already familiar with these. Still, it’s nice to have a short, bulleted checklist of all the tools at your disposal. When I recently tested a new web-based UI, I took the opportunity to create one.

One of the more interesting tests I ran was a successful HTML injection attack. In an input field that accepted a string, I entered: <input type="button" onclick="alert('hi')" value="click me">. When I navigated to the web page that should have displayed this string, I instead saw a button labeled “click me”. Clicking on it produced a pop-up with the message “hi”. The web page was rendering all HTML and JavaScript I entered. Although my popup was fairly harmless, a malicious user could have used this same technique to be, well, malicious.
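
The flaw behind that test, as an added example rather than code from the application that was tested: a page template that concatenates the stored string into markup, beside the same template with the value encoded on output. The function names and the `<p class="note">` template are assumptions; the payload is the string quoted above.

```javascript
// Added example. renderUnsafe, renderSafe and the <p class="note"> template
// are hypothetical; the payload is the string quoted in the post.

// Characters that change meaning in HTML text and quoted attribute values.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Encode a value so the browser displays it instead of parsing it.
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Vulnerable pattern: stored input is concatenated straight into markup,
// so the browser parses it as elements and event handlers.
function renderUnsafe(storedValue) {
  return '<p class="note">' + storedValue + '</p>';
}

// Hardened pattern: the same template, with the value encoded on output.
function renderSafe(storedValue) {
  return '<p class="note">' + escapeHtml(storedValue) + '</p>';
}

// Regression check for a UI test: inside the template's own <p> wrapper,
// the rendered output must not contain the start of any tag.
function hasUnexpectedTag(html) {
  const inner = html.replace(/^<p class="note">/, '').replace(/<\/p>$/, '');
  return /<[a-z!\/]/i.test(inner);
}

const payload = `<input type="button" onclick="alert('hi')" value="click me">`;

console.log(renderUnsafe(payload), hasUnexpectedTag(renderUnsafe(payload)));
console.log(renderSafe(payload), hasUnexpectedTag(renderSafe(payload)));
```

The check belongs in the automated suite next to the checklist's other string inputs, so a screen that stops encoding its output fails a test instead of waiting for a manual pass.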

Another interesting test was running the UI in non-English languages. Individually, each screen looked fine. But when I compared similar functionality on different screens, I noticed some dates were formatted mm/dd/yyyy and others dd/mm/yyyy. In fact, the most common bug type I found was inconsistencies between screens. The headings on some pages were name-cased, while others were lower-cased. Some headings were butted against the left side of the screen, and others had a small margin. Different fonts were used for similar purposes.

Let’s get back to boundary value analysis for a minute. Assume you’re testing an input field that accepts a value from 1 to 100. The obvious boundary tests are 0, 1, 100, and 101. However, there’s another, less obvious, boundary test. Since this value may be stored internally as an integer, a good boundary test is a number too large to be stored as an int.
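
Why the oversized value is worth a test case, as an added example that is not from the original post: if the server narrows the input to an integer before range-checking it, some huge inputs wrap back into 1 to 100. The 32-bit signed storage (simulated with Int32Array), the function names and the extra boundary values are assumptions.

```javascript
// Added example. Assumption: the field ends up in a signed 32-bit integer,
// simulated here with Int32Array; all names below are hypothetical.

// Narrowing conversion: wraps modulo 2^32, like an unchecked cast.
function storeAsInt32(n) {
  return new Int32Array([n])[0];
}

// Flawed order of operations: convert first, then range-check the result.
function acceptsNaive(text) {
  const stored = storeAsInt32(Number(text));
  return stored >= 1 && stored <= 100;
}

// Corrected: validate the text and the range before any narrowing.
function acceptsChecked(text) {
  if (!/^[0-9]{1,3}$/.test(text)) return false; // digits only, at most three
  const n = Number(text);
  return n >= 1 && n <= 100;
}

// The four boundary values from the post, plus constructed values at and
// beyond the int32 limits (2^31 - 1, 2^31, 2^32 + 1, 2^32 + 100).
const BOUNDARY_INPUTS = [
  '0', '1', '100', '101',
  '2147483647', '2147483648', '4294967297', '4294967396',
];

// Inputs on which the two validators give different answers.
function disagreements() {
  return BOUNDARY_INPUTS.filter((t) => acceptsNaive(t) !== acceptsChecked(t));
}

console.log(disagreements());
```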

My UI Testing Checklist has all these ideas, plus plenty more: accented characters, GB18030 characters, different date formats, leap days, etc. It’s by no means complete, so please leave a comment with anything you would like added. I can (almost) guarantee it’ll lead you to uncover at least one new bug in your project.

 

Click this image to download the UI Testing Checklist

Testing Is Like Going To The Doctor

Introduction

By now, you’ve probably heard the phrase “move quality upstream.” The idea is that we want developers to take more ownership of validating the quality of the code they produce. There are a couple of common sense practices–most of which you’ve already heard of–that you need to adopt to start moving your org in this direction.

The first practice you need to adopt is unit testing. If your org doesn’t embrace unit testing, this is the first thing you need to go fix. Fortunately there’s been a lot written about unit testing and about how to get a dev org to adopt unit testing as a best practice. There’s even an IEEE standard on how to approach unit testing, if you’re into that kind of thing.

The next practice you need to adopt is test automation. Everything I write is going to assume that you believe test automation is generally a good thing and that you’ve got a standard test harness that you can use to exercise the code under test. Maybe you bought a test harness off the shelf. Maybe you’re using an open source test framework. Maybe you’re a special, unique snowflake and your org has a test harness that’s completely internal. Whatever. The point is, I’m going to assume that you believe in writing tests that are automated, repeatable and maintainable. I’m also going to assume you have some automated sanity tests, and probably an automated way to build and deploy code. If you haven’t got these yet, there are some great books and blogs out there that’ll help you get the job done.

Great. What’s Next?

OK. So you’ve adopted both unit testing and automated functional testing. You’ve got an automated build and an automated BVT system that tells you when the build is just plain broken. That’s great. The next thing you need to think about is how to move more of the functional testing of software into the hands of the dev that’s writing it.

It makes sense for dev to own at least some functional testing. If you think about it, every time we move code from a developer to a tester we’re introducing overhead. It’s like having to set up and tear down a stack during a function call, or like having to move from “on box” to “on rack” in a cloud computing environment. It’s a tiny cost that, repeated hundreds of times, adds up to a really big cost. The problem is that functional testing is a really big problem space. Are we asking dev to take on basic happy path testing? What about pairwise or stress testing? What about negative testing?

Yes. Yes, to all of it. But we’re going to do it one piece at a time. And when we’re done, we’re still going to have a ton of other stuff to do as quality assurance professionals. It’ll just be different stuff than what we do today.

The first thing you need to ask yourself is, “What is the first thing I do when dev hands me a piece of code to test?” The next thing you need to ask yourself is, “Does this really need to be done at all?” Then ask, “If this needs to be done, am I really the right person to do it?”

The first thing that you do when dev hands you a piece of code is going to be different depending on the kind of software you’re shipping. If you own a web service, you might validate that basic browsing and payment processing are working. If you own a game, you might make sure the menus load. The thing is, there’s probably something that you always do first when a dev tells you that a piece of code is done.

That thing, whatever it is–do you really need to do it? Is it finding bugs? What’s the risk to your product if you don’t find a bug at this stage? If a test doesn’t find bugs and it doesn’t mitigate risk, it’s probably not worth running. If a test always finds bugs, then we need to do something to improve quality because consistent failure is a sign that something’s systemically wrong.

Here’s a table to illustrate the point:

            Always Finds Bugs                              Rarely Finds Bugs
High Risk   We need more quality before we run this test   Somebody should run this test
Low Risk    We need more quality before we run this test   Don’t run the test

Is There A Doctor In The House?

A test in software is kind of like getting a test from the doctor. If the doctor finds that my blood pressure is high, I’m the one who gets on the treadmill to try to get my blood pressure down, not the doctor. If the tests find that something’s consistently wrong, the developer needs to do something to rectify the consistent failure: more code review, more unit testing–or maybe running the functional test themselves.

If the doctor thinks that it’s worthwhile to have my blood pressure monitored, I could come into the office to do it but that’ll get expensive and time consuming really fast. Or I could buy a blood pressure cuff and monitor my vitals myself, which will be much cheaper. The automated test that you’ve been running every time your dev hands you code? That test is your automatic, home-use blood pressure cuff.

The key here is that the home blood pressure monitor is automatic. I push a button, it squeezes my arm like an anaconda, and it spits out a reading. I don’t need a stethoscope. I don’t even need to know how to spell “systolic”. If somebody hadn’t invented this nifty little automated testing device, I wouldn’t be able to do this test by myself. But they did, and I can, and it saves a ton of time and money.

So if your test is always finding bugs or if the area is high risk, take your automated test and say to your dev partner, “Hey, I run this test every time you hand me code. If you ran it instead it would save us both some time.”

I’m assuming, of course, that your automated test runs pretty quickly, doesn’t generate false failures, and doesn’t require mastery of the Dark Arts to set up and execute–just like the automated blood pressure cuff. As long as you meet these conditions with your tests, the win for everybody is usually pretty obvious. It’s exactly the same as not driving to the doctor when you have a home blood pressure cuff–don’t go to the tester when you’ve got a quick, easy way of validating quality yourself.

A quick coverage of Code Coverage

Testing is full of numbers:

  • How long will the test pass take?
  • What percentage of the features have you tested?
  • What is the automation test pass rate?
  • How confident are we that the failing tests are real product failures and not failures of the test system?
  • What is my raise going to be?

Code Coverage is just a number.  It tells us how much of the code has been exercised, and maybe verified, by our testing effort.  This is also sometimes called White Box testing since we look at the code in order to develop our test cases.  Management sometimes puts a high value on the code coverage number.  Whether they should or not is a discussion best left to each company.  There are multiple ways we can get code coverage numbers.  Here are three examples.

Block testing

Definition: Execute a contiguous block of code at least once

Block testing is the simplest first order method to obtain a code coverage number.  The strength is it’s quick.  The weakness is it’s not necessarily accurate.  Take a look at this code example:

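bool IsInvalidTriangle(ushort a, ushort b, short c)
{
    // Start from "valid"; without an initial value the local is unassigned
    // whenever the IF block is skipped.
    bool isInvalid = false;
    if ((a + b <= c) || (b + c <= a) || (a + c <= b))
    {
        isInvalid = true;
    }
    return isInvalid;
}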

If we tested it with the values of a=1, b=2, and c=3; we would get a code coverage of about 80%.  Great, management says, SHIP IT!  Wait, you say, there is a weakness of block level testing.  Can you spot it?  The one test case only hits the first condition of the IF statement.  Block level testing will report the line as 100% covered, even though we did not verify the second and third conditions.  If one of the expressions was “<” instead of “<=” we would never catch the bug.
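
The weakness described above, as an added example (a JavaScript port written for this page, not the original code): each “<=” is turned into “<” in turn, and each test suite is asked which of those mutants it fails to notice. The function names and the second suite's test values are assumptions.

```javascript
// Added example: a JavaScript port of the post's IsInvalidTriangle with a
// switch that turns one "<=" into "<" (a constructed mutant).
// strictIndex 0, 1 or 2 selects the sub-expression to mutate; -1 mutates none.
function makeTriangleCheck(strictIndex) {
  const cmp = (i) => (strictIndex === i ? (x, y) => x < y : (x, y) => x <= y);
  return (a, b, c) =>
    cmp(0)(a + b, c) || cmp(1)(b + c, a) || cmp(2)(a + c, b);
}

// The single test from the post: it executes every block of the function.
const blockSuite = [{ sides: [1, 2, 3], invalid: true }];

// One degenerate triangle per sub-expression (two sides sum exactly to the
// third), plus a valid triangle so each sub-expression is also false once.
const conditionSuite = [
  { sides: [1, 2, 3], invalid: true },  // a + b == c
  { sides: [3, 1, 2], invalid: true },  // b + c == a
  { sides: [1, 3, 2], invalid: true },  // a + c == b
  { sides: [3, 4, 5], invalid: false },
];

// True when the implementation gives the expected answer for every test.
function passes(check, suite) {
  return suite.every((t) => check(...t.sides) === t.invalid);
}

// Indexes of the mutated sub-expressions that a suite fails to detect.
function survivingMutants(suite) {
  return [0, 1, 2].filter((i) => passes(makeTriangleCheck(i), suite));
}

console.log('block suite misses mutants:', survivingMutants(blockSuite));
console.log('condition suite misses mutants:', survivingMutants(conditionSuite));
```

The one-test suite lets the mutants in the second and third sub-expressions through even though every block was executed; the suite that drives each sub-expression both ways detects all three.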

Condition testing

Definition: Make every sub-expression of a predicate statement evaluate to true and false at least once

This is one step better than block level testing since we validate each condition in a multiple condition statement.  The trick is to break any statement with multiple conditions to one condition per line, and then put a letter in front of each condition.  Here is an example:

void check_grammar_if_needed(const Buffer& buffer)
{
A:  if (enabled &&
B:      (buffer.cursor.line < 10) &&
C:      !buffer.is_read_only)
    {
        grammarcheck(buffer);
    }  
}

Our tests would be:

Test   enabled   value of ‘line’   is_read_only   Comment
1      False     N/A               N/A
2      True      11                N/A            A is now covered
3      True      9                 True           B is now covered
4      True      9                 False          C is now covered

Breaking the conditions into one per line doesn’t really help much here.  This trick will help if you have nested loops.  You can set up a table to help make sure each inner expression condition is tested with each outer expression condition.

Basis Path testing

Definition: Test C different entry-exit paths where C (Cyclomatic complexity) = number of conditional expressions + 1

Does the term “Cyclomatic complexity” bring back nightmares of college?  Most methods have one entry and one or two exits.  Basis Path testing is best applied when there are multiple exit points since you look at each exit path in order to determine your code coverage.  The steps you follow to find the basis paths (shortest path method):

  • Find shortest path from entry to exit
  • Return to algorithm entry point
  • Change next conditional expression or sub-expression to alternate outcome
  • Follow shortest path to exit point
  • Repeat until all basis paths defined

Here is an example:

A:  static int GetMaxDay(int month, int year)
    {
        int maxDay = 0;
B:      if (IsValidDate(month, 1, year)) {
C:          if (IsThirtyOneDayMonth(month)) {
                maxDay = 31;
            }
D:          else if (IsThirtyDayMonth(month)) {
                maxDay = 30;
            }
            else {
                maxDay = 28;
E:              if (IsLeapYear(year)) {
                    maxDay = 29;
                }
            }
        }
        return maxDay;
F:  }

Test cases:

Branch to flip   Shortest path out                          Path     Input
n/a              B==false                                   ABF      0, 0
B                B==true, C==true                           ABCF     1, 1980
C                B==true, C==false, D==true                 ABCDF    4, 1980
D                B==true, C==false, D==false, E==false      ABCDEF   2, 1981
E                B==true, C==false, D==false, E==true       ABCDEF   2, 1980

These are just three of the many different ways to calculate code coverage.  You can find these and more detailed in any decent book on testing computer software.  There are also some good references online.  Here is one from a fellow Expert Tester.  As with any tool, you the tester have a responsibility to know the benefits and weaknesses of the tools you use.

Thankfully, most compilers will produce these numbers for us. Code Coverage goals at Microsoft used to be around 65% code coverage using automation.  For V1 of OneNote, I was able to drive the team and get it up to 72%.  Not bad for driving automation for a V1 project.  With the move from boxed products to services, code coverage is getting less attention and we are now looking more into measuring feature and scenario coverage.  We’ll talk about that in a future blog.

Now, what will we tell The Powers That Be?

The key to unlock the tests is in the combination

In the last blog, Andrew Schiano discussed Equivalence Class (EQ) and Boundary Value Analysis (BVA) testing methodologies.  This blog will talk about how to extend those two ideas even further with Combinatorial Testing.

Combinatorial Testing is a form of model-based testing.  It chooses pairs or sets of inputs, out of all of the possible inputs, that will give you the best coverage with the least cost.  Fewer test cases while still finding bugs and giving high code coverage is a dream of us testers.  It is best applied when:

  • Parameters are directly interdependent
  • Parameters are semi-coupled
  • Parameter input is unordered

Let’s look at an example UI.  You have to test a character formatting dialog.  It allows you to pick between four fonts, two font styles, and three font effects.  A chart of the values looks like this:

Field     Values
Font      Arial, Calibri, Helvetica, BrushScript
Style     Bold, Italic
Effects   Strikethrough, Word Underline, Continuous Underline

For any selection of text, you can have only one Font, zero to two Styles, and zero to two effects.  Notice how for effects, you could select one pair, strikethrough and continuous underline, but you should not be able to pick another pair, word underline and continuous underline.  You could set up a matrix of tests and check each combination.  That’s old school.  You could reduce the number of cases by applying EQ.  That would be smarter. Your best bet, though, is to apply Combinatorial testing.

Combinatorial testing, also sometimes called Pairwise Testing, is just modeling the System Under Test (SUT). You start taking unique interesting combinations of inputs, using them to test the system, and then feeding that information back into the model to help direct the next set of tests. There is mathematics involved that takes into account the coupling (interactions) of the inputs, which we won’t cover here. Thankfully there are tools that help us choose which test cases to run. I’ll cover the Pairwise Independent Combinatorial Testing (PICT) tool, available free from Microsoft. Since it is a tool, it is only as good as the input you give it.

The steps to using PICT or any other combinatorial testing tool are:

  1. Analysis and feature decomposition
  2. Model parameter variables
  3. Input the parameters into the tool
  4. Run the tool
  5. Re-validate the output
  6. Modify the model
  7. Repeat steps 4-6

In our example above, the decomposition would look like this:

  • Font: Arial, Calibri, Helvetica, BrushScript
  • Bold: check, uncheck
  • Italic: check, uncheck
  • Strikethrough: check, uncheck
  • Underline: check, uncheck, word, continuous

You feed this data into the tool and it will output the number of tests you specify. You need to validate the cases before running them. For example, the BrushScript font only allows Italic and Bold/Italic. If the tool output the test case:

  • Font: BrushScript
  • Bold: check
  • Italic: uncheck
  • Strikethrough: check
  • Underline:  word

Being the awesome tester that you are, you would notice this is not valid. Thankfully the PICT tool allows you to constrain invalid input combinations. It also allows you to alias equivalent values. So, you modify the model, not the outputted values. In this case, you would add two lines so the input file would now look something like this:

Font: Arial, Calibri, Helvetica, BrushScript
Bold: check, uncheck
Italic: check, uncheck
Strikethrough: check, uncheck
Underline: check, uncheck, word, continuous

IF [Font] = "BrushScript" AND [Italic] = "uncheck" THEN [Bold] <> "check";
IF [Bold] = "uncheck" AND [Italic] = "uncheck" THEN NOT [Font] = "BrushScript";

The PICT tool also allows you to weight values that are more common (who really uses BrushScript anymore?), and seed data about common historical failures.

Font: Arial (8), Calibri (10), Helvetica (5), BrushScript (1)

Does this really work?  Here is an example from a previous Microsoft project:

Command line program with six optional arguments:

Total Number of Blocks = 483

                        Default test suite   Exhaustive coverage   Pairwise coverage
Number of test cases    9                    972                   13
Blocks covered          358                  370                   370
Code coverage           74%                  77%                   77%
Functions not covered   15                   15                    15

Now, pair this with automation so you have data-driven automated testing, and you’re REALLY off and running as a twenty-first century tester!

A few words of caution.  While this gives you the minimum set of tests, you should also test:

  1. Default combinations
  2. Boundary conditions
  3. Values known to have caused bugs in the past
  4. Any mission-critical combinations.

Lastly, don’t forget Beizer’s Pesticide Paradox.  Keep producing new test cases.  If you only run the tool once, and continually run those same cases, you’re going to miss bugs.

There’s a Smart Monkey in my toolbelt

This is a follow on to Andrew’s article “State-Transition Testing.”  If your software can be described using states, you can use monkey automation to test your product.  While smart monkeys can take a little time to implement, even using third-party automation software, their payback can be enhanced if you release multiple versions, follow a rapid release cadence, or are short on testing resources.

Let me start by noting that Smart Monkey testing differs from Netflix’s Simian Army. Similar name, both are code testing code. Other than that, they are different.

In general, monkey testing is automated testing with no specific purpose in mind other than to test the product or service.  It’s also known as “stochastic testing” and is a tool in your black box tester tool belt.  Monkey testing comes in two flavors:

Dumb Monkey testing is when the automation randomly sends keystrokes or commands to the System Under Test (SUT).  The automation system or tester monitors the SUT for crashes or other incorrect reactions.  One example would be feeding random numbers into a dialog that accepts a dollar value.  Another example comes from when I was a tester in Office.  One feature I was testing is the ability to open web pages in Microsoft Word.  I wrote a simple macro in Visual Basic for Applications (VBA) to grab a random web page from the web browser, log the web address to a log file, and try to open the web page in Word.  If Word crashed, my macro would stop, and I would know that I hit a severity one (i.e. crashing) bug.  I can watch my computer test for me all day…or just run it at night.  Free bugs!

Funny Aside: I had a little code in the macro to filter out, um, inappropriate web sites.  Occasionally one got through the code, an alarm would go off in our IT department, and my manager would receive an email that I was viewing sites against HR’s policy.  It happened every few months.  I would show that it was the automation, not me, doing the offensive act.  We would laugh about it, put a letter of reprimand in the monkey’s employee file, and move on.

Smart Monkey testing is, well, smarter.  Instead of doing random things, the automation system knows something about your program:

  • Where it is
  • What it can do there
  • Where it can go and how to get there
  • Where it’s been
  • If what it’s seeing is correct

This is where State Transition comes into play.

Let’s look at a State Transition diagram for a Learning Management System (LMS) and how the enrollment status of a student could change.

LMS State Diagram

You would define for the smart monkey automation the different states, which state changes are possible, and how to get to those. For the diagram above, it might look something like this:

Where it is: Registered

What it can do there #1: Learner can Cancel their registration

What it can do there #2: Learner can Attend the class

Where it is: Canceled

What it can do there #1: Learner can Register for the class

What it can do there #2: Learner can be added to the Waitlist

Where it is: Waitlisted

What it can do there #1: Learner can Cancel their registration

What it can do there #2: The learner will be automatically registered by the system if a spot opens for them

You get the general idea.  This still looks like Andrew’s State-Transition Testing.  What is different is the automation knows this information.  When you start the automation, it will start randomly traversing the states.  Sometimes it will follow an expected path:

Register | Attend

The next time, it might try a path you didn’t expect (the learner ignores their manager and attends anyways):

Request | Request Declined | Walk-in

It might also do things like request and cancel the same class fifty times.

Don’t have time to define all of the states? Some third party software will randomly explore your software and create the state model and how to traverse it for you. You can then double-check the model it creates, make any corrections or additions, and you are on your way!

How can you improve this system?

  1. Have the monkey access an XML or other data file with specific cases you would like hit more than the random ones.  For example, you could use the PICT tool to create a list of the most interesting combinations of inputs to hit.
  2. You can also make this system smarter by assigning probabilities to each state change.  Find out how often the user actually cancels a registration.  Now feed that data into the smart monkey.  Now your monkey will test the state changes at the same frequency as the real user.
  3. The next step?  Tie your monkey into live customer data: Customer data-driven quality (CDDQ).  For example, let’s say all of a sudden your customers start canceling a lot of class registrations due to an upcoming holiday.  Your smart monkey will automatically start testing the cancel registration state change more often.

The whole idea of smart monkey testing is it will follow expected and unexpected paths. You can run the monkey on spare machines, on your test machine overnight, pretty much anytime, and it will give you free testing. If your logging is good enough and tracks the state, and which transition path it followed, you will be able to reproduce any bugs it finds. Watch your code coverage numbers go up…maybe. But that’s fodder for another posting.
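
A small smart monkey for the three states listed earlier, as an added example that is not from the original post: it walks the model with weighted random choices, checks the system's state after every action, and logs the path so a failure can be replayed. The Attended state, the action names, the weights and the fake LMS with its seeded defect are all constructed for illustration.

```javascript
// Added example. States and actions follow the post's lists; the Attended
// state, the weights and the fake LMS are constructed for illustration.
const MODEL = {
  Registered: [{ action: 'cancel', to: 'Canceled', weight: 1 },
               { action: 'attend', to: 'Attended', weight: 3 }],
  Canceled:   [{ action: 'register', to: 'Registered', weight: 2 },
               { action: 'waitlist', to: 'Waitlisted', weight: 1 }],
  Waitlisted: [{ action: 'cancel', to: 'Canceled', weight: 1 },
               { action: 'autoRegister', to: 'Registered', weight: 1 }],
  Attended:   [], // terminal: the monkey starts over with a new learner
};

// Seeded PRNG (mulberry32) so a run can be repeated from its seed.
function mulberry32(seed) {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fake system under test. With buggy=true, cancelling from the waitlist
// silently does nothing: a constructed defect for the monkey to find.
function makeLms(buggy) {
  let state = 'Registered';
  return {
    state: () => state,
    reset: () => { state = 'Registered'; },
    apply(action) {
      if (buggy && state === 'Waitlisted' && action === 'cancel') return;
      state = MODEL[state].find((x) => x.action === action).to;
    },
  };
}

// Choose a transition with probability proportional to its weight.
function pickWeighted(options, rand) {
  let r = rand() * options.reduce((sum, o) => sum + o.weight, 0);
  return options.find((o) => (r -= o.weight) < 0) || options[options.length - 1];
}

// Random walk over the model that checks the SUT state after every action.
// On a mismatch it returns the actions since the last reset as repro steps.
function runMonkey(sut, seed, maxSteps) {
  const rand = mulberry32(seed);
  let expected = 'Registered';
  let path = [];
  sut.reset();
  for (let step = 0; step < maxSteps; step++) {
    if (MODEL[expected].length === 0) { // terminal state: new learner
      sut.reset();
      [expected, path] = ['Registered', []];
      continue;
    }
    const t = pickWeighted(MODEL[expected], rand);
    sut.apply(t.action);
    path.push(t.action);
    if (sut.state() !== t.to) {
      return { ok: false, seed, path, expected: t.to, actual: sut.state() };
    }
    expected = t.to;
  }
  return { ok: true, seed };
}

// Replay logged repro steps on a fresh SUT and report the final state.
function replay(sut, path) {
  sut.reset();
  path.forEach((action) => sut.apply(action));
  return sut.state();
}

console.log(runMonkey(makeLms(true), 42, 2000));
```

Recording the seed and the path in the failure report is what makes an overnight run reproducible the next morning.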

Long live the smart monkey!
