UI Testing Checklist

When testing a UI, it’s important to not only validate each input field, but to do so using interesting data. There are plenty of techniques for doing so, such as boundary value analysis, decision tables, state-transition diagrams, and combinatorial testing. Since you’re reading a testing blog, you’re probably already familiar with these. Still, it’s nice to have a short, bulleted checklist of all the tools at your disposal. When I recently tested a new web-based UI, I took the opportunity to create one.

One of the more interesting tests I ran was a successful HTML injection attack. In an input field that accepted a string, I entered: <input type="button" onclick="alert('hi')" value="click me">. When I navigated to the web page that should have displayed this string, I instead saw a button labeled “click me”. Clicking on it produced a pop-up with the message “hi”. The web page was rendering all HTML and JavaScript I entered. Although my pop-up was fairly harmless, a malicious user could have used this same technique to be, well, malicious.
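
The usual fix for this class of bug is to HTML-encode user input before rendering it. Here’s a minimal Python sketch (an illustration, not the product’s actual code) showing how escaping turns the payload above into harmless text:

```python
import html

# The payload entered into the input field above.
payload = '<input type="button" onclick="alert(\'hi\')" value="click me">'

# A vulnerable page concatenates the raw string into its markup,
# so the browser renders a live button.
vulnerable_html = "<p>Your string: " + payload + "</p>"

# Escaping converts <, >, &, and quotes into entities, so the browser
# displays the literal text instead of executing it.
safe_html = "<p>Your string: " + html.escape(payload) + "</p>"

print(vulnerable_html)  # renders a clickable "click me" button
print(safe_html)        # renders the payload as plain text
```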

Another interesting test was running the UI in non-English languages. Individually, each screen looked fine. But when I compared similar functionality on different screens, I noticed some dates were formatted mm/dd/yyyy and others dd/mm/yyyy. In fact, the most common bug type I found was inconsistencies between screens. The headings on some pages were title-cased, while others were lower-cased. Some headings were butted against the left side of the screen, and others had a small margin. Different fonts were used for similar purposes.

Let’s get back to boundary value analysis for a minute. Assume you’re testing an input field that accepts a value from 1 to 100. The obvious boundary tests are 0, 1, 100, and 101. However, there’s another, less obvious, boundary test. Since this value may be stored internally as an integer, a good boundary test is a number too large to be stored as an int.
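
Here’s a minimal sketch of the boundary values for that field, assuming (and it is only an assumption) that the value is stored as a 32-bit signed integer:

```python
# Boundary values for an input field that accepts 1 to 100.
INT32_MAX = 2**31 - 1  # assumes a 32-bit signed int is used internally

boundary_values = [
    0,              # just below the valid range
    1,              # lowest valid value
    100,            # highest valid value
    101,            # just above the valid range
    INT32_MAX,      # largest value the assumed storage type can hold
    INT32_MAX + 1,  # too large to fit in a 32-bit signed int
]

for value in boundary_values:
    # submit_value() would go here in a real harness (hypothetical helper).
    print(f"Would test with: {value}")
```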

My UI Testing Checklist has all these ideas, plus plenty more: accented characters, GB18030 characters, different date formats, leap days, etc. It’s by no means complete, so please leave a comment with anything you would like added. I can (almost) guarantee it’ll lead you to uncover at least one new bug in your project.

Click this image to download the UI Testing Checklist

Instagram for Mongo

When Instagram finally released an app for Windows Phone, a year after its release on iPhone and Android, it was missing perhaps its most important feature — the ability to take a picture within the app. Instagram explained they wanted to get the app to users as quickly as possible, and although a few features were missing, they assured everyone these features would come in a future release. Welcome to the new paradigm of shipping software to the cloud.

When I started in Test, we shipped our products on DVD. If a feature was missing or broken, the customer either had to wait for a new DVD (which might take a year or three) or download and install a service pack. Today, our software is a service (SaaS). We ship to the cloud, and all customers are upgraded at once. For the Instagram app, the software is upgraded in the cloud, customers are automatically notified, and they can upgrade their phone with a single click.

Mongo likes Instagram.

In both cases, updating software is easier than ever. This allows companies to get their products to market quicker than ever. There’s no longer a need to develop an entire product before shipping. You can develop a small feature set, possibly with some known (but benign) bugs, and then iterate, adding new scenarios and fixing bugs.

In How Google Tests Software, James Whittaker explains that Google rarely ships a large set of features at once. Instead, they build the core of the product and release it the moment it’s useful to as many customers as possible. Whittaker calls this the “minimal useful product”. Then they iterate on this first version. Sometimes Google ships these early versions with a Beta tag, as they did with Android and Gmail, which kept its Beta tag for 4 years.

When I first read Whittaker’s book, I had a hard time accepting you should release what’s essentially an unfinished product. But I hadn’t worked on a cloud service yet, either. Now that I’ve done so for a few years, I’m convinced this is the right approach. This process can benefit everyone affected by the product, from your customers, to your management, to your (wait for it) self.

Customers

  1. If you ship your core scenarios as soon as they’re ready, customers can take advantage of them without waiting for less important scenarios to be developed.

    I worked on a project where the highest priority was to expose a set of data in our UI. We completed this in about two weeks. We knew, however, some customers needed to edit this data, and not just see it. So we decided to ship only after the read/write functionality was complete. Competing priorities and unexpected complications made this project take longer than expected. As a result, customers didn’t get their highest-priority scenario, the ability to simply view the data, for two months, rather than two weeks.

  2. When a product is large and complex, scenarios can go untested due to either oversight or test debt. With a smaller feature set, it’s less likely you’ll overlook scenarios. And with this approach, there shouldn’t be any test debt. Remember, Whittaker is not saying to release a buggy product with a lot of features, and then iterate fixing the bugs. He’s saying to ship a high-quality product with a small feature set, and iterate by adding features.
  3. Many SaaS bugs are due to deployment and scale issues, rather than functional issues. Using an iterative approach, you’ll find and address these bugs quickly because you’re releasing features quickly. By the time you’re developing the last feature, you’ll hopefully have addressed all these issues.
  4. Similarly, you’ll be able to incorporate customer feedback into the product before the last feature has even been developed.
  5. Customers like to get updates! On my phone, I have some apps that are updated every few weeks, and others that are updated twice a year. Sure, it’s possible the apps modified twice a year are actually getting bigger updates than the ones updated often, but it sure doesn’t feel that way.

Management

  1. This approach gets new features to market quicker, giving your company a competitive advantage over, well, your competitors.
  2. By releasing smaller, high-quality features, fewer bugs will be found by customers. And bugs reported by customers tend to be highly visible to, and frowned upon by, management.

You

  1. If you work on a cloud product, there’s undoubtedly an on-call team supporting it. You’re probably part of it. By releasing smaller feature sets with fewer bugs, the on-call team will receive fewer customer escalations, and be woken up fewer times at night. You’re welcome.

“I see test as a role going away – and it can’t disappear fast enough”

In case you missed it, Alan Page dropped a bombshell in his last post of 2013 — and then immediately (and literally) got on a plane and left. Definitely worth a read.

The Evolution of Software Testing

As I sat down to eat my free Microsoft oatmeal this morning, I noticed how dusty my Ship-It plaque had become. I hadn’t given it much attention lately, but today it struck me how these awards illustrate the evolution of software testing.

Since I started at Microsoft seven years ago, I’ve earned nine Ship-It awards — plaques given to recognize your contribution to releasing a product. The first Ship-It I “earned” was in 2006 for Microsoft Antigen, an antivirus solution for Exchange.

I put “earned” in quotes because I only worked on Antigen a couple of days before it was released. A few days into my new job, I found myself on a gambling cruise — with open bar! — to celebrate Antigen’s release. I was also asked to sign a comically over-sized version of the product DVD box. Three years later I received another Ship-It — and signed another over-sized box — for Antigen’s successor, “Forefront Protection 2010 for Exchange Server”.

Fast-forward to 2011, when I received my plaque for Microsoft Office 365. This time there was no DVD box to sign — because the product wasn’t released on DVD. It’s a cloud-based productivity suite featuring email and Office.

This got me thinking. In 2006, when we shipped Antigen, all the features we wanted to include had to be fully developed, and mostly bug-free, by the day the DVD was cut. After all, it would be another three years before we cut a new one. And it would be a terrible experience for a customer to install our DVD, only to then have to download and install a service pack to address an issue.

By 2011, however, “shipping” meant something much different. There was no DVD to cut. The product was “released” to the cloud. When we want to update Office 365, we patch it in the cloud ourselves without troubling the customer; the change simply “lights up” for them.

This means products no longer have to include all features on day one. If a low-priority feature isn’t quite ready, we can weigh the impact of delaying the release to include the feature, versus shipping immediately and patching it later. The same holds true if/when bugs are found.

What does this all mean for Test? In the past, it was imperative to meet a strict set of release criteria before cutting a DVD. For example, no more code churn, test-pass rates above 95%, code coverage above 65%, etc. Now that we can patch bugs quicker than ever, do these standards still hold? Have our jobs become easier?

We should be so lucky.

In fact, our jobs have become harder. You still don’t want to release a buggy product — customers would flock to your competitors, regardless of how quickly you fix the bugs. And you certainly don’t want to ship a product with security issues that compromise your customer’s data.

Furthermore, it turns out delivering to the cloud isn’t “free.” It’s hard work! Patching a bug might mean updating thousands of servers and keeping product versions in sync across all of them.

Testers now have a new set of problems. How often should you release? Which bugs need to be fixed before shipping, and which can wait until after? How do you know everything will work when it’s deployed to the cloud? If the deployment fails, how do you roll it back? If the deployment works, how do you know it continues to work — that servers aren’t crashing, slowing down, or being hacked? Who gets notified when a feature stops working in the middle of the night? (Spoiler alert: you do.)

I hope to explore some of these issues in upcoming articles. Unfortunately, I’m not sure when I’ll have time to do that. I’m on-call 24×7 for two weeks in December.

Equivalence Class Testing

When testing even a simple application, it’s often impossible, or at least impractical, to test with all possible inputs. This is true whether you’re testing manually or using automation; throwing computing power at the problem doesn’t usually solve it. The question is, how do you find the inputs that give the most coverage in the fewest tests?

Assume we have a UI with just two fields for creating a user account. User Name can be between 1 and 25 characters, but can’t contain a space. Sex can be either Male or Female.

To test all possible inputs, we would first test single-character names using all possible characters. Then test two-character names using all possible characters in the first position, along with all possible characters in the second position. This process would be repeated for all lengths between 1 and 25. We then need to repeat all these names for Males and Females. I’m no mathematician, but this works out to be close to a kagillion possible inputs.
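
If you’re curious what “a kagillion” works out to, here’s a rough calculation. It assumes the name can use any of the 94 printable ASCII characters other than the space; the real character set could be far larger, which only makes the number worse:

```python
# Rough count of all possible inputs for the two-field form.
ALLOWED_CHARS = 94  # printable ASCII, minus the space (an assumption)

total_names = sum(ALLOWED_CHARS ** length for length in range(1, 26))
total_inputs = total_names * 2  # times two for Male/Female

print(f"{total_inputs:.2e} possible inputs")  # roughly 4 x 10^49
```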

If this simple UI can have so many inputs, then it’s clear we’ll usually have to test with a very small subset of all possible data. Equivalence classes (ECs) give us the guidance we need to select these subsets.

ECs are subsets of input data, where testing any one value in the set should be the same as testing the others. Here we have two equivalence classes of valid input data.

  1. User Name: length >= 1 and <= 25, not containing a space
  2. Sex: Male or Female

Since we have more than one EC of valid data, we can create a single test that randomly picks one value from each. After all, testing with A/Male should give us the same result as testing with AB/Female or JKHFGKJkjgjkhg/Male. In all three cases, a new user account should be created.
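
That single valid-data test might look something like the sketch below, where each helper picks a random value from its EC. The create_account() call and the allowed character set are assumptions standing in for the real UI driver and validation rules:

```python
import random
import string

# Assumption: letters and digits are representative valid name characters.
VALID_NAME_CHARS = string.ascii_letters + string.digits

def random_valid_user_name():
    """EC #1: length 1 to 25, containing no spaces."""
    length = random.randint(1, 25)
    return "".join(random.choice(VALID_NAME_CHARS) for _ in range(length))

def random_valid_sex():
    """EC #2: Male or Female."""
    return random.choice(["Male", "Female"])

def test_create_account_with_valid_data():
    name = random_valid_user_name()
    sex = random_valid_sex()
    result = create_account(name, sex)  # hypothetical helper that drives the UI
    assert result.succeeded, f"Expected account creation to succeed for {name}/{sex}"
```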

It’s also important to create classes of invalid input. If you have an EC of valid data in the range of 1-10, you should also have an EC of invalid data less than 1 and an EC of values greater than 10. We can create five equivalence classes of invalid input for our sample app.

  3. User Name: empty
  4. User Name: length > 25, no spaces
  5. User Name: length > 25, containing a space
  6. User Name: length >= 1 and <= 25, containing a space
  7. Sex: neither Male nor Female

When designing your tests, it’s important to only pick one invalid value per test case; using more than one can lead to missed product bugs. Suppose you choose a test with both an invalid User Name and an invalid Sex. The product may throw an error when it detects the invalid User Name, and never even get to the logic that validates the Sex. If there are bugs in the Sex-validation logic, your test won’t catch them.

Using our ECs, we’d create the following tests (sketched in code below the list).

  1. Valid name from #1 and valid Sex from #2
  2. Invalid name from #3 and valid Sex from #2
  3. Invalid name from #4 and valid Sex from #2
  4. Invalid name from #5 and valid Sex from #2
  5. Invalid name from #6 and valid Sex from #2
  6. Valid name from #1 and invalid Sex from #7
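
Here’s one way to sketch tests 2 through 6, each with exactly one invalid value. As before, create_account() and the character set are assumptions, not the product’s real API:

```python
import random
import string

import pytest

NAME_CHARS = string.ascii_letters + string.digits  # assumed valid name characters

def name_with_space():
    """EC #6: valid length, but containing a space."""
    chars = [random.choice(NAME_CHARS) for _ in range(random.randint(2, 25))]
    chars[random.randrange(len(chars))] = " "
    return "".join(chars)

INVALID_NAME_ECS = [
    ("empty (EC #3)", lambda: ""),
    ("too long, no spaces (EC #4)", lambda: "".join(random.choices(NAME_CHARS, k=26))),
    ("too long, with a space (EC #5)", lambda: "".join(random.choices(NAME_CHARS, k=25)) + " x"),
    ("valid length, with a space (EC #6)", name_with_space),
]

@pytest.mark.parametrize("description,make_name", INVALID_NAME_ECS)
def test_invalid_user_name_is_rejected(description, make_name):
    # Sex stays valid so the name-validation logic is actually exercised.
    result = create_account(make_name(), random.choice(["Male", "Female"]))
    assert not result.succeeded, f"Expected rejection for a name that is {description}"

def test_invalid_sex_is_rejected():
    """EC #7: valid name, invalid Sex."""
    name = "".join(random.choices(NAME_CHARS, k=random.randint(1, 25)))
    result = create_account(name, "Blah")  # create_account() is hypothetical
    assert not result.succeeded, "Expected rejection for an invalid Sex value"
```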

Equivalence classes have reduced our number of tests from a kagillion to six. As Adam Sandler might say, “Not too shabby.”

Not only does EC testing make sure you don’t have too many tests, it also ensures you don’t have too few. Testers not familiar with equivalence classes might test this UI with just four pairs of hardcoded values:

  1. John/Male
  2. Jane/Female
  3. John Smith/Male
  4. Joan/Blah

When these tests are automated and running in your test pass, they validate the same values day after day. Using equivalence class testing with random values, we validate different input every day, increasing our odds of finding bugs.

The day I wrote this article I found a UI bug that had been in the product for several months. The UI didn’t correctly handle Display Names of more than 64 characters. Sure enough, the UI tests didn’t use ECs, and tested with the same hardcoded values every day. If they had chosen a random value from an EC, the test would have eventually found this bug.

Another benefit of equivalence class testing is that it leads us to boundary value testing. When an equivalence class is a range, we should create tests at the high and low edge of the range, and just outside the range. This helps us find bugs in cases where the developer may have used a > instead of >=.

Boundary tests, however, should not take the place of your EC tests. In our example, don’t hardcode your EC test to have a User Name length of 25. Your EC tests should choose a random length between 1 and 25. You should have separate boundary tests that validate a length of 25.
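
A sketch of those explicit boundary tests for the User Name length might look like this (again, create_account() is a hypothetical stand-in for the real UI driver):

```python
import pytest

@pytest.mark.parametrize("length,should_succeed", [
    (0, False),   # empty name, just below the valid range
    (1, True),    # shortest valid name
    (25, True),   # longest valid name
    (26, False),  # just above the valid range
])
def test_user_name_length_boundaries(length, should_succeed):
    name = "x" * length
    result = create_account(name, "Female")  # hypothetical UI driver
    assert result.succeeded == should_succeed
```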

And don’t forget to create tests for values far outside the range. The size of your variable defines an EC as well! This could lead to other bugs, such as exceeding the max length of a string, or trying to shove a long number into an int.

When defining your ECs, the first place to look is the Feature Specification Document (or whatever your Agile equivalent may be). The other place is in the product code. This, however, can be risky. If the source code has a statement like if x > 1 && x < 5, you would create your EC as 2, 3, 4, and your tests will pass. But how do you know the source code was correct? Maybe it was intended to be x >= 1 && x <= 5. That’s why you should always push for valid input rules to be defined in a spec.

Another technique for creating ECs is to break up the output into equivalence classes. Then choose inputs that give you an output from each class. In our example, assume the user account can be stored in one of two databases, depending on whether the User Name contains Chinese characters. In this case, we would need one EC of valid Chinese User Names and one of valid non-Chinese User Names.
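
A quick way to split candidate names into those two classes is to check for characters in the CJK Unified Ideographs block. The sketch below uses that single Unicode range as a rough approximation of “contains Chinese characters”; real Chinese text can also use other blocks:

```python
def contains_chinese(name):
    """Rough check: any character in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in name)

# Output-based equivalence classes: one EC per destination database.
chinese_names = ["测试", "用户一"]        # expected to land in the Chinese-name database
non_chinese_names = ["Alice", "Bob123"]   # expected to land in the other database

for name in chinese_names + non_chinese_names:
    print(name, "->", "Chinese" if contains_chinese(name) else "non-Chinese")
```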

PS – Not only is this article a (hopefully) interesting lesson on equivalence classes, it’s also an interesting experiment for the Expert Testers blog. In this article, I used the word Sex 15 times. I’m curious if it gets more traffic than usual because of it. If it does, you can look forward to my next article, “The Naked Truth – Pairwise Testing Exposed!”
