Why NOT to fix a bug

Us testers love to have our issues/bugs fixed, especially Sev 1 (i.e. crashing or data loss) ones. Sometimes we love it when they DON’T fix a bug. Say what? Yes, I fought to NOT fix a crashing bug. But I’m getting ahead of myself.

Whenever we find a bug, we assign a number to it denoting the severity of the bug. Maybe it’s trivial issue and the customer would likely never notice it. Maybe it’s a must-fix bug such as a crash, data loss, or security vulnerability. At Microsoft, we generally assign all bugs two numbers when we enter it: Severity and Priority. Severity is how bad the bug is: Crash = 1, a button border color off by a shade = 4. Priority is how soon the bug should be fixed: Search does nothing so I can’t test my feature = 1, Searching for ESC-aped text doesn’t work = 4.

Once we enter a bug, then it’s off to Bug Triage. Bug Triage is a committee made up of representatives from most of the disciplines. At the start of a project, there is a good chance all bugs will be fixed. We know, though, based on data mining our engineering process data, that whenever a bug is fixed, there is a non-zero chance that the fix won’t be perfect or something else will be broken. Early on in the project, we have time to find those new bugs. As we get closer to release, there may not be time to find those few cases where we broke the code.

One more piece to this puzzle: Quality Essentials (QE). It is a list of the practices and procedures – the requirements – that our software or service must meet in order to be released. It could be as simple as verifying the service can be successfully deployed AND rolled back. It could be as mundane as zero-ing out the unused portions of sectors on the install disk.

Now, that bug I told you about at the beginning. We have an internal web site that allows employees to search for and register for trainings. We had a sprint, a four week release cycle, at the end of the year where we had to make the site fully accessible to those with disabilities. This was a new QE requirement. We were on track for shipping on time…as long as we skipped our planned holiday vacations. While messing around with the site one lunch, I noticed that we had a SQL code injection bug. I could crash the SQL backend. The developer looked at the bug and the fix was fairly straight forward. The regression testing required, though, would take a couple of days. That time was not in the schedule. Our options were:
• Reset the sprint, fix the new bug, and ship late. We HAD to release the fix by the end of the year, so this wasn’t an option.
• Bring in more testing resources. With the holiday vacations already taking place, this wasn’t a really good option.
• Take the fix, do limited testing, and be ready to roll back if problems were found. Since this site has to be up 99.999%, this wasn’t a legitimate option.
• Not fix the bug. This is the option we decided to go with.

Why did we go with the last option? There were a couple of reasons:
1) The accessibility fix HAD to be released before the end of the year due to a Quality Essentials requirement.
2) The SQL backend was behind a load balancer, with a second server and one standby. One SQL server was usually enough to handle the traffic.
3) The crashed SQL server was automatically rebooted and rejoined the load balancer within a minute or two, so the end user was unlikely to notice any performance issues.
4) The web site is internal only, and we expect most employees to be well behaved…the project tester, me, being the exception.

So, the likelihood of the crash was small, the results of the crash were small, so we shipped it. After a few days off, the next sprint, a short one, was carried out just to fix and regress this one bug. According to the server logs, the SQL server was crashed once between the holidays and the release of the fix. It was noted by our ever diligent Operations team. But, hey, I was testing the logging and reporting system. 🙂

I would be remiss if I didn’t add that each bug is different and must be examined as part of the whole system. The fix decision would have been very different if this were an external facing service, or something critical such as financial data was involved.

UI Testing Checklist

When testing a UI, it’s important to not only validate each input field, but to do so using interesting data. There are plenty of techniques for doing so, such as boundary value analysis, decision tables, state-transition diagrams, and combinatorial testing. Since you’re reading a testing blog, you’re probably already familiar with these. Still, it’s still nice to have a short, bulleted checklist of all the tools at your disposal. When I recently tested a new web-based UI, I took the opportunity to create one.

One of the more interesting tests I ran was a successful HTML injection attack. In an input field that accepted a string, I entered: <input type=”button” onclick=”alert(‘hi’)” value=”click me”>. When I navigated to the web page that should have displayed this string, I instead saw a button labeled “click me”. Clicking on it produced a pop-up with the message “hi”.  The web page was rendering all HTML and JavaScript I entered. Although my popup was fairly harmless, a malicious user could have used this same technique to be, well, malicious.

inject

 

 

 

 

 

 

Another interesting test was running the UI in non-English languages. Individually, each screen looked fine. But when I compared similar functionality on different screens, I noticed some dates were formatted mm/dd/yyyy and others dd/mm/yyyy. In fact, the most common bug type I found was inconsistencies between screens. The heading on some pages were name-cased, while others were lower-cased. Some headings were butted against the left size of the screen, and others had a small margin. Different fonts were used for similar purposes.

Let’s get back to boundary value analysis for a minute. Assume you’re testing an input field that accepts a value from 1 to 100. The obvious boundary tests are 0, 1, 100, and 101. However, there’s another, less obvious, boundary test. Since this value may be stored internally as an integer, a good boundary test is a number too large to be stored as an int.

My UI Testing Checklist has all these ideas, plus plenty more: accented characters, GB18030 characters, different date formats, leap days, etc.. It’s by no means complete, so please leave a comment with anything you would like added. I can (almost) guarantee it’ll lead you to uncover at least one new bug in your project.

 

Click this images to download the UI Testing Checklist

Click this image to download the UI Testing Checklist

Continue reading

The Tax Man

Introduction

As I write this, Tax Day in the United States is just behind us. This is the day when individuals are required by law to settle their tax bill with the US Federal Government, and in many cases with their state governments. Filing your taxes can get complicated, and income tax preparation is a large industry in the United States, with revenue around ten billion dollars according to IBISWorld. That’s ten billion dollars spent annually, helping people cope with the hassle, the complexity, and the general all around unpleasantness of paying taxes.

In the past, software testing could feel like paying taxes and being a software tester could start to feel like working for the IRS. There are forms to fill out. There are deadlines to be met. And there’s process, process, process and at the end of the day it feels like people only notice when something goes Horribly Wrong. But there’s a way to change all that.

Do you know how to get people to appreciate and understand paying taxes? Demonstrate a clear value for their investment, be flexible, and be data-driven.

Deliver Measurable Value

How many times this week did you run a test that didn’t find a bug in the last year? Be honest.

Running tests that don’t find bugs is not, generally, a great way to deliver value. It’s tempting to confuse activity with value and it’s nice to feel busy, but can you imagine explaining how you spent your time to the CEO of your company? It would go something like this:

CEO: So, what have you done this week?

You: Well, I ran a bunch of tests.

CEO: I see. Tell me more about how that improves our product or our profitaiblity?

You: Well, most of the tests I run don’t find bugs. We just run them because we’ve always run them.

[Cricket sounds]

In a conversation with your CEO, your boss, or even your mom you should be able to clearly and distinctly describe what you did to deliver value to the organization. Imagine this conversation instead:

CEO: So, what have you done this week?

You: I prevented thousands of dollars in revenue loss.

CEO: Pray tell, how did you to this?

You: I looked at the numbers from our website and I saw that traffic was down. I talked to Larry in development and found out that we’d just changed the font on the front page, about the same time traffic dropped. I worked with him and we put together an experiment where we tested the new font with some users and the old font with some other users. Turns out users hate the new font. We reverted the change and traffic’s gone up. We’ve put a process in place to do experimentation on all our big changes now.

CEO: I see. Tell me more about your ideas.

[Applause]

Be Flexible

In the past, we kept our tests in long documents, like giant Word files or long Excel spreadsheets. Then we created specialized software to manage the test cases. We had checklists. Lots of checklists.

The funny thing was, it could be possible to hit every box on the checklist and still deliver mediocre software. It was also possible for the checklist to slow down product development, without having anything measurable to show for it. How many times have you had to reject a change because the test process couldn’t accommodate it in time for your product deadline? Again, be honest. I bet you can think of at least one great change that couldn’t make it into your product because your change management system or your test pass couldn’t move fast enough to get the change in before the deadline.

There are some software development activities that require very rigorous, very cautious change management either because of regulatory requirements or because of risk to life or property. It’s very likely that you do not work in one of those areas. Even if you do there are probably places in your process that could stand some scrutiny and some adjustment.

At the end of the day you are not shipping your checklist. Go through the stages of your process and ask yourself if they’re really adding value, or if they’re only there because they’ve always been there. Then cut everything out of the process that doesn’t deliver real, measureable value to your users or your organization.

Be Data-driven

There are tons of tools available for getting data back from your application. Amazon has published a framework for doing A\B testing and analytics, and Facebook has Airlock. There are frameworks available for .NET, JavaScript, and pretty much any other platform that you need.

But, you protest, your product is written in COBOL and distributed via floppy disk to a small group of users in Thule, Greenland and McMurdo Base, Antarctica. There’s no way to be data-driven without writing my own framework from scratch and I don’t have time for that!

To this I say: horsefeathers. Being data-driven isn’t about using a framework about a buzzword. It’s about going out and looking at the real world (which is where your users live) and trying to understand how they’re using your software and what could change to make life better for them. If you don’t have a framework for doing A\B testing or getting user data back directly from your application, consider doing user interviews or surveys. The important thing is that you look at real users doing their real work and don’t just rely on results you got from your manual or automated testing.

To a certain extent, being data driven can be like answering a Fermi question. Start with a question that you need an answer to, for example “Do my users like the new font?” Then figure out what you already know and what tools you have at your disposal that can help you learn more. Frameworks are nice and they can make information easier to gather, but they aren’t the only way for you to go into the real world and get data from real users.

Conclusion

I don’t think there’s anybody who truly enjoys paying taxes. There are, however, tons of people who enjoy testing software. If you’re one of these people, and if you want your users and your dev peers to respect your work, then deliver measurable value, be flexible and be data-driven. If you don’t want to do these things, then don’t be surprised when your collleagues greet you with about as much enthusiasm as they greet an audit from the IRS.

Testing Is Like Going To The Doctor

Introduction

By now, you’ve probably heard the phrase “move quality upstream.” The idea is that we want developers to take more ownership of the validating the quality of the code they produce. There are a couple of common sense practices–most of which you’ve already heard of–that you need to adopt to start moving your org in this direction.

The first practice you need to adopt is unit testing. If your org doesn’t embrace unit testing, this is the first thing you need to go fix. Fortunately there’s been a lot written about unit testing and about how to get a dev org to adopt unit testing as a best practice. There’s even an IEEE standard on how to approach unit testing, if you’re into that kind of thing.

The next practice you need to adopt is test automation. Everything I write is going to assume that you believe test automation is generally a good thing and that you’ve got a standard test harness that you can use to exercise the code under test. Maybe you bought a test harness off the shelf. Maybe you’re using an open source test framework. Maybe you’re a special, unique snowflake and your org has a test harness that’s completely internal. Whatever. The point is, I’m going to assume that you believe in writing tests that are automated, repeatable and maintainable. I’m also going to assume you have some automated sanity tests, and probably an automated way to build and deploy code. If you haven’t got these yet, there’s some great books and blogs out there that’ll help you get the job done.

Great. What’s Next?

OK. So you’ve adopted both unit testing and automated functional testing. You’ve got an automated build and an automated BVT system that tells you when the build is just plain broken. That’s great. The next thing you need to think about is how to move more of the functional testing of software into the hands of the dev that’s writing it.

It makes sense for dev to own at least some functional testing. If you think about it, every time we move code from a developer to a tester we’re introducing overhead. It’s like having to setup and tear down a stack during a function call, or like having to move from “on box” to “on rack” in a cloud computing environment. It’s a tiny cost that repeated hundreds of times add up to a really big cost. The problem is that functional testing is a really big problem space. Are we asking dev to take on basic happy path testing? What about pairwise or stress testing? What about negative testing?

Yes. Yes, to all of it. But we’re going to do it one piece at a time. And when we’re done, we’re still going to have a ton of other stuff to do as quality assurance professionals. It’ll just be different stuff than what we do today.

The first thing you need to ask yourself is, “What is the first thing I do when dev hands me a piece of code to test?” The next thing you need to ask yourself is, “Does this really need to be done at all?” Then ask, “If this needs to be done, am I really the right person to do it?”

The first thing that you do when dev hands you a piece of code is going to be different depending on the kind of software you’re shipping. If you own a web service, you might validate that basic browsing and payment processing are working. If you own a game, you might make sure the menus load. The thing is, there’s probably something that you always do first when a dev tells you that a piece of code is done.

That thing, whatever it is–do you really need to do it? Is it finding bugs? What’s the risk to your product if you don’t find a bug at this stage? If a test doesn’t find bugs and it doesn’t mitigate risk, it’s probably not worth running. If a test always finds bugs, then we need to do something to improve quality because consistent failure is a sign that something’s systemically wrong.

Here’s a table to illustrate the point:

Always find Bugs Rarely Finds Bugs
High Risk We need more quality before we run this test Somebody should run this test
Low Risk We need more quality before we run this test Don’t run the test

Is There A Doctor In The House?

A test in software is kind of like getting a test from the doctor. If the doctor finds that my blood pressure is high, I’m the one who gets on the treadmill to try to get my blood pressure down, not the doctor. If the tests find that something’s consistently wrong, the the developer needs to do something to rectify the consistent failure: more code review, more unit testing–or maybe running the functional test themselves.

If the doctor thinks that it’s worthwhile to have my blood pressure monitored, I could come into the office to do it but that’ll get expensive and time consuming really fast. Or I could buy a blood pressure cuff and monitor my vitals myself, which will be much cheaper. The automated test that you’ve been running every time your dev hands you code? That test is your automatic, home-use blood pressure cuff.

The key here is that the home blood pressure monitor is automatic. I push a button, it squeezes my arm like an anaconda, and it spits out a reading. I don’t need a stethoscope. I don’t even need to know how to spell “systolic”. If somebody hadn’t invented this nifty little automated testing device, I wouldn’t be able to do this test by myself. But they did, and I can, and it saves a ton of time and money.

So if your test is always finding bugs or if the area is high risk, take your automated test and say to your dev partner, “Hey, I run this test every time you hand me code. If you ran it instead it would save us both some time.”

I’m assuming, of course, that your automated test runs pretty quickly, doesn’t generate false failures, and don’t require mastery of the Dark Arts to setup and execute–just like the automated blood pressure cuff. As long as you meet these conditions with your tests, the win for everybody is usually pretty obvious. It’s exactly the same as not driving to the doctor when you have a home blood pressure cuff–don’t go to the tester when you’ve got a quick, easy way of validating quality yourself.

Equivalence Class Testing

When testing even a simple application, it’s often impossible, or at least impractical, to test with all possible inputs. This is true whether you’re testing manually or using automation; throwing computing power at the problem doesn’t usually solve it. The question is, how do you find those inputs that give the most coverage in the fewest tests.

Assume we have a UI with just two fields for creating a user account. User Name can be between 1 and 25 characters, but can’t contain a space. Sex can be either Male or Female.

To test all possible inputs, we would first test single-character names using all possible characters. Then test two-character names using all possible characters in the first position, along with all possible characters in the second position. This process would be repeated for all lengths between 1 and 25. We then need to repeat all these names for Males and Females. I’m no mathematician, but this works out to be close to a kagillion possible inputs.

If this simple UI can have so many inputs, then it’s clear we’ll usually have to test with a very small subset of all possible data. Equivalence classes (ECs) give us the guidance we need to select these subsets.

ECs are subsets of input data, where testing any one value in the set should be the same as testing the others. Here we have two equivalence classes of valid input data.

Field Description
1. User Name Length >=1 and <=25 not containing a space
2. Sex Male or Female

Since we have more than one EC of valid data, we can create a single test that randomly picks one value from each. After all, testing with A/Male should give us the same result as testing with AB/Female or JKHFGKJkjgjkhg/Male. In all three cases, a new user account should be created.

It’s also important to create classes of invalid input. If you have an EC of valid data in the range of 1-10, you should also an EC of invalid data less than 1 and an EC of values greater than 10. We can create five equivalence classes of invalid input for our sample app.

Field Description
3. User Name Empty
4. User Name Length > 25 (no spaces)
5. User Name Length > 25 (containing a space)
6. User Name Length >1 and <=25 containing a space
7. Sex Neither Male nor Female

When designing your tests, it’s important to only pick one invalid value per test case; using more than one can lead to missed product bugs. Suppose you choose a test with both an invalid User Name and an invalid Sex. The product may throw an error when it detects the invalid User Name, and never even get to the logic that validates the Sex. If there are bugs in the Sex-validation logic, your test won’t catch it.

Using our ECs, we’d create the following tests.

  1. Valid name from #1 and valid Sex from #2
  2. Invalid name from #3 and valid Sex from #2
  3. Invalid name from #4 and valid Sex  from #2
  4. Invalid name from #5 and valid Sex from #2
  5. Invalid name from #6 and valid Sex from #2
  6. Valid name from #1 and invalid Sex from #7

Equivalence classes have reduced our number of tests from a kagillion to six. As Adam Sandler might say, “Not too shabby.”

Not only does EC testing make sure you don’t have too many tests, it also ensures you don’t have too few. Testers not familiar with equivalence classes might test this UI with just four pairs of hardcoded values:

  1. John/Male
  2. Jane/Female
  3. John Smith/Male
  4. Joan/Blah

When these tests are automated and running in your test pass, they validate the same values day after day. Using equivalence class testing with random values, we validate different input every day, increasing our odds of finding bugs.

The day I wrote this article I found a UI bug that had been in the product for several months. The UI didn’t correctly handle Display Names of more than 64 characters. Sure enough, the UI tests didn’t use ECs, and tested with the same hardcoded values every day. If they had chosen a random value from an EC, the test would have eventually found this bug.

Another benefit of equivalence class testing is that it leads us Boundary value testing. When an equivalence class is a range, we should create tests at the high and low edge of the range, and just outside the range. This helps us find bugs in cases when the developer may have used a > instead of >=.

Boundary tests, however, should not take the place of your EC tests. In our example, don’t hardcode your EC test to have a User Name length of 25. Your EC tests should choose a random number between 1 and 25. You should have a separate Boundary tests that validate length 25.

And don’t forget to create tests for values far outside the range. The size of your variable is a type of EC class as well! This could lead to other bugs such as exceeding the max length of a string, or trying to shove a long number into an int.

When defining your ECs, the first place to look is the Feature Specification Document (or whatever your Agile equivalent may be.)  The other place is in the product code. This, however, can be risky. If the source code has a statement like if x> 1 && x<5, you would create your EC as 2,3,4, and your tests will pass. But how do you know the source code was correct? Maybe it was intended to be x>=1 && x<=5. That’s why you should always push for valid input rules to be defined in a spec.

Another technique for creating ECs is to is to break up the output into equivalence classes. Then choose inputs that give you an output from each class. In our example, assume the user account can be stored in one of two databases depending on whether the User Name contains Chinese characters. In this case, we would need one EC of valid Chinese User Names and one of valid non-Chinese User Names.

PS – Not only is this article a (hopefully) interesting lesson on equivalence classes, it’s also an interesting experiment for the Expert Testers blog. In this article, I used the word Sex 15 times. I’m curious if it gets more traffic than usual because of it. If it does, you can look forward to my next article, “The Naked Truth – Pairwise Testing Exposed!”

%d bloggers like this: