Instagram for Mongo

When Instagram finally released an app for Windows Phone, a year after its release on iPhone and Android, it was missing perhaps its most important feature: the ability to take a picture within the app. Instagram explained they wanted to get the app to users as quickly as possible, and although a few features were missing, they assured everyone these features would come in a future release. Welcome to the new paradigm of shipping software to the cloud.

When I started in Test, we shipped our products on DVD. If a feature was missing or broken, the customer either had to wait for a new DVD (which might take a year or three) or download and install a service pack. Today, our software is a service (SaaS). We ship to the cloud, and all customers are upgraded at once. For the Instagram app, the software is upgraded in the cloud, customers are automatically notified, and they can upgrade their phone with a single click.

Mongo likes Instagram.

In both cases, updating software is easier than ever. This allows companies to get their products to market quicker than ever. There’s no longer a need to develop an entire product before shipping. You can develop a small feature set, possibly with some known (but benign) bugs, and then iterate, adding new scenarios and fixing bugs.

In How Google Tests Software, James Whittaker explains that Google rarely ships a large set of features at once. Instead, they build the core of the product and release it the moment it’s useful to as many customers as possible. Whittaker calls this the “minimal useful product”. Then they iterate on this first version. Sometimes Google ships these early versions with a Beta tag, as they did with Android and Gmail, which kept its Beta tag for 4 years.

When I first read Whittaker’s book, I had a hard time accepting you should release what’s essentially an unfinished product. But I hadn’t worked on a cloud service yet, either. Now that I’ve done so for a few years, I’m convinced this is the right approach. This process can benefit everyone affected by the product, from your customers, to your management, to your (wait for it) self.

Customers

  1. By shipping your core scenarios as soon as they’re ready, customers can take advantage of them without waiting for less important scenarios to be developed.

    I worked on a project where the highest priority was to expose a set of data in our UI. We completed this in about two weeks. We knew, however, some customers needed to edit this data, and not just see it. So we decided to ship only after the read/write functionality was complete. Competing priorities and unexpected complications made this project take longer than expected. As a result, customers didn’t get their highest-priority scenario, the ability to simply view the data, for 2 months, rather than 2 weeks.

  2. When a product is large and complex, scenarios can go untested due to either oversight or test debt. With a smaller feature set, it’s less likely you’ll overlook scenarios. And with this approach, there shouldn’t be any test debt. Remember, Whittaker is not saying to release a buggy product with a lot of features, and then iterate fixing the bugs. He’s saying to ship a high-quality product with a small feature set, and iterate by adding features.
  3. Many SaaS bugs are due to deployment and scale issues, rather than functional issues. Using an iterative approach, you’ll find and address these bugs quickly because you’re releasing features quickly. By the time you’re developing the last feature, you’ll hopefully have addressed all these issues.
  4. Similarly, you’ll be able to incorporate customer feedback into the product before the last feature has even been developed.
  5. Customers like to get updates! On my phone, I have some apps that are updated every few weeks, and others that are updated twice a year. Sure, it’s possible the apps modified twice a year are actually getting bigger updates than the ones updated often, but it sure doesn’t feel that way.

Management

  1. This approach gets new features to market quicker, giving your company a competitive advantage over, well, your competitors.
  2. By releasing smaller, high-quality features, fewer bugs will be found by customers. And bugs reported by customers tend to be highly visible to, and frowned upon by, management.

You

  1. If you work on a cloud product, there’s undoubtedly an on-call team supporting it. You’re probably part of it. By releasing smaller feature sets with fewer bugs, the on-call team will receive fewer customer escalations, and be woken up fewer times at night. You’re welcome.

“I see test as a role going away – and it can’t disappear fast enough”

In case you missed it, Alan Page dropped a bombshell in his last post of 2013 — and then immediately (and literally) got on a plane and left. Definitely worth a read.

The Evolution of Software Testing

As I sat down to eat my free Microsoft oatmeal this morning, I noticed how dusty my Ship-It plaque had become. I hadn’t given it much attention lately, but today it struck me how these awards illustrate the evolution of software testing.

Since I started at Microsoft seven years ago, I’ve earned nine Ship-It awards — plaques given to recognize your contribution to releasing a product. The first Ship-It I “earned” was in 2006 for Microsoft Antigen, an antivirus solution for Exchange.

I put “earned” in quotes because I only worked on Antigen a couple of days before it was released. A few days into my new job, I found myself on a gambling cruise — with open bar! — to celebrate Antigen’s release. I was also asked to sign a comically over-sized version of the product DVD box. Three years later I received another Ship-It — and signed another over-sized box — for Antigen’s successor, “Forefront Protection 2010 for Exchange Server”.

Fast-forward to 2011, when I received my plaque for Microsoft Office 365. This time there was no DVD box to sign — because the product wasn’t released on DVD. It’s a cloud-based productivity suite featuring email and Office.

This got me thinking. In 2006, when we shipped Antigen, all the features we wanted to include had to be fully developed, and mostly bug-free, by the day the DVD was cut. After all, it would be another three years before we cut a new one. And it would be a terrible experience for a customer to install our DVD, only to then have to download and install a service pack to address an issue.

By 2011, however, “shipping” meant something much different. There was no DVD to cut. The product was “released” to the cloud. When we want to update Office 365, we patch it in the cloud ourselves without troubling the customer; the change simply “lights up” for them.

This means products no longer have to include all features on day one. If a low-priority feature isn’t quite ready, we can weigh the impact of delaying the release to include the feature, versus shipping immediately and patching it later. The same holds true if/when bugs are found.

What does this all mean for Test? In the past, it was imperative to meet a strict set of release criteria before cutting a DVD. For example, no more code churn, test-pass rates above 95%, code coverage above 65%, etc. Now that we can patch bugs quicker than ever, do these standards still hold? Have our jobs become easier?

We should be so lucky.

In fact, our jobs have become harder. You still don’t want to release a buggy product; customers would flock to your competitors, regardless of how quickly you fix the bugs. And you certainly don’t want to ship a product with security issues that compromise your customers’ data.

Furthermore, it turns out delivering to the cloud isn’t “free.” It’s hard work! Patching a bug might mean updating thousands of servers and keeping product versions in sync across all of them.

Testers now have a new set of problems. How often should you release? Which bugs need to be fixed before shipping, and which can be fixed afterward? How do you know everything will work when it’s deployed to the cloud? If the deployment fails, how do you roll it back? If the deployment works, how do you know it continues to work: that servers aren’t crashing, slowing down, or being hacked? Who gets notified when a feature stops working in the middle of the night? (Spoiler alert: you do.)

I hope to explore some of these issues in upcoming articles. Unfortunately, I’m not sure when I’ll have time to do that. I’m on-call 24×7 for two weeks in December.

A quick coverage of Code Coverage

Testing is full of numbers:

  • How long will the test pass take?
  • What percentage of the features have you tested?
  • What is the automation test pass rate?
  • How confident are we that the failing tests are real product failures and not failures of the test system?
  • What is my raise going to be?

Code Coverage is just a number.  It tells us how much of the code has been exercised, and maybe verified, by our testing effort.  This is also sometimes called White Box testing since we look at the code in order to develop our test cases.  Management sometimes puts a high value on the code coverage number.  Whether they should or not is a discussion best left to each company.  There are multiple ways we can get code coverage numbers.  Here are three examples.

Block testing

Definition: Execute a contiguous block of code at least once

Block testing is the simplest first order method to obtain a code coverage number.  The strength is it’s quick.  The weakness is it’s not necessarily accurate.  Take a look at this code example:

bool IsInvalidTriangle(ushort a, ushort b, ushort c)
{
    // Assume the triangle is valid until a side-length check fails.
    bool isInvalid = false;
    if ((a + b <= c) || (b + c <= a) || (a + c <= b))
    {
        isInvalid = true;
    }
    return isInvalid;
}

If we tested it with the values of a=1, b=2, and c=3; we would get a code coverage of about 80%.  Great, management says, SHIP IT!  Wait, you say, there is a weakness of block level testing.  Can you spot it?  The one test case only hits the first condition of the IF statement.  Block level testing will report the line as 100% covered, even though we did not verify the second and third conditions.  If one of the expressions was “<” instead of “<=” we would never catch the bug.
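To make that weakness concrete, here is a minimal Python sketch. The first function is a port of the method above; the second is a hypothetical buggy variant (not from any real product) where the second comparison uses “<” instead of “<=”. The single test from the text cannot tell the two apart, while a test that makes only the second condition true can.

```python
def is_invalid_triangle(a, b, c):
    """Python port of IsInvalidTriangle from the example above."""
    return (a + b <= c) or (b + c <= a) or (a + c <= b)


def is_invalid_triangle_buggy(a, b, c):
    """Hypothetical mutant: the second comparison uses '<' instead of '<='."""
    return (a + b <= c) or (b + c < a) or (a + c <= b)


# The one test case from the text: only the first condition decides the result.
single_test = (1, 2, 3)
same_on_single_test = (
    is_invalid_triangle(*single_test) == is_invalid_triangle_buggy(*single_test)
)

# A case where only the second condition is true (b + c == a) exposes the mutant.
second_condition_case = (3, 1, 2)
differs_on_second_case = (
    is_invalid_triangle(*second_condition_case)
    != is_invalid_triangle_buggy(*second_condition_case)
)

print("single test passes for both versions:", same_on_single_test)
print("second-condition test tells them apart:", differs_on_second_case)
```

Both versions execute exactly the same blocks for (1, 2, 3), so a block coverage number alone would not distinguish them.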

Condition testing

Definition: Make every sub-expression of a predicate statement evaluate to true and false at least once

This is one step better than block level testing since we validate each condition in a multiple condition statement.  The trick is to break any statement with multiple conditions to one condition per line, and then put a letter in front of each condition.  Here is an example:

void check_grammar_if_needed(const Buffer& buffer)
{
A:  if (enabled &&
B:      (buffer.cursor.line < 10) &&
C:      !buffer.is_read_only)
    {
        grammarcheck(buffer);
    }  
}

Our tests would be:

Test  enabled  value of ‘line’  is_read_only  Comment
1     False    N/A              N/A
2     True     11               N/A           A is now covered
3     True     9                True          B is now covered
4     True     9                False         C is now covered

Breaking the conditions into one per line doesn’t really help much here.  This trick will help if you have nested loops.  You can set up a table to help make sure each inner expression condition is tested with each outer expression condition.
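If you want to double-check a table like the one above, a small script can do the bookkeeping. This is only a sketch: run_conditions is a made-up Python stand-in for the predicate in check_grammar_if_needed, and it records which outcomes each lettered condition has produced, honoring short-circuit evaluation.

```python
from collections import defaultdict


def run_conditions(enabled, line, is_read_only, seen):
    """Evaluate conditions A, B, C in order, recording each outcome in `seen`."""
    a = enabled
    seen["A"].add(a)
    if not a:
        return False  # short-circuit: B and C are never evaluated
    b = line < 10
    seen["B"].add(b)
    if not b:
        return False  # short-circuit: C is never evaluated
    c = not is_read_only
    seen["C"].add(c)
    return c


# The four tests from the table; None stands for "N/A" (never evaluated).
tests = [
    (False, None, None),
    (True, 11, None),
    (True, 9, True),
    (True, 9, False),
]

seen = defaultdict(set)
results = [run_conditions(*t, seen) for t in tests]

# Condition coverage is complete when every condition has been both True and False.
fully_covered = all(seen[name] == {True, False} for name in "ABC")
print(results, fully_covered)
```

Only the last test actually reaches the grammar check; the first three exist purely to drive each condition to its other outcome.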

Basis Path testing

Definition: Test C different entry-exit paths where C (Cyclomatic complexity) = number of conditional expressions + 1

Does the term “Cyclomatic complexity” bring back nightmares of college?  Most methods have one entry and one or two exits.  Basis Path testing is best applied when there are multiple exit points since you look at each exit path in order to determine your code coverage.  The steps you follow to find the basis paths (shortest path method):

  • Find shortest path from entry to exit
  • Return to algorithm entry point
  • Change next conditional expression or sub-expression to alternate outcome
  • Follow shortest path to exit point
  • Repeat until all basis paths defined

Here is an example:

A:  static int GetMaxDay(int month, int year)
    {
        int maxDay = 0;
B:      if (IsValidDate(month, 1, year)) {
C:          if (IsThirtyOneDayMonth(month)) {
                maxDay = 31;
            }
D:          else if (IsThirtyDayMonth(month)) {
                maxDay = 30;
            }
            else {
                maxDay = 28;
E:              if (IsLeapYear(year)) {
                    maxDay = 29;
                }
            }
        }
        return maxDay;
F:  }

Test cases:

Branch to flip  Shortest path out                         Path    Input
n/a             B==false                                  ABF     0, 0
B               B==true, C==true                          ABCF    1, 1980
C               B==true, C==false, D==true                ABCDF   4, 1980
D               B==true, C==false, D==false, E==false     ABCDEF  2, 1981
E               B==true, C==false, D==false, E==true      ABCDEF  2, 1980
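The five basis paths can be exercised with a quick Python port of GetMaxDay. The helper functions are not shown in the original, so the versions below are assumptions: a simple month-range check for IsValidDate and the usual Gregorian leap-year rule. With four conditional expressions (B, C, D, E), the cyclomatic complexity is 4 + 1 = 5, which matches the five rows of the table.

```python
def is_valid_date(month, day, year):
    # Assumed helper: months 1..12, positive day and year.
    return 1 <= month <= 12 and day >= 1 and year >= 1


def is_thirty_one_day_month(month):
    return month in (1, 3, 5, 7, 8, 10, 12)


def is_thirty_day_month(month):
    return month in (4, 6, 9, 11)


def is_leap_year(year):
    # Assumed helper: standard Gregorian rule.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def get_max_day(month, year):
    """Python port of GetMaxDay; comments mark the labeled decision points."""
    max_day = 0
    if is_valid_date(month, 1, year):          # B
        if is_thirty_one_day_month(month):     # C
            max_day = 31
        elif is_thirty_day_month(month):       # D
            max_day = 30
        else:
            max_day = 28
            if is_leap_year(year):             # E
                max_day = 29
    return max_day


# One (month, year) input per basis path, with the value that path should return.
basis_path_cases = [
    ((0, 0), 0),       # ABF
    ((1, 1980), 31),   # ABCF
    ((4, 1980), 30),   # ABCDF
    ((2, 1981), 28),   # ABCDEF with E false
    ((2, 1980), 29),   # ABCDEF with E true
]

results = [get_max_day(*inputs) for inputs, _ in basis_path_cases]
print(results)
```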

These are just three of the many different ways to calculate code coverage.  You can find these and more detailed in any decent book on testing computer software.  There are also some good references online.  Here is one from a fellow Expert Tester.  As with any tool, you the tester have a responsibility to know the benefits and weaknesses of the tools you use.

Thankfully, most compilers will produce these numbers for us. Code Coverage goals at Microsoft used to be around 65% code coverage using automation.  For V1 of OneNote, I was able to drive the team and get it up to 72%.  Not bad for driving automation for a V1 project.  With the move from boxed products to services, code coverage is getting less attention and we are now looking more into measuring feature and scenario coverage.  We’ll talk about that in a future blog.

Now, what will we tell The Powers That Be?

The key to unlock the tests is in the combination

In the last blog, Andrew Schiano discussed Equivalence Class (EQ) and Boundary Value Analysis (BVA) testing methodologies.  This blog will talk about how to extend those two ideas even further with Combinatorial Testing.

Combinatorial Testing is a form of model-based testing.  It chooses pairs or sets of inputs, out of all of the possible inputs, that will give you the best coverage with the least cost.  Fewer test cases while still finding bugs and giving high code coverage is a dream of us testers.  It is best applied when:

  • Parameters are directly interdependent
  • Parameters are semi-coupled
  • Parameter input is unordered

Let’s look at an example UI.  You have to test a character formatting dialog.  It allows you to pick between four fonts, two font styles, and three font effects.  A chart of the values looks like this:

Field    Values
Font     Arial, Calibri, Helvetica, BrushScript
Style    Bold, Italic
Effects  Strikethrough, Word Underline, Continuous Underline

For any selection of text, you can have only one Font, zero to two Styles, and zero to two effects.  Notice how for effects, you could select one pair, strikethrough and continuous underline, but you should not be able to pick another pair, word underline and continuous underline.  You could set up a matrix of tests and check each combination.  That’s old school.  You could reduce the number of cases by applying EQ.  That would be smarter. Your best bet, though, is to apply Combinatorial testing.

Combinatorial testing, also sometimes called Pairwise Testing, is just modeling the System Under Test (SUT).  You start taking unique interesting combinations of inputs, using them to test the system, and then feeding that information back into the model to help direct the next set of tests.  There is mathematics involved that takes into account the coupling (interactions) of the inputs, which we won’t cover here.  Thankfully there are tools that help us choose which test cases to run.   I’ll cover the Pairwise Independent Combinatorial Testing (PICT) tool, available free from Microsoft.  Since it is a tool, it is only as good as the input you give it.

The steps to using PICT or any other combinatorial testing tool are:

  1. Analysis and feature decomposition
  2. Model parameter variables
  3. Input the parameters into the tool
  4. Run the tool
  5. Re-validate the output
  6. Modify the model
  7. Repeat steps 4-6

In our example above, the decomposition would look like this:

  • Font: Arial, Calibri, Helvetica, BrushScript
  • Bold: check, uncheck
  • Italic: check, uncheck
  • Strikethrough: check, uncheck
  • Underline: check, uncheck, word, continuous

You feed this data into the tool and it will output the number of tests you specify.  You need to validate the cases before running them.  For example, the BrushScript font only allows Italic and Bold/Italic.  If the tool output the test case:

  • Font: BrushScript
  • Bold: check
  • Italic: uncheck
  • Strikethrough: check
  • Underline:  word

Being the awesome tester that you are, you would notice this is not valid.  Thankfully the PICT tool allows you to constrain invalid input combinations.  It also allows you to alias equivalent values.  So, you modify the model, not the outputted values.  In this case, you would add two lines so the input file would now look something like this:

Font: Arial, Calibri, Helvetica, BrushScript
Bold: check, uncheck
Italic: check, uncheck
Strikethrough: check, uncheck
Underline: check, uncheck, word, continuous

IF [Font] = "BrushScript" AND [Italic] = "uncheck" THEN [Bold] <> "check";
IF [Bold] = "uncheck" AND [Italic] = "uncheck" THEN NOT [Font] = "BrushScript";
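To show the idea behind a constrained model, here is a naive greedy pairwise generator in Python. To be clear, this is not PICT’s algorithm and it will not reproduce PICT’s output; it is a simple sketch that filters out the invalid combinations using the two constraints above and then keeps picking whichever valid case covers the most still-uncovered pairs. All function and variable names are mine.

```python
from itertools import combinations, product

model = {
    "Font": ["Arial", "Calibri", "Helvetica", "BrushScript"],
    "Bold": ["check", "uncheck"],
    "Italic": ["check", "uncheck"],
    "Strikethrough": ["check", "uncheck"],
    "Underline": ["check", "uncheck", "word", "continuous"],
}
names = list(model)


def is_valid(case):
    """The two constraints from the model file, written as rejections."""
    brush_no_italic = case["Font"] == "BrushScript" and case["Italic"] == "uncheck"
    if brush_no_italic and case["Bold"] == "check":
        return False
    if brush_no_italic and case["Bold"] == "uncheck":
        return False
    return True


def pairs_of(case):
    """All (parameter, value) pairs that one test case covers."""
    return {((p, case[p]), (q, case[q])) for p, q in combinations(names, 2)}


all_cases = [dict(zip(names, values)) for values in product(*model.values())]
valid_cases = [c for c in all_cases if is_valid(c)]

# Only pairs that occur in at least one valid case can (and must) be covered.
required = set().union(*(pairs_of(c) for c in valid_cases))

uncovered = set(required)
suite = []
while uncovered:
    # Greedy step: take the valid case that covers the most uncovered pairs.
    best = max(valid_cases, key=lambda c: len(pairs_of(c) & uncovered))
    suite.append(best)
    uncovered -= pairs_of(best)

print(len(all_cases), "combinations,", len(valid_cases), "valid,",
      len(suite), "tests in the greedy pairwise suite")
```

Taken together, the two constraints mean BrushScript never appears with Italic unchecked, so that one pair drops out of the set the suite has to cover.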

The PICT tool also allows you to weight values that are more common (who really uses BrushScript anymore?), and seed data about common historical failures.

Font: Arial (8), Calibri (10), Helvetica (5), BrushScript (1)

Does this really work?  Here is an example from a previous Microsoft project:

Command line program with six optional arguments:

Total Number of Blocks = 483

                       Default test suite  Exhaustive coverage  Pairwise coverage
Number of test cases   9                   972                  13
Blocks covered         358                 370                  370
Code coverage          74%                 77%                  77%
Functions not covered  15                  15                   15
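The percentages in that table follow directly from the block counts, and a few lines of Python make the comparison explicit. The numbers are the ones from the table; the dictionary layout is just for illustration.

```python
total_blocks = 483

# (number of test cases, blocks covered) for each suite in the table.
suites = {
    "default": (9, 358),
    "exhaustive": (972, 370),
    "pairwise": (13, 370),
}

# Block coverage as a whole-number percentage, as reported in the table.
coverage_pct = {
    name: round(100 * blocks / total_blocks)
    for name, (_, blocks) in suites.items()
}

# How many exhaustive test cases each pairwise test case stands in for.
reduction_factor = suites["exhaustive"][0] / suites["pairwise"][0]

print(coverage_pct)
print(f"pairwise uses about 1/{reduction_factor:.0f} of the exhaustive test cases")
```

In other words, the pairwise suite reaches the same block coverage as the exhaustive suite with roughly one seventy-fifth of the test cases.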

Now, pair this with automation so you have data-driven automated testing, and you’re REALLY off and running as a twenty-first century tester!

A few words of caution.  While this gives you the minimum set of tests, you should also test:

  1. Default combinations
  2. Boundary conditions
  3. Values known to have caused bugs in the past
  4. Any mission-critical combinations.

Lastly, don’t forget Beizer’s Pesticide Paradox.  Keep producing new test cases.  If you only run the tool once, and continually run those same cases, you’re going to miss bugs.
