System Architecture: Follow The Data

When I’m planning upcoming tasks for performance, scalability, or reliability testing, the first thing I do is learn the architecture of the system I’ll be working on. This helps me figure out the areas of the system that are most likely to fail.

How do I learn system architecture? I follow the data. Data has three states: it’s either at rest, in use, or in motion. Data at rest is stored in a database or on a file system and is infrequently used; data in use is stored in a database or on a file system and is frequently used; and data in motion is being transmitted between systems or stored in physical memory for reading and updating.

Here are some examples:

  • Data In Motion
    • A client application calling a web service.
    • A client mail transfer agent (MTA) sending an email message to a server MTA via SMTP.
    • One process calling a COM object in another process.
    • An email message being stored in RAM, so it can be scanned for viruses or spam.
    • A log being stored in RAM, so that it can be parsed.
    • Customer data being queried from a database and aggregated for presentation to the user.
  • Data In Use
    • Customer transactions stored in a database.
    • Program logs stored on disk.
  • Data At Rest
    • Archived databases.

So how does this help in planning a testing effort? Usually after I learn and document a system architecture, the obvious weak areas identify themselves. Here are some example epiphanies:

  • “There will be 100 clients calling into that server’s web service…I wonder what the performance of that server will be? And I wonder what would happen if the service were unavailable?”
  • “That data is being stored in RAM during the transaction. How big can that data get? Will it exhaust the machine’s physical memory?”
  • “That data in RAM will be processed N times…how much CPU will that transaction take?”
  • “Those logs will be archived to the file share daily. How much data will be produced each day? Does that exceed the size of the file share?”
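The last question can often be answered with back-of-envelope arithmetic before any test runs. Here is a minimal Python sketch. The server count, event rate, event size, and share capacity are hypothetical placeholders, not figures from a real system.

```python
# Back-of-envelope capacity check for "logs archived to the file share daily".
# Every input value below is a hypothetical placeholder; substitute measured numbers.

SECONDS_PER_DAY = 24 * 60 * 60
GiB = 1024 ** 3


def daily_log_bytes(servers: int, events_per_sec: float, bytes_per_event: int) -> int:
    """Bytes of log data produced per day across all servers."""
    return int(servers * events_per_sec * bytes_per_event * SECONDS_PER_DAY)


def days_until_full(capacity_bytes: int, used_bytes: int, bytes_per_day: int) -> float:
    """How many daily archives the share can absorb before it runs out of space."""
    if bytes_per_day <= 0:
        return float("inf")
    return (capacity_bytes - used_bytes) / bytes_per_day


if __name__ == "__main__":
    per_day = daily_log_bytes(servers=20, events_per_sec=50, bytes_per_event=200)
    print(f"log volume per day: {per_day / GiB:.1f} GiB")
    print(f"days until the share fills: {days_until_full(2048 * GiB, 512 * GiB, per_day):.0f}")
```

If the number of days comes out smaller than the retention period, the architecture review has already found a test case.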

Following the data helps me quickly learn the architecture and plan the testing effort. What things do you do in order to learn system architecture?

Performance Testing 101

Hi all. In this post I’ll go over the general approach my team uses when planning and executing performance tests in the lab.

Step 1: define the questions we’d like to answer

I view software testing as answering questions. At the beginning of a test effort we document the questions we would like to answer, and then we spend the rest of the milestone answering them.

Questions for performance generally fall into three categories:

  • Resource utilization — how much CPU, disk, memory, and network does a system use?
  • Throughput — how many operations per second can a system handle?
  • Latency — how long does it take one operation to complete?

Here are some examples:

  • How many operations per second can a server handle?
  • How many concurrent clients can a server handle?
  • If a server handles load for 2 weeks straight, does throughput or latency degrade? Do we leak memory?
  • If we keep adding new customer accounts to a system, at what point will the system fall over? Which component will fall over first?
  • When a user opens a file that is 1 GB in size, how long does it take to open? How much disk activity occurs during this process?
  • When a process connects to a remote server, how much network bandwidth is used?
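For the two-week soak question ("Do we leak memory?"), one simple heuristic is to fit a straight line to periodic memory samples and flag a slope that stays positive. Here is a minimal Python sketch. The sample format (hours elapsed, bytes in use) and the allowed-growth threshold are assumptions. A positive slope is a reason to investigate with heap tools, not proof of a leak.

```python
def growth_per_hour(samples):
    """Least-squares slope of (hours_elapsed, bytes_in_use) samples, in bytes/hour."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def looks_like_leak(samples, allowed_growth_per_hour):
    """Flag a soak run whose memory trend exceeds the allowed growth rate."""
    return growth_per_hour(samples) > allowed_growth_per_hour


if __name__ == "__main__":
    MiB = 1024 ** 2
    # Hypothetical hourly samples over 14 days: 800 MiB baseline plus 2 MiB/hour growth.
    soak = [(h, 800 * MiB + 2 * MiB * h) for h in range(14 * 24)]
    print(f"trend: {growth_per_hour(soak) / MiB:.2f} MiB/hour")
    print("possible leak:", looks_like_leak(soak, allowed_growth_per_hour=1 * MiB))
```

A fitted trend is less sensitive than comparing the first and last samples, because it is not thrown off by one garbage-collection spike at either end.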

We spend a lot of time thinking about the questions, because these questions guide the rest of the process.
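The throughput and latency questions come down to simple arithmetic over timed operations. Here is a minimal Python sketch. It assumes each operation's latency was recorded in milliseconds. The nearest-rank percentile used here is one of several common definitions, and the sample values are made up.

```python
import math
import statistics


def throughput(op_count: int, elapsed_s: float) -> float:
    """Completed operations per second over the measurement window."""
    return op_count / elapsed_s


def percentile(latencies, pct: float):
    """Nearest-rank percentile (pct in 0..100) of a list of latency samples."""
    ordered = sorted(latencies)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical run: 10 operations completed in 2.0 seconds.
    latencies_ms = [12, 15, 11, 40, 13, 14, 95, 12, 16, 13]
    print("throughput (ops/s):", throughput(len(latencies_ms), 2.0))
    print("median latency (ms):", statistics.median(latencies_ms))
    print("p90 latency (ms):", percentile(latencies_ms, 90))
    print("p99 latency (ms):", percentile(latencies_ms, 99))
```

In this data the single 95 ms outlier barely moves the median (13.5 ms) but determines the p99. For that reason, latency goals are often stated as percentiles rather than averages.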

Step 2: define the performance tests

The next step is to define the performance tests that will help us answer our questions. For each performance test we identify two things: (1) the expected load and (2) the key performance indicators (KPIs).

Load is the set of operations that are expected to occur in the system. All of these operations compete for resources and affect the throughput and latency. Usually a system will have multiple types of load all occurring at the same time, and thus we try to simulate all of these types of load in a performance test.

A mistake I’ve made in the past is failing to identify all the important types of load. Sometimes I focused too closely on one type and forgot that other operations in the system also affected performance. The lesson I’ve learned: don’t test in a vacuum.
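One way to avoid testing in a vacuum is to drive every expected load type from a single schedule, in proportion to how often each type occurs. Here is a minimal Python sketch. The operation names and weights in `LOAD_MIX` are hypothetical. A real generator would dispatch each scheduled operation to its own client code, and usually across many concurrent workers.

```python
import random
from collections import Counter

# Hypothetical load mix: operation type -> relative frequency.
LOAD_MIX = {
    "submit_message": 70,
    "query_mailbox": 25,
    "run_admin_report": 5,
}


def build_schedule(mix, total_ops, seed=0):
    """Return an interleaved, reproducible sequence of operation names drawn by weight."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated exactly
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=total_ops)


if __name__ == "__main__":
    schedule = build_schedule(LOAD_MIX, 10_000)
    print(Counter(schedule))  # roughly 70 / 25 / 5 percent of the operations
```

Interleaving the operations, rather than running each type in its own phase, keeps the load types competing for CPU, disk, memory, and network at the same time, as they would in production.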

The second part of a performance test is the key performance indicators (KPIs). These are the variables we want to measure, along with the goals for each variable. We always gather data for system resources (CPU, disk, memory, and network). We also gather data for application-specific KPIs in latency and throughput.
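Pairing each KPI with its goal makes pass/fail reporting mechanical. Here is a minimal Python sketch. The KPI names and goal values in `KPIS` are hypothetical examples, not real targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    name: str
    goal: float
    higher_is_better: bool

    def met(self, measured: float) -> bool:
        """True when the measured value satisfies the goal."""
        if self.higher_is_better:
            return measured >= self.goal
        return measured <= self.goal


# Hypothetical KPIs and goals for a mail server test.
KPIS = [
    Kpi("cpu_percent_avg", 70.0, higher_is_better=False),
    Kpi("throughput_msgs_per_s", 200.0, higher_is_better=True),
    Kpi("latency_p95_ms", 250.0, higher_is_better=False),
]


def evaluate(kpis, measurements):
    """Map each KPI name to pass/fail; a KPI with no measurement counts as a fail."""
    results = {}
    for kpi in kpis:
        value = measurements.get(kpi.name)
        results[kpi.name] = value is not None and kpi.met(value)
    return results
```

Treating a missing measurement as a failure is a deliberate choice. If a KPI silently stops being collected, it shows up in the report instead of disappearing from it.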

Step 3: automate and execute the tests

Now that the plans are complete, we focus on automation and execution. For each performance test we automate the load generation and data (KPI) collection.

Load generators are written in C#. With each load generator we attempt to mimic the expected load in the system. For example, if the load is SMTP email messages, we’ll write a generator that implements the client side of an SMTP session. If the load is SQL transactions, we’ll write functions that simulate these transactions.
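For illustration, here is a minimal Python sketch of an SMTP load generator (the team's own generators are C#). It uses the standard `smtplib` client for the session and a simple fixed-rate pacing loop. The addresses, message size, and pacing approach are assumptions, not a description of the team's generator.

```python
import smtplib
import time
from email.message import EmailMessage


def make_message(i: int, body_bytes: int = 1024) -> EmailMessage:
    """Build one synthetic test message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = "loadgen@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = f"load test message {i}"
    msg.set_content("x" * body_bytes)
    return msg


def smtp_send(host: str, port: int, msg: EmailMessage) -> None:
    """Client side of one SMTP session: connect, EHLO (or HELO), MAIL/RCPT/DATA, QUIT."""
    with smtplib.SMTP(host, port, timeout=30) as client:
        client.send_message(msg)


def run_paced(send, count: int, rate_per_s: float) -> float:
    """Call send(i) for i in range(count) on a fixed schedule; return achieved ops/s.

    If a send runs long, later sends go out immediately until the schedule catches up.
    """
    interval = 1.0 / rate_per_s
    start = time.perf_counter()
    for i in range(count):
        delay = start + i * interval - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        send(i)
    return count / (time.perf_counter() - start)
```

A lab run might look like `run_paced(lambda i: smtp_send("mail.lab.example", 25, make_message(i)), count=1000, rate_per_s=200)`, where the host name is hypothetical. Because this loop sends from a single thread, its achieved rate is capped by per-session latency. Reaching higher rates means running several loops in parallel.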

Besides load, we also need to automate the collection of KPIs. This usually means collecting Windows performance counters. The .NET Framework has a PerformanceCounter class that makes collection easy.
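The collection side is a polling loop. Here is a minimal Python sketch that takes any zero-argument `read` callable, which keeps it platform-neutral. In a C# harness, `read` would wrap something like `new PerformanceCounter("Processor", "% Processor Time", "_Total").NextValue()`. The sampling interval and the min/avg/max summary are assumptions.

```python
import time
from statistics import mean


def sample_counter(read, interval_s, samples, discard_first=True):
    """Poll read() every interval_s seconds; return a list of (elapsed_s, value)."""
    if discard_first:
        # .NET's PerformanceCounter.NextValue() returns 0.0 on the first call for
        # counters computed from two reads, so throw that first value away.
        read()
        time.sleep(interval_s)
    start = time.perf_counter()
    series = []
    for _ in range(samples):
        series.append((time.perf_counter() - start, read()))
        time.sleep(interval_s)
    return series


def summarize(series):
    """Reduce a sampled series to the min/avg/max figures a KPI report needs."""
    values = [v for _, v in series]
    return {"min": min(values), "avg": mean(values), "max": max(values)}
```

Keeping the timestamps alongside the values makes it possible to line counter data up with load-generator events later, for example to see whether CPU spiked when a particular operation type started.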

Once things are automated, the next step is to execute the tests. Sometimes we run the tests manually only once or twice. Other times we schedule the tests to run automatically on a periodic basis. Each approach provides value in different ways and the choice depends on the team’s goals.

Step 4: report results

After tests are executed, we collect, analyze, and report results. We usually create a document that summarizes the major findings of the testing. We publish the document towards the end of the milestone.

Additionally, sometimes results are shared with folks throughout the milestone while testing is taking place. This can happen either manually, or via an automated system. For example, on our current project we are utilizing a web-based performance dashboard that a peer team created. The performance tests publish data to the dashboard automatically at the end of each run.
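When tests run periodically, comparing each run against a previous baseline is often more useful than the raw numbers alone. Here is a minimal Python sketch of that comparison. The metric names, the 5% tolerance, and the choice of baseline are assumptions. How a result reaches a dashboard depends on that dashboard, which isn't described here.

```python
def regressions(current, baseline, higher_is_better, tolerance=0.05):
    """Return {metric: relative change} for metrics that got worse by more than tolerance."""
    found = {}
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue  # no comparable value in this run
        change = (current[name] - base) / base
        worse_by = -change if name in higher_is_better else change
        if worse_by > tolerance:
            found[name] = change
    return found


if __name__ == "__main__":
    # Hypothetical results from two nightly runs.
    baseline = {"throughput": 200.0, "latency_p95_ms": 100.0, "cpu_percent": 50.0}
    tonight = {"throughput": 180.0, "latency_p95_ms": 103.0, "cpu_percent": 60.0}
    print(regressions(tonight, baseline, higher_is_better={"throughput"}))
```

A tolerance band keeps ordinary run-to-run noise from being reported as a regression. It should be tuned to how noisy each metric actually is in the lab.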

Performance Test Documents

After reading Andrew’s article on test plans I started thinking about my own experiences with writing test documents. In this article I’ll describe the different types of performance test documents my team creates.

Test Strategy Documents

The first type is high-level test strategy documents. One of my team’s responsibilities is to provide guidance on performance tools, techniques, infrastructure, and goals. We document this guidance and share it with our peer feature teams. These teams use the guidance when planning out specific test cases for their features.

Strategy documents provide value in a number of ways. They help my team gain clarity on strategy, they act as documentation that feature teams can use to learn about performance testing, and they assist us in obtaining sign-off from stakeholders.

I wrote one strategy document this year and it provided all of the above. I still occasionally refer to it when I’m asked about the performance testing strategy for our organization.

Test Plan Documents

Besides creating strategy documents, we also write more traditional test plan documents. These documents define a set of tests that we intend to execute for a project milestone. They include details of the tests, features that will be covered, the expected performance goals, and the hardware and infrastructure that we will use.

Similar to strategy documents, test plans help us gain clarity on a project and act as a springboard for stakeholder review. They seem to have a shorter shelf life though — I don’t find myself reviewing old performance test plans. My approach has been to “write it, review it, and then forget about it.”

Interestingly enough, I do find myself reviewing old test plans authored by other teams. Occasionally we need to write performance tests for a feature that we’re unfamiliar with. The first thing I do is review old functional test plans to understand how the feature works and what the feature team thought were the most important test scenarios. These test plans are invaluable in getting us ramped up quickly.

Result Reports

When my team completes a milestone I like to write a report that details the results and conclusions of the performance testing. These reports contain performance results, information about bugs found, general observations, and anything else I think might be useful to document. I send the final report to stakeholders to help them understand the results of testing.

One thing I really like about these reports is that they help me figure out which types of testing provided the most value. They also help me figure out how we can improve. When I start planning a new milestone, I first go through the old reports to get ideas.

Wrapping Up

Documentation isn’t always fun but I do find that it provides value for me, my team, and the organization. I’d like to pose a question to readers — what types of test documents do you create, and how do they provide value?

Thanks for reading!

– Rob

Where’s My Bottleneck?

Hello! My name is Rob and I’ll be contributing content to the Expert Testers blog. I’m excited for the opportunity to write about testing at Microsoft. Many thanks to Andrew for organizing this effort and setting things up.

I lead a team that focuses on performance and reliability testing. We spend a lot of time analyzing and diagnosing performance issues in our test labs and production datacenters. In future blog posts I’ll describe concrete examples of these investigations. Today I’ll outline the general steps we take in order to diagnose a performance issue.

An investigation usually starts with a question like this:

  • “I’m sending mail to an Exchange server and I expect to be able to send at least 200 msgs/s. At the moment the machine will accept only 100 msgs/s. Why is this happening?”

At a high level, our approach is to “measure first, then analyze”. We try not to jump to conclusions; instead, we make decisions based on data collected by various tools. The following three steps describe the process.

Step 1: reproduce the problem

The first step is to reproduce the problem. It’s helpful to have a simple repro of the situation that we can run as many times as necessary to identify the root cause of the issue. If we can create a repro in the test lab, that’s great and I consider us lucky. Sometimes we don’t have this luxury (it’s too difficult, would take too long, etc.) and need to observe servers directly in one of our production datacenters.

In either case, we try to answer these questions:

  • What are the steps to reproduce?
  • Which servers are involved?
  • What are the software and operating system versions?
  • What are the hardware configurations for the servers?
  • Does the issue occur at the same time every day?
  • Which data is involved?

The goal is to find a simple configuration and set of steps that allow for an easy repro.

Step 2: identify the bottleneck

Now that we (hopefully) have a reproducible test case, the next step is to identify the resource that is the bottleneck. A bottleneck will be either CPU, disk, memory, network, or an operating system entity like locks.

The Microsoft PFE Performance Guide is a great tutorial on finding a resource bottleneck. This guide is authored by Microsoft PFEs who use these steps while diagnosing issues in the field. My team can usually find the bottleneck quickly using Windows Performance Monitor and these techniques.
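The first pass of that triage can be automated once Performance Monitor data has been averaged. Here is a minimal Python sketch. The counter names and every threshold in `THRESHOLDS` are hypothetical illustrations, not values taken from the PFE guide. Lock contention is deliberately left out because it usually needs a debugger rather than a counter threshold.

```python
# Hypothetical thresholds for illustration only. Real values should come from
# your own baselines or a reference such as the PFE Performance Guide.
THRESHOLDS = {
    "cpu_percent": 85.0,           # average % processor time
    "disk_queue_length": 2.0,      # average disk queue length
    "memory_available_mb": 500.0,  # available memory; LOW values are bad
    "network_percent": 70.0,       # % of link bandwidth in use
}
LOWER_IS_WORSE = {"memory_available_mb"}


def suspect_bottlenecks(averages):
    """Return the resources whose averaged counter value crosses its threshold."""
    suspects = []
    for name, limit in THRESHOLDS.items():
        value = averages.get(name)
        if value is None:
            continue  # counter not collected in this run
        crossed = value < limit if name in LOWER_IS_WORSE else value > limit
        if crossed:
            suspects.append(name)
    return suspects


if __name__ == "__main__":
    # Made-up averages from a run that accepts 100 msgs/s instead of the expected 200.
    run = {"cpu_percent": 97.0, "disk_queue_length": 0.4,
           "memory_available_mb": 3000.0, "network_percent": 15.0}
    print(suspect_bottlenecks(run))
```

The output is a list of suspects to confirm, not a diagnosis. The next step still requires the resource-specific tools.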

Step 3: identify the root cause of the bottleneck

This is the tough part. Now that we know which resource is the bottleneck, we need to figure out why. How we do this differs for each type of resource. Here are some guidelines we follow for each one:

  • CPU
  • Disk
  • Memory
    • Managed code? Use WinDbg/SOS to analyze the process heap
    • Native code? Use DebugDiag (I haven’t tried it yet)
  • Network
    • Use Process Explorer to figure out which process is the chattiest on the network
  • Locks
    • Managed code? Use SyncBlk in WinDbg to analyze lock information

(As I write this I get the feeling that the above might be good blog entries in the future.)

Besides the resource-specific strategies, there are some general things we also try to keep in mind:

  • Get symbols working. Symbols are invaluable.
  • Use the scientific method.
  • Don’t make guesses or jump to conclusions — measure, then analyze.

Wrapping Up

Thanks for reading! If you have any questions let me know.

– Rob T
