Test In Production – What is it all about?

I would like to share my thoughts on test in production (a.k.a. TiP). The term has become a buzzword in the testers' wonderland as the industry moves more towards providing solutions in the cloud. I plan to address it through four easy questions: what, why, when, and how.

I like to explain this with an analogy: the boxed product of the olden days versus the cloud services of today. In earlier days, when software shipped as a boxed product or a downloadable executable, testing was much simpler, in a way. Boxed products have well-defined system requirements: operating system (type, version), supported locales, disk space, RAM, yada yada yada. So when testers define the test plan, it is self-contained within the boundaries defined by the product. When end users buy the boxed product, it is their own decision which hardware to install it on. It is a balanced equation, I guess: what is tested matches what is installed, and the software works as expected, as long as the end user chooses hardware that meets the system requirements.

With the evolution of today's cloud-oriented solutions, customers want solutions that optimize cost (which is one of the reasons the cloud is evolving, in my opinion). The company providing the software service decides on the hardware to suit its scale and performance needs. In reality, not all software is custom-made for specific hardware, so many hardware-related variables come into play when testing software services in the cloud. For example, when you host a solution used by hundreds of thousands of users, you can think of tens of hundreds of servers in the data center.

The small piece of software once tested on one machine or several machines (depending on the software architecture you are testing) now becomes a huge network tied to Service Level Agreements (SLAs) at various levels: performance, latency, scale, fault tolerance, security, blah blah blah. Although it is very much possible to simulate a data-center-like setup within your corporate environment, there will likely be a lot of differences from the actual setup in the data center. These may include, but are not limited to, load balancers, Active Directory credentials, different security policies applied on the hosts, domain controller configurations specific to your hosting setup, and storage access credentials; and these are just the tip of the iceberg.

What

So what is TiP? My definition: TiP is an end-to-end customer test scenario that you author with the required input parameters and schedule to run continuously, at a predefined interval, against the software endpoints of the hosted service. It validates functionality and component integration, and provides a binary result: Pass or Fail. There are at least two types of TiP tests you can author: Outside-In (OI) and Inside-Out (IO).

Outside-In (OI): These tests run outside your production environment and target your software endpoint.

Inside-Out (IO): These tests run from within your data center and target the different roles you may have, to ensure they are all functioning properly.
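To make the definition a bit more concrete, here is a minimal Python sketch of a TiP probe: a scenario is run, any failure is collapsed into a binary Pass/Fail, and the probe is repeated at a fixed interval. All the names here (`ProbeResult`, `run_probe`, `outside_in_check`, `run_on_interval`) are made up for illustration, and the Outside-In check assumes a plain HTTP endpoint that answers 200 when healthy; your service may need something quite different.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    name: str
    outcome: str       # "Pass" or "Fail"
    detail: str = ""   # short reason when the outcome is "Fail"


def run_probe(name: str, scenario: Callable[[], None]) -> ProbeResult:
    """Run one end-to-end scenario; any exception counts as a Fail."""
    try:
        scenario()
        return ProbeResult(name, "Pass")
    except Exception as exc:
        return ProbeResult(name, "Fail", f"{type(exc).__name__}: {exc}")


def outside_in_check(url: str, timeout: float = 10.0) -> Callable[[], None]:
    """Build an Outside-In scenario that calls a public endpoint.

    Assumption: the endpoint returns HTTP 200 when it is healthy.
    """
    def scenario() -> None:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            if response.status != 200:
                raise AssertionError(f"unexpected status {response.status}")
    return scenario


def run_on_interval(
    name: str,
    scenario: Callable[[], None],
    interval_seconds: float,
    runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[ProbeResult]:
    """Run the scenario `runs` times, pausing between consecutive runs."""
    results = []
    for i in range(runs):
        results.append(run_probe(name, scenario))
        if i < runs - 1:
            sleep(interval_seconds)
    return results
```

An Inside-Out test could reuse `run_probe` with a scenario that talks to an internal role instead of the public endpoint. In a real setup the fixed `runs` count would typically be replaced by a scheduler or a long-running service.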

Why

TiP enables you to find issues proactively, before you hear about them from a customer. Since the tests run against your live site, appropriate monitoring is expected to be built into the architecture so that failures from these critical tests are escalated and appropriate action is taken. TiP is a valuable asset for validating your deployment and any plumbing between the different software roles* in your architecture. TiP plays a critical role during service deployment or upgrade, because it runs end-to-end tests on the production systems before they go live to take real-world traffic. Automated TiP scenario tests can also save testers a lot of manual validation in the production system.
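As one possible shape for the escalation piece, the Python sketch below counts consecutive failures of a test and notifies someone once a threshold is reached. The class name `FailureEscalator`, the `notify` callback, and the threshold of 3 are all assumptions made for illustration; how failures should really be escalated depends on your monitoring setup.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailureEscalator:
    notify: Callable[[str], None]   # hypothetical hook: pager, email, ticket...
    threshold: int = 3              # assumed value, tune for your service
    consecutive_failures: int = 0

    def record(self, test_name: str, outcome: str) -> bool:
        """Record one Pass/Fail result; return True if this call escalated."""
        if outcome == "Pass":
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Escalate exactly once, at the moment the threshold is reached.
        if self.consecutive_failures == self.threshold:
            self.notify(
                f"{test_name}: {self.consecutive_failures} consecutive failures"
            )
            return True
        return False
```

Waiting for a few consecutive failures is a design choice that trades a little detection time for fewer false alarms from one-off network blips; a threshold of 1 would escalate on the first failure.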

When

TiP should run all the time, for as long as you keep your software service alive.

How

I'm not going to go into any design here; this is rather a high-level thought. From your test plan, identify a few critical test paths that cover both happy-path and negative test cases. Give priority to the test cases that cover the most code paths and components. For example, if your service has replication, SQL transactions, a flush policy, etc., encapsulate all of these into a single test case and try to automate the complex path. This helps ensure that the whole pipeline in your architecture is working as expected. There are no right or wrong tools for this. From batch files and shell scripts to C# and Ruby on Rails, it's up to you to find the tool set and language appropriate for the task.
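As a rough illustration of bundling several components into one test case, the Python sketch below chains named steps and reports Pass, or Fail together with the first step that broke. The helper `run_critical_path` and the step names are hypothetical, and the two dictionaries are in-memory stand-ins for a real primary store and its replica.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def run_critical_path(steps: List[Step]) -> Tuple[str, Optional[str]]:
    """Run steps in order; stop at the first failure and name it."""
    for name, action in steps:
        try:
            action()
        except Exception:
            return ("Fail", name)
    return ("Pass", None)


# In-memory stand-ins for real components (assumptions for this sketch).
primary: Dict[str, str] = {}
replica: Dict[str, str] = {}


def write_record() -> None:
    primary["tip-probe"] = "hello"


def replicate() -> None:
    replica.update(primary)


def read_back_from_replica() -> None:
    if replica.get("tip-probe") != "hello":
        raise AssertionError("replica is missing the record")


steps: List[Step] = [
    ("write", write_record),
    ("replicate", replicate),
    ("read-back", read_back_from_replica),
]

print(run_critical_path(steps))  # ('Pass', None)
```

Knowing which step failed first makes the single binary result much easier to act on when the test covers several components at once.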

*role – An installation or instance of the operating system serving a specific capability. For example, an authentication system could be one instance of the OS in your deployment whose functionality is just to authenticate all the traffic to access your service.

7 Responses

  1. Thanks for sharing this. I agree that TiP is a useful approach to testing. There are many times when we have tested on a test environment, released to production and only then discovered a crucial difference in environment or data which causes it to fail.

    I would be interested to know how you deal with test data in production? Do you have some way to hide it from real users?

    • It's a good question, and one that could be a separate blog post as well 🙂 There are different ways you can manage test data in production. The first option is the one I use.

      1. You become a customer. That is, you register/subscribe the same way customers sign up for your service, and use the data from your subscription as your test data in production. This is one of the safe ways to do it, as you don't have to worry about privacy: it's your (probably your company's) subscription/account. One challenge in this approach is channeling your traffic through a specific host if you have to verify specific things, but it's certainly doable. As long as the service design/architecture allows traffic configuration and management, you can do targeted debugging using your test data on a specific host in the production environment (assuming you have all the clearance to work in the production network). This is particularly helpful because you experience the same service as the paying customers (the good, the bad and the ugly :)).

      2. The second approach is to have a predefined dataset. This means designing data that follows some protocol/standard your service can handle specially, e.g., a prefix, a suffix, or some tag characters that mark the data as test data and enable you to filter it from the real traffic. The challenge in this approach is that some real customer may choose data that matches how you defined yours (you could avoid this in the design, but customers can do what they wish, and that's why they pay for it).

      In my opinion you wouldn't want to hide the test data from production data unless you have serious compliance factors. Why would you want to? If real regular users see your test data, then there may be a potential problem in the service. Data in the cloud should conform to privacy to the level and nature of the service. If at all I wanted to hide the data, the one place I can think of would be when I build optics to look into service performance or to evaluate the COGS (Cost Of Goods Sold) of the pure paid traffic going through my service.

      There are many ways to hide your data from production and treat it as test data in production, if you really want to. The problem with this approach is that your software would probably be designed to treat test data differently from real data. If that's how it's coded, then it defeats the purpose of testing in production.


  2. this comment is a test.

  3. Ralph, you’re right! This was TiP in action! 🙂
