A Leader’s Guide to Test Data Management (TDM)

By Testim

As a leader at a tech organization, you know it’s your responsibility to have a solid understanding of the software development life cycle. That includes, of course, software testing and everything it entails. If you sometimes feel overwhelmed, that’s completely understandable: This field’s lexicon is huge and continues to grow. For instance, you might be aware of the phrase “test data management” but not completely understand what it’s all about.

What Is Test Data Management in Software Testing?

Test data management (TDM) is the process of providing high-quality data for testing purposes. The TDM process is responsible for creating the data, ensuring it has the expected quality, and making it readily available, in the expected amounts and formats, when the test processes need it.

How does TDM work? In practice, there are several steps involved in the process:

  • Data creation. The first step is generating the actual data. You can use different strategies such as production cloning—i.e., copying real data from production—or synthetic data generation—generating “fake” yet realistic data.
  • Data obfuscation. If a TDM process relies on production data, it’s important to identify and obfuscate sensitive data (e.g., passwords, tokens, financial information, and personal information) so it doesn’t leak. Strategies might include data masking and data anonymization.
  • Data subsetting. Also known as data slicing, this means extracting only a portion of the data when performing production cloning. It makes sense: To test, you typically need some but not all of the available data. Subsetting reduces the storage and infrastructure costs associated with production cloning.
  • Provisioning. Once the data exists and is prepared for use, it needs to be delivered to the test environments. The TDM process must ensure test data is delivered at the right times and in suitable formats.
  • Integrations. It’s also essential that test data delivery is automatable and can integrate with the existing toolchain to be incorporated into the CI/CD pipeline.
  • Versioning. Versioning the test data repositories brings benefits such as repeatable tests and granular control over the changes made to the data.
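
To make a few of these steps concrete, here is a minimal sketch of obfuscation, subsetting, and versioning using only the Python standard library. Everything in it is an assumption made for illustration: the `PRODUCTION_ROWS` sample, the field names, and the helpers `mask_email`, `obfuscate`, `subset`, and `fingerprint` are hypothetical and don't come from any particular TDM tool.

```python
import hashlib
import json
import random

# Hypothetical stand-in for rows cloned from production.
PRODUCTION_ROWS = [
    {"id": i, "email": f"user{i}@example.com", "country": "US" if i % 2 else "DE"}
    for i in range(1, 101)
]


def mask_email(email, salt="test-env"):
    """Replace an email with a deterministic pseudonym (simple data masking)."""
    digest = hashlib.sha256((salt + email).encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@example.invalid"


def obfuscate(rows):
    """Return copies of the rows with the sensitive field masked."""
    return [{**row, "email": mask_email(row["email"])} for row in rows]


def subset(rows, fraction, seed=42):
    """Take a reproducible random sample of the rows (data subsetting)."""
    rng = random.Random(seed)
    size = max(1, int(len(rows) * fraction))
    return rng.sample(rows, size)


def fingerprint(rows):
    """Hash the data set's content so it can serve as a simple version ID."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Subset first, then mask, then record which version of the data was used.
test_data = obfuscate(subset(PRODUCTION_ROWS, fraction=0.25))
version = fingerprint(test_data)
```

Because the sample is seeded and the masking is deterministic, running the sketch twice yields the same rows and the same fingerprint, which is what makes a test run repeatable. A real masking strategy would need to cover every sensitive field, not just one.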

Why Is Test Data Management Important?

There are several types of software testing that an organization can leverage in its test automation strategy. Some forms of testing either don’t require data, or the data they require can be incorporated into the test cases themselves. A classic example would be unit tests since their goal is to test each unit in complete isolation.

Other forms of testing do need data, such as end-to-end testing. And since these tests rely on data, you must ensure they get access to high-quality data. If your tests have to work with faulty or invalid data, you won’t be able to trust their results, regardless of the excellence of your QA strategy. In other words: garbage in, garbage out.
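
One way to guard against "garbage in" is a small quality gate that checks the data before any test consumes it. The sketch below is only an illustration: the `validate_rows` helper, the sample rows, and the two rules it applies (required fields present, no duplicate IDs) are assumptions, and a real data set would need rules of its own.

```python
def validate_rows(rows, required_fields):
    """Return a list of human-readable problems; an empty list means the data passed."""
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        # Rule 1: every required field must be present and non-empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing {field}")
        # Rule 2: IDs must be unique across the data set.
        row_id = row.get("id")
        if row_id in seen_ids:
            problems.append(f"row {index}: duplicate id {row_id}")
        seen_ids.add(row_id)
    return problems


# Hypothetical data set with two deliberate defects in the second row.
sample_rows = [
    {"id": 1, "email": "a@example.test"},
    {"id": 1, "email": ""},
]
problems = validate_rows(sample_rows, required_fields=["id", "email"])
```

Here `problems` contains one entry for the empty email and one for the repeated ID, so the test run can be stopped before it produces results nobody can trust.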

What Are the Properties That TDM Has to Ensure?

Great test data must have other properties besides quality. It doesn’t matter if you have high-quality data if it doesn’t get to your tests when they need it. So, availability is also essential for test data.

Realism is another crucial property of test data. You must ensure that test data mimics real production data as closely as possible. Otherwise, your test process won’t really verify how the system under test behaves in real-world scenarios.

Speaking of production data, organizations must ensure that customers’ data is protected at all costs. One of the most common solutions for obtaining test data is production cloning, i.e., literally copying the real data from the production servers (or at least portions of it). That tactic solves the problem of realism but creates the risk of exposing personally identifiable information. Besides tarnishing the organization’s reputation, such leakage can result in dire financial and legal consequences under legislation such as the GDPR.

To sum it up, the TDM process is responsible for producing test data, ensuring it has the desired qualities, and delivering it where and when it’s needed, in the expected quantities and formats.

How Do You Implement TDM?

How do teams actually go about implementing test data management? Let’s see a quick three-step guide on how to do it.

1. Always Consider the Testing Pyramid

The testing pyramid is a mental framework that allows you to reason about the different types of software tests and understand how to prioritize between them.

In a nutshell, the testing pyramid states you should prioritize having a larger number of unit tests. Unit tests are typically cheaper to write and faster to run because they don’t rely on external dependencies. However, unit tests don’t resemble how a real user interacts with the application. That’s why you should also employ a smaller number of integration tests and UI or end-to-end tests. These forms of tests might be more cumbersome to write and, generally speaking, slower to run, but they offer a more realistic picture of the usage of the application.

2. Understand How TDM Contributes to the Overall ROI of Automated Testing

The testing pyramid is a valuable framework to help you decide how to allocate resources when it comes to the different types of software testing available.

The next step would be to understand at a deeper level the role TDM plays in calculating the overall ROI of your automated testing approach. In a post we published about the topic, you can learn that to calculate the ROI of your testing approach, you must factor in the costs associated with running all of the necessary testing environments.

Although that article doesn’t explicitly mention it, TDM is certainly an important part of setting up and managing test environments since…well, test environments need data. So, before getting started with a TDM strategy, an organization needs to analyze and calculate its ROI.
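
As a rough illustration of how TDM costs enter the picture, the sketch below uses the generic formula ROI = (savings − total cost) / total cost. The `automation_roi` function, the cost categories, and every figure are hypothetical assumptions chosen to make the arithmetic easy to follow. They are not taken from the article mentioned above.

```python
from typing import Mapping


def automation_roi(savings: float, costs: Mapping[str, float]) -> float:
    """Generic ROI: net gain divided by total cost."""
    total_cost = sum(costs.values())
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (savings - total_cost) / total_cost


# Hypothetical yearly figures, in the same currency unit.
costs_with_tdm = {
    "test_authoring": 30_000,
    "environments": 20_000,
    "test_data_management": 10_000,
}
costs_ignoring_tdm = {k: v for k, v in costs_with_tdm.items() if k != "test_data_management"}

roi_with_tdm = automation_roi(120_000, costs_with_tdm)          # (120000 - 60000) / 60000
roi_ignoring_tdm = automation_roi(120_000, costs_ignoring_tdm)  # (120000 - 50000) / 50000
```

With these made-up numbers, leaving TDM out of the calculation reports an ROI of 1.4 instead of 1.0, which shows how skipping the data costs can overstate the return.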

3. Pick a Tool

After an organization understands the importance of the testing pyramid and has analyzed the impact that TDM can have upon its automated testing ROI, it’s time to get started, which means tool shopping.

There are many TDM tools at your disposal, and they differ from each other in many aspects, including price, learning curve, and the resources they offer. Choosing the right TDM tool for an organization is an involved process. It’s important to understand the “style” of TDM that makes sense for your organization, particularly regarding the features you’ll need.

For instance, suppose you still haven’t released your application. As such, you don’t have any production data to copy from. That constraint narrows your options to tools with strong synthetic data generation capabilities.
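
To give a feel for what synthetic data generation means at its simplest, here is a minimal sketch using only the Python standard library. The record shape, the name lists, and the `make_customer` and `generate_customers` helpers are all hypothetical. Dedicated tools typically offer far richer and more realistic generators.

```python
import random
from datetime import date, timedelta

# Small illustrative pools; a real generator would use much larger ones.
FIRST_NAMES = ["Ana", "Bilal", "Chen", "Dara", "Emeka"]
LAST_NAMES = ["Silva", "Khan", "Li", "Novak", "Okafor"]


def make_customer(rng, customer_id):
    """Build one fake but plausible customer record."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    signup = date(2020, 1, 1) + timedelta(days=rng.randrange(365 * 4))
    return {
        "id": customer_id,
        "name": f"{first} {last}",
        "email": f"{first}.{last}.{customer_id}@example.test".lower(),
        "signup_date": signup.isoformat(),
        "balance_cents": rng.randrange(0, 1_000_000),
    }


def generate_customers(count, seed=1234):
    """Generate `count` records; the fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    return [make_customer(rng, i) for i in range(1, count + 1)]


customers = generate_customers(5)
```

Seeding the generator is a deliberate choice: the same seed always produces the same records, so a failing test can be reproduced with exactly the data that triggered it.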

Here Be Dragons: Main Pitfalls of Test Data Management

It’s not all a bed of roses when it comes to test data management. Quite the opposite: Pitfalls and challenges are common when trying to provision and manage test data. Here are some common ones you need to be aware of:

  • Full production cloning. Cloning all of production, instead of leveraging data subsetting, drives up storage and infrastructure costs.
  • Outdated test data. Relying on test data that is no longer valid or up to date is bound to produce unreliable test results.
  • Test data corruption. When different test teams share the same test data in the same environment, one team’s changes can corrupt the data another team depends on.
  • Safety and compliance. Sensitive data leakage is a danger, and the process of obfuscating it can be costly.
  • Storage and infrastructure costs. The organization might incur high infrastructure costs related to the generation and provisioning of test data, especially related to storage.

When picking a TDM tool, organizations must research how well the available tools handle the aforementioned pitfalls and challenges.
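
The test data corruption pitfall is the easiest one to picture in code. The sketch below shows one common mitigation, giving each test its own private copy of a baseline data set. The `BASELINE` data, the `fresh_test_data` helper, and the two example tests are hypothetical. Real environments usually achieve the same isolation at the database or environment level.

```python
import copy

# Hypothetical baseline data set that must never be modified directly.
BASELINE = {"accounts": [{"id": 1, "balance": 100}, {"id": 2, "balance": 50}]}


def fresh_test_data():
    """Hand each test a private deep copy so changes can't leak between tests."""
    return copy.deepcopy(BASELINE)


def test_withdrawal():
    data = fresh_test_data()
    data["accounts"][0]["balance"] -= 30  # mutates only this test's copy
    assert data["accounts"][0]["balance"] == 70


def test_balance_report():
    data = fresh_test_data()
    # Still sees the original balances, whatever other tests did.
    assert sum(account["balance"] for account in data["accounts"]) == 150


test_withdrawal()
test_balance_report()
```

If both tests shared `BASELINE` directly, the report test would pass or fail depending on whether the withdrawal test ran first, which is exactly the kind of flakiness shared test data tends to cause.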

Manage Your Data to Scale Your Test Approach to the Next Level

A big challenge in automated testing is developing high-quality test data in a reliable, consistent, and automatic way. That’s where the discipline of test data management comes in handy. With a great TDM process in place, you can ensure your tests get the data they need when they need it.

There’s more to a proper TDM process than the benefits you’ve seen in this post, though. TDM, along with the nascent practice of TestOps, can act as a solid foundation for your testing approach when it comes time to scale it.