A Modern Approach to Test Data Management
Building a comprehensive solution to solve today’s
biggest application development challenges
Executive Summary
Speed is a critical business imperative for all organizations, regardless of industry. The pace at which enterprises can bring new products and services to market determines their ability to differentiate from competitors and retain market share. Now more than ever, applications are at the center of this race. As enterprises look to deliver high-quality applications at the lowest possible cost, they need to build out a more agile application infrastructure—and that includes a robust and comprehensive test data management (TDM) strategy. Once viewed as a back-office function, TDM is now a critical business enabler for enterprise agility, security, and cost efficiency.

The increasing pace of software development presents new challenges. With the proliferation of DevOps, a heightened focus on automation, and requirements to secure data across global teams of employees and contractors, IT organizations must expand the charter of traditional TDM to meet the needs of today's development and testing teams. This white paper explores the top challenges that IT organizations face when managing test data, and highlights the top evaluative criteria to consider when implementing new technology solutions as part of a TDM strategy.
These challenges fall into four areas:

• DATA DISTRIBUTION: reducing the time to operationalize test data
• DATA QUALITY: fulfilling requirements for high-fidelity test data
• DATA SECURITY: minimizing security risks without compromising agility
• INFRASTRUCTURE COSTS: lowering the costs of storing and archiving test data

The following sections highlight the top evaluative criteria in each of these four areas.

Data Distribution

Making a copy of production data available to a downstream testing environment is often a time-consuming, labor-intensive process involving multiple handoffs between teams. The end-to-end process usually lags demand; at a typical IT organization, delivering a new copy of production data to a non-production environment takes days, weeks, or in some cases months.

Organizations looking to improve TDM must build a solution that streamlines this process and creates a path towards fast, repeatable data delivery. Specifically, test data managers should look for solutions that feature:

• AUTOMATION: a streamlined TDM approach must eliminate manual processes—for example, target database initialization, configuration steps, and validation checks—providing a low-touch approach to standing up new data environments.
• TOOLSET INTEGRATION: an efficient TDM approach unites the heterogeneous set of technologies that interact with test datasets along the delivery pipeline, including masking, subsetting, and synthetic data creation. This requires compatibility across tools as well as exposed APIs or other clear integration mechanisms. A factory-like approach to TDM that combines tools into a cohesive unit allows for greater levels of automation and eliminates handoffs between different teams.
• SELF SERVICE: with sufficient levels of automation and toolset integration in place, test data delivery can be executed via self service, directly by end users. Instead of relying on IT ticketing systems, end users should take advantage of interfaces purpose-built for their needs. Self-service capabilities should extend not just to data delivery, but also to control over test data: for example, developers or testers should be able to bookmark and reset, archive, or share copies of test data without involving operations teams, as sketched in the example following this list.
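To make these criteria concrete, the sketch below shows what self-service data operations might look like when a TDM platform exposes them through an API. Everything here is illustrative: the service URL, endpoint paths, and payload fields are invented for this example and do not describe any particular vendor's interface.

# Illustrative sketch: driving self-service test data operations through a
# hypothetical TDM REST API. The endpoints and payload fields are invented
# for illustration, not any specific vendor's interface.
import requests

BASE_URL = "https://tdm.example.com/api/v1"   # hypothetical TDM service
TOKEN = "..."  # in practice, fetched from a secrets manager

def provision(source_dataset: str, target_env: str) -> str:
    """Request a fresh copy of test data for a target environment."""
    resp = requests.post(
        f"{BASE_URL}/datasets/{source_dataset}/provision",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"target": target_env},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["copy_id"]

def bookmark(copy_id: str, label: str) -> str:
    """Save a named, shareable point-in-time state of a data copy."""
    resp = requests.post(
        f"{BASE_URL}/copies/{copy_id}/bookmarks",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"label": label},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["bookmark_id"]

def reset(copy_id: str, bookmark_id: str) -> None:
    """Rewind a data copy to a bookmark, e.g. between destructive test runs."""
    resp = requests.post(
        f"{BASE_URL}/copies/{copy_id}/reset",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"bookmark": bookmark_id},
        timeout=30,
    )
    resp.raise_for_status()

if __name__ == "__main__":
    copy_id = provision("billing_db_gold", "qa-env-7")
    baseline = bookmark(copy_id, "pre-regression-baseline")
    # ... run a destructive test suite, then rewind in minutes, not days:
    reset(copy_id, baseline)

The point of this design is that provisioning, bookmarking, and resetting become one-line calls a tester can script into a pipeline, rather than tickets handled by an operations team.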
A well-orchestrated approach to TDM has the potential to transform the overall application development process. Slashing the wait
time for data means testers can execute more test cases earlier in the software development lifecycle (SDLC), enabling them to
identify defects when they are easier and less expensive to fix.
Figure 1: Testing in a traditional scenario (A) vs. a scenario with an optimized TDM approach (B).
Data Quality
TDM teams go to great lengths to make the right types of test data—such as masked production data or synthetic datasets—available to software development teams. As TDM teams balance requirements for different types of test data, they must also ensure data quality is preserved across three key dimensions:

• DATA AGE: due to the time and effort required to prepare test data, operations teams are often unable to meet a number of ticket requests. As a result, data often becomes stale in non-production, which can impact the quality of testing and result in costly, late-stage errors. A TDM approach should aim to reduce the time it takes to refresh from a gold copy, making the latest test data more accessible. In addition, the latest production data should be readily available in minutes in the event that it is needed for triage.
• DATA ACCURACY: a TDM process can become challenging when tests span multiple related data sources that must stay consistent with one another. A TDM approach should allow for multiple datasets to be provisioned to the same point in time and simultaneously reset to quickly validate complicated end-to-end functional testing scenarios.
• DATA SIZE: due to storage constraints, developers must often work with subsets of data, which aren't likely to satisfy all functional testing requirements. The use of subsets can result in missed test case outliers, which can paradoxically increase rather than decrease project costs due to data-related errors. In an optimized strategy, full-size test data copies can be provisioned in a fraction of the space of subsets by sharing common data blocks across copies (see the sketch following this list). As a result, TDM teams can reduce the operational costs of subsetting—both in terms of data preparation and error resolution—by reducing the need to subset data as frequently.
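Of the three dimensions, data size is the least intuitive: how can a full copy take less space than a subset? A minimal Python sketch of the underlying block-sharing idea follows. It is a toy model of copy-on-write, with invented block counts and sizes, not a description of any real storage engine.

# Toy illustration of block sharing behind full-size virtual copies: each
# copy references its parent's blocks and stores only the blocks it rewrites
# (copy-on-write). A simplified model, not a real storage engine.

class BlockStore:
    """Content store shared by all copies; each unique block is kept once."""
    def __init__(self):
        self.blocks = {}           # block_id -> bytes
        self.next_id = 0

    def put(self, data: bytes) -> int:
        self.blocks[self.next_id] = data
        self.next_id += 1
        return self.next_id - 1

class VirtualCopy:
    """A dataset copy that shares unmodified blocks with its parent."""
    def __init__(self, store: BlockStore, block_ids: list):
        self.store = store
        self.block_ids = list(block_ids)   # cheap: references only
        self.owned = set()                 # blocks this copy actually wrote

    def clone(self) -> "VirtualCopy":
        return VirtualCopy(self.store, self.block_ids)

    def write(self, index: int, data: bytes) -> None:
        new_id = self.store.put(data)      # copy-on-write: store the new block only
        self.block_ids[index] = new_id
        self.owned.add(new_id)

if __name__ == "__main__":
    store = BlockStore()
    gold = VirtualCopy(store, [store.put(b"x" * 8192) for _ in range(1000)])
    copies = [gold.clone() for _ in range(10)]
    copies[0].write(3, b"y" * 8192)        # one copy changes one block
    # 1001 stored blocks back 11 logical copies of 1000 blocks each
    print(f"{len(store.blocks)} stored blocks back 11 full-size copies")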
1. A development team submits a ticket requesting test data, and the request is routed for approval.
2. Once approved, a DBA creates a backup of production and then transfers and imports the backup copy to a staging server.
3. A test data manager validates the row count, any new PHI fields, and data structures.
4. The test data manager updates the subsetting artifacts, creates a subset, and executes the masking process.
5. The test data manager validates the row count, sends out an email update, and performs unit testing.
6. The DBA exports the masked data, transfers it to non-production, and updates the gold copy.
7. The DBA performs a backup and restore operation into the target Dev environment.
8. A second backup and restore process is performed to load data into a QA environment.
Figure 2: Example of a test data management process at a large health plan provider.
End-to-end, this process takes seven days. In an optimized process leveraging an integrated test data management platform, masked
data can be prepared in less than two days:
1. A test data platform automatically and non-disruptively remains in sync with production, providing continuous access to the latest data and eliminating the need to perform a backup.
2. An admin profiles the data and automatically assigns repeatable masking algorithms; if required, an admin subsets the data beforehand (see the sketch following this list).
3. After masking is complete, the admin tests changes and validates referential integrity, with the ability to quickly roll back to the initial unmasked state.
4. Instead of being updated or replaced, the existing gold copy remains as an archive in a centralized repository, where it is compressed to a fraction of the size of production.
5. Developers access masked data via self service in minutes instead of performing a manual backup and restore process.
6. QA engineers branch their own copies of development and begin testing in minutes.
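Step 2 depends on masking being repeatable: the same input must always produce the same masked output, across runs and across tables, or foreign keys stop joining after masking. The Python sketch below shows one common way to get this property, keyed deterministic hashing; the tables and field names are invented for illustration, and production platforms typically add format-preserving algorithms on top of this idea.

# Minimal sketch of "repeatable" masking: a keyed, deterministic transform
# maps each original value to the same masked value on every run and in
# every table, so joins on masked keys still line up.
import hmac, hashlib

SECRET = b"masking-key"   # in practice, managed and rotated by the platform

def mask(value: str, width: int = 12) -> str:
    """Deterministically pseudonymize a value with a keyed hash."""
    digest = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return digest[:width]

# Hypothetical health-plan tables linked by patient_id:
patients = [{"patient_id": "P-1001", "name": "Ana Ruiz"}]
claims = [{"claim_id": "C-77", "patient_id": "P-1001", "amount": 1200}]

for row in patients:
    row["patient_id"] = mask(row["patient_id"])
    row["name"] = mask(row["name"])
for row in claims:
    row["patient_id"] = mask(row["patient_id"])   # same input -> same output

# The foreign key still joins after masking:
assert claims[0]["patient_id"] == patients[0]["patient_id"]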
• SUBSETS OF PRODUCTION DATA are significantly more agile than full copies. They can provide some savings on hardware, CPU, and licensing costs, but it can be difficult to achieve referential integrity: the subsetting process is prone to human error and requires an in-depth understanding of data relationships, both those within the database schema or file system and those implicit in the data itself (see the sketch following the table below).

The tradeoffs can be summarized as follows:

Production Data: slow, manual access; good test coverage; sensitive data at risk; high consumption of storage, CPU, and licenses.
Masked Data (Full or Subset): extended SLAs for masked data; must ensure referential integrity; improved data privacy and security; requires masking software or custom scripting and a staging server.
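The difficulty of subsetting is easiest to see in code. The sketch below shows just the schema-driven half of the problem: every seed row must drag its foreign-key ancestors into the subset, or the result will not load. The schema and rows are invented for illustration; relationships that live only in the data, which this walk cannot discover, are what make real subsetting error-prone.

# Sketch: expanding a seed set of rows into a referentially complete subset
# by following foreign keys transitively. Schema and data are invented.
from collections import deque

# table -> list of (fk_column, parent_table) relationships
SCHEMA = {
    "items": [("order_id", "orders")],
    "orders": [("customer_id", "customers")],
    "customers": [],
}

def fk_closure(tables, seed):
    """Expand seed row ids into a subset that satisfies every foreign key."""
    subset = {t: {} for t in tables}
    queue = deque((t, rid) for t, ids in seed.items() for rid in ids)
    while queue:
        table, row_id = queue.popleft()
        if row_id in subset[table]:
            continue                              # already included
        row = tables[table][row_id]
        subset[table][row_id] = row
        for fk_col, parent in SCHEMA[table]:
            queue.append((parent, row[fk_col]))   # pull in the parent row
    return subset

tables = {
    "customers": {1: {"id": 1, "name": "Acme"}},
    "orders": {10: {"id": 10, "customer_id": 1}},
    "items": {100: {"id": 100, "order_id": 10}},
}
# Seeding with a single item transitively pulls in its order and customer.
subset = fk_closure(tables, {"items": [100]})
assert set(subset["orders"]) == {10} and set(subset["customers"]) == {1}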
Case Study
One Fortune 500 financial services institution investigated the use of virtual data for the development and testing of an online platform
that provides market insights to clients and enables them to make smarter financial decisions. The investigation was triggered by
massive platform growth: over the span of a few years, financial data had doubled, usage had tripled, and feature development effort
had quadrupled. IT was struggling to keep pace with exploding storage costs and missed several releases due to slow environment provisioning. Moreover, a large percentage of bugs were discovered in late-stage user-acceptance testing, which risked impacting
customer experience.
For both Oracle and MS SQL production data sources, the firm’s IT organization implemented a data virtualization technology with
built-in masking. The solution was deployed in a matter of weeks, and the results were immediate. For instance, rather than waiting a full day for a DBA team to restore an environment after a 20-minute test run, QA engineers leveraged secure virtual data to initiate a 10-minute reset process that brought the environment back to a bookmarked state. Less waiting enabled QA teams to execute more
test cycles earlier in the SDLC—a “shift left” in testing. Ultimately, this led QA teams to discover and resolve errors when they were
easier and less expensive to fix. The firm estimated that they reduced overall defect rates by 60 percent and improved productivity
across 800+ developers and testers by 25 percent. They also dramatically reduced storage requirements by almost 200 TB, enabling
them to accommodate massive platform growth without expanding their existing infrastructure.
ABOUT DELPHIX
Delphix’s mission is to free companies from data friction and accelerate innovation. Fortune 100 companies use the Delphix Dynamic Data Platform
to connect, virtualize, secure and manage data in the cloud and in on-premise environments. For more information visit www.delphix.com.