For any organization to succeed, irrespective of its size, area of expertise or customer base, awareness of end-user needs and adapting to meet them is of utmost importance. Adapting to changes promptly is equally important for achieving competitive advantage.
Test Data Management plays a key role in ensuring quality roll out of applications at the right time. As a service/practice, it has matured in most organizations. The predominant activities include sub-setting, masking, data refresh and synthetic creation standardized through Test Data Management (TDM), Extract Transform Load (ETL) or Database (DB) tools.
However, domain- and platform-specific test data provisioning still remains a challenge, as it needs deep knowledge of the entire business process. Data preparation for business events across domains is still in its infancy, with organizations devising their own methods to address these challenges.
This paper summarizes the various approaches available for effectively preparing test data sets for business events.
In any organization, approximately 40% of a tester’s time is spent waiting for the right kind of test data. Moreover, 20% to 25% of the defects found in the UAT and production phases are directly or indirectly attributable to the quality of the test data used.
Organizations have realized the impact of test data on the three pillars of Cost, Quality and Schedule. Hence, investments are being made in Centralizing Test Data as-a-Service (TDaaS) and automating test data provisioning through implementation of sophisticated Test Data Management (TDM) tools. These tools play a key role in:
- Extracting subsets of data from production
- Masking sensitive information
- Loading the masked subsets into the test environments
All of this needs to be achieved while retaining data integrity within each application and across applications.
Data integrity across applications is a critical requirement for testing end-to-end business scenarios. The TDM tools also have capabilities for generating synthetic data for new functionality where data does not exist in production databases. Preparing test data for business events like the End of Day (EoD) process in the banking sector or the holiday season in the retail domain involves complex scenarios where out-of-the-box solutions don’t exist. Dimensions like the volume of data, dependencies across applications and time sensitivity add to the complexity of preparing data for business event testing.
Consider a scenario where the taxation rules for interest earned on bank deposits in the US change effective 1-Jan-2015, and the current date is 1-Jul-2014. The software changes may be developed in Sep-2014 and the modified programs implemented in production by 31-Oct-2014. However, the new tax calculation program will not actually be executed until, say, the night of 31-Jan-2015. To test this scenario, we have to simulate the data for 1-Jan-2015 to 31-Jan-2015 in the test environment.
A typical bank will have in excess of 20000 batch jobs for the month-end processing. Now, if a data model change impacting these batch jobs has to be implemented, it is imperative to test the entire month-end process. A copy of data from production cannot be used directly for testing as there will be at least two shortcomings:
- Production data will not have the Data model changes (Example: a new table or a modified column)
- Data in the production systems will be current, but we need futuristic data for testing. For example, if a new tax rule for interest has to be implemented from 1-Jan-2015 and the current date is 1-Jul-2014, we cannot use the production data as-is for testing; we will have to artificially age some of the production data to beyond 1-Jan-2015
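The aging step described above can be sketched as follows. This is a minimal illustration, not a real TDM tool: the record layout, column names and the clamping month-shift helper are all hypothetical.

```python
from datetime import date

def add_months(d: date, months: int) -> date:
    """Advance a date by a number of calendar months, clamping the day
    to the last valid day of the target month (e.g. 31-Jan -> 28-Feb)."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    days = [31, 29 if leap else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return date(year, month, min(d.day, days[month - 1]))

def age_record(record: dict, date_columns: list, months: int) -> dict:
    """Return a copy of the record with every date column advanced."""
    aged = dict(record)
    for col in date_columns:
        aged[col] = add_months(aged[col], months)
    return aged

# A deposit extracted from production on 1-Jul-2014 is aged past
# 1-Jan-2015 so the new interest-tax rule will fire against it in test.
deposit = {"account_id": "D-1001", "opened_on": date(2014, 7, 1),
           "maturity_on": date(2014, 12, 31)}
aged = age_record(deposit, ["opened_on", "maturity_on"], 7)
```

In a real run the same shift would be applied consistently to every identified date column across all extracted tables, so that relative gaps between dates are preserved.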
Similarly, in the retail industry, let us take the example of a situation where a new piece of code needs to be implemented for calculating and applying discounts to purchases made between the discount period of 25-Nov-2014 and 5-Jan-2015.
This cannot be addressed completely by just advancing the date on the database and application servers. There will be many scenarios that will actually need data that spans before, during and after the discount period like:
- Product is sold prior to 25-Nov-2014 but returned during the discount period
- Product is sold and returned within the discount period
- Product is sold in the discount period and it is returned / exchanged after 5-Jan-2015
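The three boundary-crossing cases above can be seeded programmatically rather than hunted for in production data. A minimal sketch, using the window dates from the example; the record fields and the function name are illustrative assumptions:

```python
from datetime import date, timedelta

DISCOUNT_START = date(2014, 11, 25)
DISCOUNT_END = date(2015, 1, 5)

def generate_return_cases() -> list:
    """Seed one sale/return record for each boundary-crossing case."""
    return [
        # Sold before the window, returned inside it.
        {"case": "pre_window_sale_in_window_return",
         "sold_on": DISCOUNT_START - timedelta(days=10),
         "returned_on": DISCOUNT_START + timedelta(days=5)},
        # Sold and returned entirely inside the window.
        {"case": "in_window_sale_and_return",
         "sold_on": DISCOUNT_START + timedelta(days=3),
         "returned_on": DISCOUNT_START + timedelta(days=20)},
        # Sold inside the window, returned/exchanged after it closes.
        {"case": "in_window_sale_post_window_return",
         "sold_on": DISCOUNT_END - timedelta(days=7),
         "returned_on": DISCOUNT_END + timedelta(days=14)},
    ]
```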
Hence, organizations have to carefully strategize their approach to test data management to ensure application quality, deliver an excellent end-user experience and safeguard their brand image.
Let us now focus on the approaches available for provisioning data for Business events like the ones mentioned above.
Scenario 1: Suppose a telecom service provider gets a new customer request; the customer status is initially “New” and changes to “Active” only after 48 hours.
This is a typical scenario where a small volume of data is required for testing. Here we can employ selective updates: the impacted table columns are first identified and then updated with values for the records that will be used in that particular test cycle.
So, in this case, we can create a new customer from the application frontend with “New” status and wait 48 hours for it to change to “Active”; only then can we execute the further test cases associated with this customer. Alternatively, we can update the “Customer creation Timestamp” column for the new customer to 48 hours in the past and then run the “Customer Status Update” program, which will set the customer status to “Active”.
The latter approach is the one most commonly used by organizations, as it is quick and the least disruptive. However, the following factors need to be considered while employing it:
- Business rules can be implemented in the frontend application as well as through database programs. This demands complete knowledge of all the rules
- Often Date columns are updated and associated non-date columns are ignored. In our example, we have to ensure that the Status column is updated after the Customer Creation Timestamp column is updated
- Latest Data flow and inter-application dependency information is required in scenarios where updates have to be made in the right sequence across applications
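The back-dating approach can be sketched with an in-memory SQLite table. The customer table, its columns and the stand-in status-update job are all illustrative assumptions, not any real provider's schema; note that the timestamp is back-dated before the batch job runs, per the second consideration above.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT, status TEXT, created_ts TEXT)")

def create_customer(cust_id: str) -> None:
    """Simulate frontend creation: new customers start as 'New'."""
    conn.execute("INSERT INTO customer VALUES (?, 'New', ?)",
                 (cust_id, datetime(2014, 7, 1, 9, 0).isoformat()))

def backdate_customer(cust_id: str, hours: int) -> None:
    """Push the creation timestamp into the past instead of waiting."""
    row = conn.execute("SELECT created_ts FROM customer WHERE id = ?",
                       (cust_id,)).fetchone()
    new_ts = datetime.fromisoformat(row[0]) - timedelta(hours=hours)
    conn.execute("UPDATE customer SET created_ts = ? WHERE id = ?",
                 (new_ts.isoformat(), cust_id))

def run_status_update(now: datetime) -> None:
    """Stand-in for the batch job: activate customers older than 48 hrs."""
    cutoff = (now - timedelta(hours=48)).isoformat()
    conn.execute("UPDATE customer SET status = 'Active' "
                 "WHERE status = 'New' AND created_ts <= ?", (cutoff,))

create_customer("C-42")
backdate_customer("C-42", 48)                    # instead of waiting 48 hrs
run_status_update(now=datetime(2014, 7, 1, 9, 0))
```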
Scenario 2: An organization wants to simulate its entire data set as of 31-Jan-2015 in the test environment while the current date is 1-Jul-2014.
For this, data backup from production is loaded into the test environment and all date columns are advanced by a preset number of days.
The process to be followed includes:
- Preparation of detailed Data flow diagram across applications
- Extracting production data for the last ‘n’ months (n can be 1, 3, 6 or 12, based on the project need) for all the applications as per the data flow
- As the current date in the scenario is 1-Jul-2014, we need to advance the dates by 7 calendar months to get data for 31-Jan-2015
- Data has to be loaded for applications in the test environment in the same order in which it was extracted
- Disable all Integrity constraints in the databases
- Identify all date columns and advance them by 7 months using the database’s add months function
- Identify all non-date dependent columns and update them if necessary
- Enable all Integrity constraints in the database
- Resolve issues identified while enabling constraints
- Document the issues encountered and the solution for future use
- Retain a backup of this prepared data before opening up the test environment to the Test teams
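The core of the steps above (disable integrity constraints, shift the date columns, re-enable) can be sketched as follows. SQLite and the table/column names stand in for the real database, and a Python helper stands in for the database's own add months function; all are illustrative assumptions.

```python
import sqlite3
from datetime import date

def add_months(iso: str, months: int) -> str:
    """Shift an ISO date string by calendar months, clamping the day."""
    d = date.fromisoformat(iso)
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    days = [31, 29 if leap else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    return date(year, month, min(d.day, days[month - 1])).isoformat()

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")   # disable integrity checks
conn.execute("CREATE TABLE txn (id TEXT, posted_on TEXT)")
conn.execute("INSERT INTO txn VALUES ('T-1', '2014-06-30')")

# Advance every identified date column by 7 calendar months.
conn.create_function("add_months", 2, add_months)
conn.execute("UPDATE txn SET posted_on = add_months(posted_on, 7)")
conn.execute("PRAGMA foreign_keys = ON")    # re-enable before validation
```

In a production-scale run the equivalent update would be issued per date column per table, in the load order dictated by the data flow diagram, using the database's native function (for example, Oracle's ADD_MONTHS).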