As data volumes and transaction counts continue to grow, every organization is concerned about data management and security. Data is moving from on-premises systems to the cloud at an increasing pace to reduce operational cost and improve scalability. The biggest concern is the protection of personal data. Regulations and standards such as GDPR, PCI DSS and HIPAA already govern data protection. In addition, personal data needs to be protected or masked in non-validated environments such as training, development and testing. Multiple data masking/anonymization tools and technologies are available in the market to prevent misuse and to support regulatory compliance.
The top implementation concerns or questions are:
- How to perform data anonymization/masking, and which of the available tools and technologies are apt?
- How to maintain data integrity for the attributes that require masking?
- Will there be any impact on data availability SLAs?
- How to restrict data for certain users?
Effective data protection process:
A data protection process encrypts the data and removes personally identifiable information from data sets, so that the people the data describes cannot be identified. Every organization sets its own masking methods based on its data types and data sources. Below are the best practices to follow for data in transit and data at rest.
When data is in transit:
For data moving from point to point, whether on premises or in the cloud, the original values are replaced with dummy values so that the end user cannot view the original data. This is also called dynamic data masking. Follow the process below for data in transit:
- Follow and monitor the data governance process to access incoming data
- Select appropriate tools and technologies that can provide persistent masking values for the same or repeated incoming personally identifiable information (PII) values while data is in motion (process depicted in Figure 1)
- Ensure persistent masking values for unique incoming PII values. This preserves data integrity between the entities/objects. For example, 455033112 will always be masked as 566144223
- Apply proper design principles, and optimize the masking process during ETL (Extract, Transform and Load) to meet the SLAs
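The idea of persistent masking can be illustrated with a short Python sketch. This is only one possible approach, not the method of any particular masking tool: it assumes a keyed hash (HMAC-SHA256) is acceptable for deriving the masked value, and the names `SECRET_KEY` and `mask_digits` are hypothetical. The masked values it produces will differ from the 455033112 to 566144223 example above, but the same input always yields the same output.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; in practice it would come from a key store.
SECRET_KEY = b"demo-key-change-me"


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a digit string to a masked digit string of the same length.

    The mapping is deterministic: the same input and key always give the
    same masked value, which is what keeps joins between entities intact.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the digest to a number with at most len(value) digits.
    number = int.from_bytes(digest, "big") % (10 ** len(value))
    # Left-pad with zeros so the masked value keeps the original length.
    return str(number).zfill(len(value))


# Two data sets that share the same PII value as a join key (made-up rows).
customers = [{"customer_id": 1, "ssn": "455033112"}]
claims = [{"claim_id": 77, "ssn": "455033112"}]

masked_customers = [{**row, "ssn": mask_digits(row["ssn"])} for row in customers]
masked_claims = [{**row, "ssn": mask_digits(row["ssn"])} for row in claims]

# The masked key still joins the two data sets.
assert masked_customers[0]["ssn"] == masked_claims[0]["ssn"]
```

Because the mask is computed from the value itself, no lookup table has to be shared between ETL jobs, which may help with the SLA concern. The trade-off is that reducing the hash to a fixed number of digits can, in rare cases, map two different inputs to the same masked value; where strict uniqueness is required, a lookup table or a format-preserving encryption scheme is a more suitable choice.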