Data security and privacy in a cloud-based analytics platform
While a cloud-based solution has its own set of security and privacy challenges, data analytics adds the stringent requirements of data privacy. Analytics platforms process personal and customer data in order to enhance customer experience and, in turn, improve the top line and bottom line, which makes data privacy central to the design.
To address cloud security, controls are implemented at different layers such as infrastructure, network, application and data (at rest and in motion). However, with a cloud-based analytics platform, we also need to adhere to data privacy regulations while still providing the outcomes the organization needs to make informed decisions.
Data privacy applies to the processing of personal data, namely any information relating to an identified or identifiable natural person. In the context of Big Data, the focus is more on the indirect identification of individuals, which is addressed by following data privacy regulations such as GDPR and principles such as notice, choice, consent, purpose of processing and privacy by design.
Risk based proactive approach to data privacy
Organizations must take a risk-based, proactive approach to data privacy by creating data privacy standards and privacy control frameworks that can be applied consistently across all geographies and solutions. This minimizes complexity and maximizes data protection. Such a framework must provide guidance on what constitutes personal data, the requirements for collecting it, the process for managing consent, the rules for accessing and using it, how to classify and protect it, and how to implement the right set of processes and controls based on risk. Guidelines must also integrate privacy as a key component from design to delivery of products and services. In addition, privacy requirements need to be incorporated into the initial phase of the software development life cycle, which decreases compliance risk and improves customer confidence in the product or service.
Key design principles in data privacy in the Data Analytics platform
Data Minimization
Collects only the data that is necessary to provide a feature or service. Conducts a Privacy Impact Assessment to define the exact data processing needs, thereby limiting data to what is essential.
Data Hiding
Hides personal data and its interrelationship from plain view. Leverages data anonymization solutions to anonymize the data at source.
Data Separation
Personal data will be processed in a distributed fashion in separate compartments whenever possible. The data separation controls will be implemented at the data center level, to meet the data privacy regulations in different countries.
Data Aggregation
Processes personal data at the highest level of aggregation and with the least possible detail at which it is still useful.
Consent and Notice
Informs data subjects about the data being collected and obtains consent from individuals at the time of data collection. Only data with customer consent should be used for data processing in Data Analytics.
Security Controls
Implements proper security controls such as firewalls, anti-virus, access controls, authorizations, audit logs, data encryption (in motion and at rest), masking, anonymization, etc.
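Three of these principles (Consent and Notice, Data Minimization and Data Aggregation) lend themselves to a short illustration. The following is a minimal sketch only: the field names, the allow-list and the suppression threshold are hypothetical, and a real platform would derive them from its Privacy Impact Assessment and applicable regulations.

```python
from collections import Counter

# Hypothetical allow-list, assumed to come out of a Privacy Impact Assessment.
ALLOWED_FIELDS = {"age_band", "region", "purchase_total"}
# Hypothetical threshold: groups smaller than this are not reported.
MIN_GROUP_SIZE = 3


def consented(records):
    """Consent and Notice: keep only records whose subject gave consent."""
    return [r for r in records if r.get("consent") is True]


def minimize(record):
    """Data Minimization: drop every field that is not on the allow-list."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}


def aggregate(records, key):
    """Data Aggregation: count records per group and suppress small groups."""
    counts = Counter(r[key] for r in records)
    return {group: n for group, n in counts.items() if n >= MIN_GROUP_SIZE}


raw = [
    {"name": "A", "email": "a@example.com", "age_band": "30-39", "region": "EU", "purchase_total": 120, "consent": True},
    {"name": "B", "email": "b@example.com", "age_band": "20-29", "region": "EU", "purchase_total": 80, "consent": True},
    {"name": "C", "email": "c@example.com", "age_band": "30-39", "region": "EU", "purchase_total": 45, "consent": True},
    {"name": "D", "email": "d@example.com", "age_band": "40-49", "region": "EU", "purchase_total": 60, "consent": False},
    {"name": "E", "email": "e@example.com", "age_band": "20-29", "region": "US", "purchase_total": 30, "consent": True},
]

prepared = [minimize(r) for r in consented(raw)]
summary = aggregate(prepared, "region")
```

Here the record without consent never reaches the analytics input, names and e-mail addresses are dropped before processing, and the single US record is withheld from the summary because a group of one could point back to an individual.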
Ways to approach data privacy in Cloud Analytics platform
To secure sensitive data within the analytics ecosystem on the cloud, data protection is applied close to the source, i.e. in the upstream application, by integrating the application/job with Format Preserving Encryption. The data flows in an encrypted format to the cloud and is made available for processing and analytical purposes without the risk of exposure. Applications/tools running on the cloud work on the encrypted data for reporting or analytical purposes.
Format Preserving Encryption (FPE) preserves the format of the sensitive data fields, while providing an AES level of encryption strength. Here is a typical example of FPE:
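The format-preserving property can be illustrated with a minimal sketch. This is a toy Feistel construction over digit strings that uses HMAC-SHA256 as its round function; it is not the AES-based FF1 or FF3-1 modes standardized in NIST SP 800-38G, and it does not claim AES-level strength. The function names and the demo key are hypothetical, and a production system would rely on a vetted FPE implementation.

```python
import hashlib
import hmac

ROUNDS = 10  # even, so both halves return to their original positions


def _round_value(key: bytes, round_no: int, half: str) -> int:
    """Keyed pseudo-random integer derived from the round number and one half."""
    msg = bytes([round_no]) + half.encode("ascii")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _check(digits: str) -> None:
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of at least two ASCII digits")


def toy_fpe_encrypt(key: bytes, digits: str) -> str:
    """Encrypt a digit string into another digit string of the same length."""
    _check(digits)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # width of the half being replaced
        c = (int(a) + _round_value(key, i, b)) % (10 ** m)
        a, b = b, str(c).zfill(m)
    return a + b


def toy_fpe_decrypt(key: bytes, digits: str) -> str:
    """Invert toy_fpe_encrypt by running the rounds backwards."""
    _check(digits)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _round_value(key, i, a)) % (10 ** m)
        a, b = str(c).zfill(m), a
    return a + b


key = b"demo-key-not-for-production"  # hypothetical key for illustration
card = "4111111111111111"
protected = toy_fpe_encrypt(key, card)   # still 16 decimal digits
restored = toy_fpe_decrypt(key, protected)
```

Because the protected value keeps the same length and character set as the original, downstream schemas, validations and reports on the cloud can continue to work without modification.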
Figure 1: Data Privacy in Cloud Analytics Platform
All the data collected from different data sources needs to be classified to create the Personally Identifiable Information (PII) inventory and to define which data elements need to be anonymized and de-identified, along with the appropriate risk level and the consent details captured when the data is collected.
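As a rough illustration of such an inventory, the sketch below classifies the fields of a source schema against a small rule table. The rule table, categories, risk levels and treatments are hypothetical placeholders; in practice they would come from the organization's data privacy standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiEntry:
    name: str        # field name in the source system
    category: str    # e.g. "direct identifier", "quasi-identifier"
    risk: str        # "high", "medium" or "low"
    treatment: str   # "fpe", "tokenize", "mask" or "none"


# Hypothetical classification rules, keyed by field name.
RULES = {
    "ssn": ("direct identifier", "high", "fpe"),
    "card_number": ("direct identifier", "high", "tokenize"),
    "email": ("direct identifier", "medium", "mask"),
    "postcode": ("quasi-identifier", "medium", "mask"),
}


def build_inventory(schema):
    """Classify every field; unknown fields default to non-personal."""
    inventory = []
    for name in schema:
        category, risk, treatment = RULES.get(name, ("non-personal", "low", "none"))
        inventory.append(PiiEntry(name, category, risk, treatment))
    return inventory


def fields_to_deidentify(inventory):
    """Names of the fields that must be treated before leaving the source."""
    return [entry.name for entry in inventory if entry.treatment != "none"]


inventory = build_inventory(["ssn", "email", "purchase_total"])
to_treat = fields_to_deidentify(inventory)
```

Defaulting unknown fields to "non-personal" keeps the sketch short; a more cautious design would flag unclassified fields for review instead.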
Before being sent to the Cloud Analytics platform, the data needs to be de-identified with techniques such as format preserving encryption, tokenization and masking. It can then be sent to the Cloud Analytics platform over an encrypted communication channel for data mining and analysis. All data mining, correlation and reporting is performed by the analytics platform. For consumption of the data by other on-premises applications, data re-identification techniques need to be applied on a need basis.
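Tokenization, masking and need-based re-identification can be sketched as follows. The TokenVault class is a hypothetical in-memory stand-in for a vault that would stay on premises, and the authorized flag stands in for a real access control decision; neither represents a specific product.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory token vault, assumed to stay on premises."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Return a random token; the same value always maps to the same token."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # regenerate on a rare collision
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str, authorized: bool = False) -> str:
        """Re-identify a token, only for callers with an approved need."""
        if not authorized:
            raise PermissionError("re-identification requires approval")
        return self._token_to_value[token]


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


vault = TokenVault()
token = vault.tokenize("4111111111111111")
masked = mask_email("alice@example.com")  # "a****@example.com"
original = vault.detokenize(token, authorized=True)
```

Only the token and the masked value would travel to the cloud; the mapping needed for re-identification remains with the vault, so the analytics platform cannot reverse it on its own.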
While dealing with data privacy issues, the solution needs to address data security controls (data at rest and data in motion), cloud infrastructure security controls (firewall, anti-virus, IDS, IPS, DDoS protection, etc.), technical tools (for example, authentication and access control) and security assurance (penetration testing, vulnerability assessment, etc.). Along with technical security controls, the solution needs to address governance controls at the process level (for example, approvals for access) and the people level (for example, training and background checks).
This approach holistically addresses data privacy challenges while adhering to data privacy regulations, along with other security controls, making the security solution robust and scalable.
Srinivas Morishetti, Cybersecurity & Risk Services
Srinivas Morishetti has 16 years of experience in the IT industry and 14 years in Cybersecurity & Risk Services. As the principal consultant, he is responsible for thought leadership, solution strategy and transformation in application security services. He is adept at handling risk and compliance, application, infrastructure, cloud and data security solutions, and service delivery management across industry verticals.
He can be reached at srinivas.morishetti@wipro.com