
By Angsuman Dutta | July 30, 2017 04:00 PM EDT

Lessons from the GDPR Compliance Journey of a Leading Financial Services Organization
In preparation for General Data Protection Regulation (GDPR) compliance, a Global 100 financial services organization set out to assess its core information processing environments, with the objective of identifying opportunities to strengthen its data privacy protection programs. This article focuses on the technology challenges, the approach, and the lessons learned for the centralized testing environment.
Situation
Like many DevOps groups across the industry, this financial organization has adopted continuous testing and quality testing regimes to deliver quality products using agile methodology. The organization prefers to prepare its test data from production data. While most testing is done by an internal team, certain applications are tested by outsourced offshore teams. The test environment is fairly complex, comprising Oracle, Hadoop (Parquet files), Hive, Cassandra, MS SQL, SAS, and Linux-based systems. Incremental data volume varies between 10 million and 15 million records per week. Certain major releases of Big Data-based applications require up to 5 GB of data (~75 million records).
Challenges
To comply with the GDPR and prevent data privacy breaches, the testing team needed to detect and de-identify personally identifiable information (PII) elements. If the team used the available de-identification methods, such as product-specific encryption technology like MS SQL encryption, much of the data would become unusable for testing, for the following reasons:
- Current methods scramble the data and make it unusable.
- Current methods do not preserve any referential relationship between various data sources.
Masking the data instead raises similar challenges. For example, to test an application that calculates the end-of-month summary balance of a customer account from an Oracle data source and a Hadoop data source, the team would not be able to use data encrypted with the available technology, because records could no longer be linked across the two sources.
In addition, PII often appears within comment and description fields; encrypting or masking the entire field would destroy important information.
More importantly, encryption using the available methods is computationally expensive and requires large hardware infrastructure.
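The loss of referential integrity can be illustrated with a small Python sketch. It assumes, hypothetically, that each platform protects the customer key independently with a randomized method (simulated here by hashing with a fresh random nonce, which stands in for randomized encryption but is not encryption itself), and contrasts that with a keyed, deterministic pseudonym built on HMAC-SHA256. The rows, the key, and all function names are illustrative only.

```python
import hashlib
import hmac
import secrets

# Hypothetical demo key; a real deployment would fetch this from a key-management system.
KEY = b"demo-key-not-for-production"

def randomized_token(value: str) -> str:
    """Stand-in for randomized protection: a fresh random nonce changes every output."""
    nonce = secrets.token_hex(8)
    return hashlib.sha256((nonce + value).encode()).hexdigest()[:16]

def deterministic_token(value: str) -> str:
    """Keyed, deterministic pseudonym: the same input always maps to the same token."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

# Hypothetical rows: opening balances from an Oracle table, transactions from a Hadoop extract.
oracle_rows = [{"customer_id": "C1001", "balance": 250.0},
               {"customer_id": "C1002", "balance": 90.0}]
hadoop_rows = [{"customer_id": "C1001", "txn": -40.0},
               {"customer_id": "C1002", "txn": 15.0}]

def month_end_balances(tokenize):
    """Protect the join key in both sources, then join them to compute month-end balances."""
    balances = {tokenize(r["customer_id"]): r["balance"] for r in oracle_rows}
    matched = 0
    for r in hadoop_rows:
        token = tokenize(r["customer_id"])
        if token in balances:  # the join succeeds only if both sources produced the same token
            balances[token] += r["txn"]
            matched += 1
    return matched, balances

print("randomized matches:", month_end_balances(randomized_token)[0])
print("deterministic matches:", month_end_balances(deterministic_token)[0])
```

With the randomized stand-in, no transaction can be matched to its account (barring a vanishingly unlikely token collision), so the month-end calculation cannot run. With the deterministic token, both transactions join and the balances come out as 210.0 and 105.0. The trade-off is that deterministic tokens reveal when two values are equal, which is exactly the property the join relies on.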
Approach
The organization identified the following solution criteria to mitigate the challenges identified during the assessment:
- Autonomous Detection: Leveraging a centralized library, the solution should examine all incoming data, including embedded documents, for the presence of PII elements. It should also use machine learning techniques to classify sensitive documents in a Big Data repository.
- Format Preserving Encryption: Based on the type of PII data and the user's preference, the solution should encrypt data elements in the following three modes:
  - Blind Mode: encrypt a data element wherever it matches a specific regular expression.
  - Column Mode: encrypt the content of a specific column or field.
  - Mixed Mode: encrypt data elements within a specific column only if they match a specific regular expression.
- Cross-Platform Referential Integrity: The solution must retain referential integrity between records across platforms.
- Big Data Volume: The solution should be able to detect and encrypt sensitive elements in 100 GB of data in less than one hour using commodity hardware.
- Data Usage Monitoring: The solution should record and retain information on all usage of private data for audit and compliance. In addition, it should use machine learning to identify abnormal data usage.
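The three encryption modes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the organization's implementation: the SSN-style regular expression, the demo key, and all function names are hypothetical, and `fpe_digits` is a toy Feistel construction loosely modeled on the structure of NIST FF1 rather than a vetted format-preserving encryption algorithm. Because it is keyed and deterministic, the same input always yields the same ciphertext, which also keeps records joinable across platforms.

```python
import hashlib
import hmac
import re

# Hypothetical demo key and round count; a real system would use managed keys and a vetted FF1 library.
KEY = b"demo-key-not-for-production"
ROUNDS = 10
DIGITS = "0123456789"
# Illustrative PII pattern (US SSN layout); real detection would draw on a centralized pattern library.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def _prf(rnd: int, half: str, width: int) -> int:
    """Feistel round function: HMAC-SHA256 of (round, half), reduced modulo 10**width."""
    mac = hmac.new(KEY, f"{rnd}:{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % (10 ** width)

def fpe_digits(digits: str, decrypt: bool = False) -> str:
    """Toy FF1-style Feistel cipher on a decimal string; the output keeps the same length."""
    if len(digits) < 2 or any(ch not in DIGITS for ch in digits):
        raise ValueError("expected at least two ASCII digits")
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    if not decrypt:
        for i in range(ROUNDS):
            m = u if i % 2 == 0 else v
            c = (int(a) + _prf(i, b, m)) % (10 ** m)
            a, b = b, str(c).zfill(m)
    else:
        for i in reversed(range(ROUNDS)):  # undo the rounds in reverse order
            m = u if i % 2 == 0 else v
            c = (int(b) - _prf(i, a, m)) % (10 ** m)
            a, b = str(c).zfill(m), a
    return a + b

def protect(value: str) -> str:
    """Encrypt the digits of a value while leaving separators such as '-' in place."""
    enc = iter(fpe_digits("".join(ch for ch in value if ch in DIGITS)))
    return "".join(next(enc) if ch in DIGITS else ch for ch in value)

def _sub(text: str) -> str:
    """Replace every regex match in the text with its format-preserving ciphertext."""
    return SSN.sub(lambda m: protect(m.group()), text)

def blind_mode(record: dict) -> dict:
    """Blind mode: encrypt regex matches wherever they occur, including free-text comments."""
    return {k: _sub(v) if isinstance(v, str) else v for k, v in record.items()}

def column_mode(record: dict, column: str) -> dict:
    """Column mode: encrypt the whole (digit-bearing) content of one named column."""
    return {**record, column: protect(record[column])}

def mixed_mode(record: dict, column: str) -> dict:
    """Mixed mode: encrypt regex matches, but only inside one named column."""
    return {**record, column: _sub(record[column])}

row = {"account": "00012345", "comment": "Customer SSN 123-45-6789 verified by phone"}
print(blind_mode(row))
print(column_mode(row, "account"))
print(mixed_mode(row, "comment"))
```

Blind and mixed mode leave the surrounding comment text readable while replacing only the matched digits, which addresses the free-text problem described under Challenges. This sketch's column mode handles only numeric content, and its small-domain Feistel cipher offers no real security guarantee; a production system would use a reviewed implementation of FF1 as specified in NIST SP 800-38G.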
Lessons Learned
- Understand the business and technology landscape: It is imperative to understand the current technology landscape, business practices, and emerging trends. If your technology platform and domain are monolithic today, do you expect them to remain monolithic in the near future? What would be the impact if you moved some of your testing to a cloud platform? What about Big Data applications?
- Evaluate risks: Assess data security risks through the lens of the GDPR and beyond. In addition to PII and protected health information (PHI), most organizations handle sensitive data that may not be associated with an individual. How do you detect, encrypt, and monitor other types of sensitive data, such as B2B contract information, in your testing environment?
- Beyond Retrofitting: Define the ideal solution characteristics prior to evaluating solutions. Retrofitting a solution to meet your business needs is often time-consuming and costly.
Copyright © 2017 SYS-CON Media, Inc. — All Rights Reserved.
Angsuman Dutta is the CEO and founder of Pricchaa, a Big Data security company. He is a data management and analytics expert who has helped numerous Fortune 500 enterprises adopt Big Data solutions, primarily in healthcare and banking. Angsuman earned a degree in engineering from the IIT and an MBA from the University of Chicago.