Big Data – Privacy doomsday or the next opportunity?
Having worked with Hortonworks after its acquisition of XA Secure, I had the privilege of witnessing the big data revolution from the inside out. Big data presents a new paradigm in data processing: enterprises can store and process huge amounts of data with cost efficiency and scalability, and with that much data available, they can make business decisions based on better analytics and predictive modeling.
Big data presents a few challenges for privacy officers. Data can go through multiple stages of processing within the big data layer, and the initial consent an individual gave for a particular data set may no longer be relevant once the data has been transformed and aggregated through multiple cycles within the same infrastructure. The final data set used by analysts or data scientists may not resemble the original raw data. Consequently, data access needs can also vary across the data lifecycle: raw data will have a different user access pattern than processed data used by analysts or data scientists.
Much has been said about the issue of privacy and big data. The Electronic Privacy Information Center (EPIC) has a full page devoted to big data and privacy, and the White House released a report on the topic. The White House report walks through the impact of big data in areas from law enforcement to healthcare, and how privacy laws could potentially apply to each.
While some privacy experts have long raised concerns about big data and privacy, many of these concerns existed even before the “big data” era arrived. What has changed significantly are the tools and infrastructure that now make it possible to understand data at a very granular level, something past technologies could not support. This means data users can potentially use these processing tools to infer the identity of an individual from fewer attributes than was possible before. Anonymization techniques that obfuscate personal data attributes are no longer enough to protect individual identity in the world of big data.
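To make the re-identification risk concrete, here is a minimal, self-contained sketch of a linkage attack: an “anonymized” data set (direct identifiers removed) is joined to a public record set on a handful of quasi-identifiers. All data, names, and column choices below are hypothetical and purely illustrative.

```python
# Hypothetical "anonymized" rows: names removed, but quasi-identifiers kept.
anonymized = [
    {"zip": "02138", "birth_year": 1960, "gender": "F", "diagnosis": "flu"},
    {"zip": "94105", "birth_year": 1985, "gender": "M", "diagnosis": "asthma"},
]

# Hypothetical public records (e.g., a voter roll) with names attached.
public_records = [
    {"name": "Alice", "zip": "02138", "birth_year": 1960, "gender": "F"},
    {"name": "Bob",   "zip": "94105", "birth_year": 1985, "gender": "M"},
]

def reidentify(anon_rows, public_rows, keys=("zip", "birth_year", "gender")):
    """Link rows whose quasi-identifier combination matches a unique public row."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    matches = []
    for row in anon_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # unique match => identity recovered
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches

print(reidentify(anonymized, public_records))
# [('Alice', 'flu'), ('Bob', 'asthma')]
```

With just three attributes, every “anonymized” row is re-linked to a name, which is why removing direct identifiers alone is not sufficient protection.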
While big data may seem like a scary proposition, tools and methods are available to balance the need for data processing with internal and external privacy rules. We recommend the following steps when embarking on, or in the midst of, a big data project:
- Understand your data. Data stewards should spend time understanding the data coming into the big data infrastructure, its sources, and the use cases it will serve.
- Classify and tag data. As part of understanding the data, data stewards should classify it and apply tags to different data types.
- Work with your legal and privacy teams on the impact of privacy rules on the use cases and data types being considered.
- Build central policies (reflecting privacy rules) that can be understood by applications and databases.
- Work with application teams to manage policy enforcement. Tools such as Apache Ranger provide enforcement for the Hadoop ecosystem and expose APIs that can be called from an external policy management tool.
- Monitor access patterns for system and human users, and flag anomalies against existing privacy rules.
- Enable a feedback loop to correct existing policies.
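The classification, central-policy, and enforcement steps above can be sketched as tag-based access control, loosely in the spirit of how tools like Apache Ranger evaluate policies. The tags, policies, users, and column names below are hypothetical, not a real Ranger configuration or API.

```python
# Step 2: data stewards classify columns with tags (hypothetical catalog).
TAGS = {
    "customers.email": {"PII"},
    "customers.purchase_total": set(),
}

# Step 4: central, privacy-driven policies keyed by tag (hypothetical).
POLICIES = [
    {"tag": "PII", "allowed_roles": {"privacy_officer"}},
]

# Hypothetical users and their roles.
USERS = {"dana": {"analyst"}, "priya": {"privacy_officer"}}

def is_access_allowed(user, column):
    """Deny access if any tag on the column is restricted to roles the user lacks."""
    roles = USERS.get(user, set())
    for policy in POLICIES:
        if policy["tag"] in TAGS.get(column, set()):
            if not roles & policy["allowed_roles"]:
                return False
    return True  # untagged data falls through to allow

print(is_access_allowed("dana", "customers.email"))           # False
print(is_access_allowed("priya", "customers.email"))          # True
print(is_access_allowed("dana", "customers.purchase_total"))  # True
```

The design point is that policies reference tags rather than individual tables or files, so a single central rule automatically covers every data set the stewards have classified.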
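The monitoring step can likewise be sketched with a simple baseline check over access logs: flag any user whose access volume today deviates sharply from their own history. The log shape, users, counts, and z-score threshold are all assumptions for illustration.

```python
from statistics import mean, pstdev

# Hypothetical per-user daily access counts over the past week.
history = {
    "dana":  [20, 22, 19, 21, 20, 23, 18],
    "priya": [5, 4, 6, 5, 5, 4, 6],
}
today = {"dana": 21, "priya": 240}  # priya's bulk export stands out

def find_anomalies(history, today, z_threshold=3.0):
    """Flag users whose access count today deviates sharply from their baseline."""
    flagged = []
    for user, counts in history.items():
        mu, sigma = mean(counts), pstdev(counts)
        observed = today.get(user, 0)
        if sigma > 0 and abs(observed - mu) / sigma > z_threshold:
            flagged.append(user)
    return flagged

print(find_anomalies(history, today))
# ['priya']
```

A real deployment would feed richer signals (resource tags, time of day, peer-group behavior) into the anomaly check, but even this minimal baseline closes the loop between enforcement and the feedback step above.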
Privacera’s privacy management tools can be easily implemented over a big data ecosystem and provide a comprehensive approach to consent and policy management, enforcement integration, and continuous monitoring.
Keep an eye on this blog section for upcoming details on consent management and privacy monitoring.