Data Governance and Security for Databricks

Databricks and Privacera leverage Apache Ranger to provide enterprise-grade data governance and security for machine learning and data science in the cloud.

Privacera and Databricks

Together, Databricks and Privacera enable enterprises to maximize the value of data by ensuring consistent governance, security, and compliance across all data science, machine learning, and artificial intelligence workloads.

Get a Complete View of Data and Comply with GDPR, CCPA

The seamless integration between Databricks and Privacera enables automatic scanning and profiling of sensitive data created and accessed in Databricks. The data is tagged and classified to provide a comprehensive view of sensitive data assets, including PII, and to support compliance with regulations such as GDPR and CCPA.
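
To make the scan-and-tag idea concrete, here is a minimal Python sketch of pattern-based classification over sampled rows. It is an illustration only and not Privacera's scanner: the `PII_PATTERNS` rules, the `classify_columns` helper, the 0.8 threshold, and the sample data are all assumptions made for this example.

```python
import re

# Hypothetical classification rules: tag name -> pattern.
# Illustrative only; real scanners use far richer rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_columns(rows, threshold=0.8):
    """Return {column: [tags]} for columns whose sampled non-empty
    values match a pattern at least `threshold` of the time."""
    tags = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [str(r.get(col)) for r in rows if r.get(col) not in (None, "")]
        if not values:
            continue
        for tag, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if hits / len(values) >= threshold:
                tags.setdefault(col, []).append(tag)
    return tags

# Made-up sample rows standing in for a profiled table.
sample = [
    {"id": 1, "contact": "ana@example.com", "tax_id": "123-45-6789"},
    {"id": 2, "contact": "bo@example.org", "tax_id": "987-65-4321"},
]
column_tags = classify_columns(sample)
```

In this sketch the `contact` and `tax_id` columns receive tags while `id` does not, which is the kind of column-level inventory a compliance team would review.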

Enforce Fine-Grained Access Policies Across ML/AI Workloads

Privacera is based on Apache Ranger and enables column-, row-, and file-level access control for data created and accessed via Databricks. Data platform teams can define role-based, fine-grained access management policies from a single pane of glass and enforce them across Spark SQL, ML/AI and other workloads in Databricks.
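
As a rough illustration of what column- and row-level control means in practice, the Python sketch below filters rows and then projects the permitted columns for a role. The `Policy` shape and the `apply_policy` helper are hypothetical names invented for this example; they are not the Apache Ranger or Privacera policy model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Policy:
    """Hypothetical policy: columns a role may read, plus a row predicate."""
    role: str
    allowed_columns: set
    row_filter: Callable[[dict], bool] = lambda row: True

def apply_policy(rows, policy):
    """Keep only permitted rows, then project each onto permitted columns."""
    return [
        {k: v for k, v in row.items() if k in policy.allowed_columns}
        for row in rows
        if policy.row_filter(row)
    ]

# Made-up data and role for the demonstration.
rows = [
    {"name": "Ana", "region": "EU", "salary": 100},
    {"name": "Bo", "region": "US", "salary": 120},
]
analyst_eu = Policy(
    role="analyst_eu",
    allowed_columns={"name", "region"},
    row_filter=lambda r: r["region"] == "EU",
)
visible = apply_policy(rows, analyst_eu)
```

Here the hypothetical `analyst_eu` role sees only EU rows and never sees the `salary` column.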

Make More Data Available for Analytics with Anonymization

Privacera enables various forms of anonymization and masking for sensitive data in Databricks, preserving data privacy while maintaining the data’s referential integrity and analytical value. Data scientists and others can use anonymized or masked data in Databricks for analytics and ML/AI workloads while maintaining compliance with applicable regulations.
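
One common way to keep referential integrity while hiding identifiers is deterministic pseudonymization with a keyed hash: the same input always maps to the same token, so joins still line up. The Python sketch below uses HMAC-SHA256 for this; it is an assumed technique chosen for illustration, not a description of Privacera's masking functions, and `SECRET_KEY`, `pseudonymize`, and the sample tables are hypothetical.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a secrets manager.
# It is hard-coded here for the demonstration only.
SECRET_KEY = b"demo-key-do-not-use"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic keyed hash: the same input and key give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability in this sketch

# Made-up tables that share a customer identifier.
customers = [{"customer_id": "C-1001", "email": "ana@example.com"}]
orders = [{"customer_id": "C-1001", "total": 42.0}]

masked_customers = [
    {**c, "customer_id": pseudonymize(c["customer_id"]), "email": "***"}
    for c in customers
]
masked_orders = [
    {**o, "customer_id": pseudonymize(o["customer_id"])} for o in orders
]
```

Because both tables are tokenized with the same key, the masked `customer_id` values still match, so a join between the masked tables returns the same pairs as the original.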

Scalable Deployment Model

Privacera uses a lightweight plugin that runs natively inside Databricks environments. Unlike competing data governance and security solutions, Privacera never sits between the user query and the flow of data, nor does it introduce a new layer into your tech stack, ensuring scalability and high performance.

Balance Data Governance and Security with Machine Learning and Artificial Intelligence

Together, Databricks and Privacera help data platform teams address three important challenges in order to balance data governance and security with machine learning and artificial intelligence.

Understand and Comply

The volume of data and the pace of its creation are exploding. Privacera enables IT and data platform teams to cut through the complexity and understand what data is flowing into their Databricks environments, where the data is stored, and who is accessing it in order to comply with privacy mandates.

Control and Manage Access

Privacy and security mandates and other regulations mean IT and data platform teams need to ensure users have access only to authorized data. With Privacera, teams can implement fine-grained access controls down to the row, column, and file level, depending on the use case.

Ensure Privacy and Gain Insights

Privacera’s anonymization and pseudonymization capabilities mean IT and data platform teams can make sensitive data in Databricks available to data scientists and others for machine learning while both maintaining the data’s referential integrity and preserving privacy.

Privacera and Databricks Architecture

Privacera natively integrates with Databricks at the infrastructure level, as well as with Amazon S3, Azure Data Lake Store and other cloud storage services that make data available to Databricks, to provide consistent data governance and security.


Frequently Asked Questions

Does Privacera work with Databricks?

Privacera plugins, based on Apache Ranger, can enforce fine-grained access management in Databricks and Apache Spark. Privacera plugins are automatically initiated when a Databricks cluster is started.

Does Privacera access management add any performance overhead?

Unlike solutions that intercept data requests from Apache Spark and access data on behalf of the service, Privacera’s lightweight access enforcement points quickly check each request and let it proceed if a corresponding policy grants access.
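
The check-then-proceed behavior described here can be pictured as a small in-process lookup against cached policies, with denial as the default. The Python sketch below is an assumption-laden illustration, not Privacera's plugin code: `Request`, `POLICIES`, and `is_allowed` are hypothetical names, and the policy table is invented.

```python
from typing import NamedTuple

class Request(NamedTuple):
    user_roles: frozenset
    resource: str  # e.g. "sales.orders"
    action: str    # e.g. "select"

# Hypothetical in-memory policy cache: (resource, action) -> roles granted access.
POLICIES = {
    ("sales.orders", "select"): {"analyst", "admin"},
    ("sales.orders", "update"): {"admin"},
}

def is_allowed(request, policies=POLICIES):
    """Default deny: allow only if a policy grants one of the user's roles."""
    granted = policies.get((request.resource, request.action), set())
    return bool(granted & request.user_roles)

analyst = frozenset({"analyst"})
can_select = is_allowed(Request(analyst, "sales.orders", "select"))
can_update = is_allowed(Request(analyst, "sales.orders", "update"))
can_read_unknown = is_allowed(Request(analyst, "hr.payroll", "select"))
```

The point of the sketch is that the decision only inspects request metadata. The data itself is never routed through the check.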

Is Privacera integrated with the Apache Hive metastore and AWS Glue?

Privacera works across any metadata store for Databricks, including Hive metastores and AWS Glue. Privacera can also enable tag-based access policies based on data classifications.
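
A tag-based policy attaches to a classification rather than to a specific table or column, so newly tagged data is covered without writing a new rule. The short Python sketch below shows the idea under stated assumptions: `COLUMN_TAGS`, `TAG_POLICIES`, and `can_read` are hypothetical, and treating untagged columns as readable is a simplification made for this example.

```python
# Hypothetical classification output: column -> tags attached to it.
COLUMN_TAGS = {
    "customers.email": {"PII"},
    "customers.country": set(),
}

# Hypothetical tag policy: tag -> roles allowed to read columns carrying it.
TAG_POLICIES = {"PII": {"privacy_officer"}}

def can_read(column, roles, column_tags=COLUMN_TAGS, tag_policies=TAG_POLICIES):
    """A column is readable only if every tag on it grants one of the roles.
    Untagged columns are readable in this simplified sketch."""
    return all(
        tag_policies.get(tag, set()) & roles
        for tag in column_tags.get(column, set())
    )

analyst_sees_country = can_read("customers.country", {"analyst"})
analyst_sees_email = can_read("customers.email", {"analyst"})
officer_sees_email = can_read("customers.email", {"privacy_officer"})
```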

Resources & Latest News


Security and Privacy for Modern Data Platforms

Learn how to enable comprehensive security, privacy and governance in big data and cloud environments using Privacera.

Get Started Today

Contact us to learn more about Privacera for Databricks and get a free risk assessment.