DATPROF now supports Databricks
Test teams can now mask sensitive data directly inside Databricks Delta Tables: no exports, no workarounds, no waiting on engineering teams.
As more organizations adopt Lakehouse architectures, test data increasingly lives in Databricks: distributed across Delta Tables, layered through bronze, silver, and gold zones, with extensive version history. Until now, masking that data meant pulling it out of Databricks entirely.
DATPROF's new Databricks support changes that. You can now mask and generate test data directly where it lives, making secure testing in modern data platforms finally practical.
What you can do when you connect DATPROF to Databricks
When you connect DATPROF to Databricks, you get:
- In-place masking on Delta Tables without moving data
- Synthetic data generation directly inside your lakehouse
- Conditional masking using SQL filters for precise control
- Automatic Delta Table version handling that prevents access to unmasked historical data
This brings test data provisioning directly into the Databricks environment, eliminating the complex dependency chains that previously slowed down testing cycles.
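To make "conditional masking using SQL filters" concrete, here is a minimal sketch of the idea: a filter decides which rows are masked and which are left alone. It uses Python's built-in sqlite3 as a stand-in for Databricks SQL. The `customers` table, its columns, and the replacement pattern are hypothetical, and this is not DATPROF's implementation.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for a Delta Table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, country TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "NL", "anna@example.com"),
        (2, "US", "bob@example.com"),
        (3, "NL", "cees@example.com"),
    ],
)

# The WHERE clause plays the role of the SQL filter:
# only rows that match it receive a masked value.
conn.execute(
    "UPDATE customers "
    "SET email = 'user' || id || '@masked.invalid' "
    "WHERE country = 'NL'"
)

rows = conn.execute(
    "SELECT id, country, email FROM customers ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

In this sketch the two `NL` rows get a replacement address while the `US` row keeps its original value, which is the kind of precise control a filter provides.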
Why does this matter?
For organizations running test environments on Databricks, this integration fundamentally changes how test data provisioning works. Instead of treating the lakehouse as a black box that requires engineering intervention for every test cycle, test teams can now work directly with production-like data in a secure, compliant way. The result: faster test cycles, reduced compliance risk, and fewer dependencies blocking your testing pipeline.
1. Test data without the wait
Previously, getting masked test data from Databricks involved multiple teams and days of waiting: request an export, wait for engineering capacity, get approvals, build pipelines. Masking now runs in place, cutting what used to take days down to minutes.
2. Delta history is finally secure
Delta Tables store historical versions. That is useful for time travel, but risky if old, unmasked values remain accessible. DATPROF's automated version handling ensures that no historical version contains unmasked sensitive data, eliminating a major compliance blind spot.
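The risk described above can be illustrated with a toy model. The sketch below is not Delta Lake and not DATPROF's mechanism: `ToyVersionedTable` is a hypothetical class that only mimics the behavior that every write creates a new version while older versions stay readable until they are purged.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy stand-in for a versioned table: each write appends a snapshot."""

    versions: list = field(default_factory=list)

    def write(self, rows):
        # Store a copy so later changes do not alter older snapshots.
        self.versions.append([dict(r) for r in rows])

    def read(self, version=None):
        # Reading an older version is loosely analogous to time travel.
        idx = len(self.versions) - 1 if version is None else version
        snapshot = self.versions[idx]
        if snapshot is None:
            raise LookupError(f"version {idx} is no longer available")
        return snapshot

    def purge_history(self):
        # Drop every snapshot except the latest one.
        for i in range(len(self.versions) - 1):
            self.versions[i] = None


table = ToyVersionedTable()
table.write([{"id": 1, "email": "jane@example.com"}])  # version 0, unmasked

masked = [{**row, "email": "masked@example.invalid"} for row in table.read()]
table.write(masked)  # version 1, masked

# Masking alone is not enough: version 0 still exposes the original value.
leaked = table.read(version=0)[0]["email"]

# After the history is purged, only the masked version can be read.
table.purge_history()
current = table.read()[0]["email"]
```

The point of the sketch is that masking the current version leaves the original value reachable through history until the older versions are dealt with as well.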
3. Consistency across your entire stack
In integration testing, if the same customer is masked differently in Oracle, SQL Server, and Databricks, records no longer match across systems and tests fail. DATPROF's deterministic masking keeps values consistent across all platforms, making cross-system tests reliable and maintainable.
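As an illustration of what "deterministic" means here, the following sketch maps the same input to the same replacement every time by deriving the choice from a keyed hash. This is one common way to get that property, not necessarily how DATPROF does it. The function name, the key, and the small name pool are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical replacement pool; a real setup would use far larger lookup sets.
FIRST_NAMES = ["Alex", "Sam", "Robin", "Jamie", "Morgan", "Casey", "Taylor", "Jordan"]


def mask_first_name(value: str, secret_key: bytes) -> str:
    """Pick a replacement name that depends only on the input and the key."""
    # Normalize so that casing and stray whitespace do not change the result.
    normalized = value.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


# Assumption: every platform's masking run uses the same secret key.
key = b"shared-masking-key"

from_oracle = mask_first_name("Johanna", key)
from_databricks = mask_first_name("  johanna ", key)

# The same customer receives the same masked name in both systems.
print(from_oracle == from_databricks)
```

Because the output depends only on the normalized input and the key, no lookup table has to be shared between systems. With a pool this small, different inputs will often collide on the same replacement, so treat the pool size as part of the sketch rather than a recommendation.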
4. Work more independently
Test teams no longer need data engineers to create copies, views, or interim datasets for every test cycle. By masking directly in Databricks, test data specialists can work autonomously, reducing bottlenecks and speeding up delivery.
5. New testing scenarios
This capability unlocks workflows that were previously impractical:
- Mask data as it flows through bronze → silver → gold pipelines
- Provide safe test data for ML/AI pipelines in the lakehouse
- Include masking in Databricks Jobs or CI/CD workflows
- Generate synthetic data in the same environment where models train
- Use Delta time travel without privacy concerns
Getting started
To start masking data in Databricks with DATPROF:
- Connect DATPROF to your Databricks workspace using your workspace URL and access token
- Select your Delta Tables from the DATPROF interface
- Configure your masking rules using DATPROF's built-in functions or custom logic
- Execute masking directly on your Delta Tables, with no exports and no intermediate storage
DATPROF handles Delta Table versioning automatically, ensuring compliance across your entire data history.
For customers: find detailed setup instructions and best practices in our documentation, or contact your DATPROF solutions engineer.
Not yet a customer? Schedule a product demonstration with one of our TDM experts.
Frequently asked questions
Can I mask sensitive data directly in Databricks?
Yes. With DATPROF, you can mask sensitive data directly inside Databricks Delta Tables, without exports, workarounds, or intermediate storage. This allows teams to use production-like test data safely within their existing Databricks environment while protecting sensitive values.
How does DATPROF support test data management in Databricks?
DATPROF brings test data management directly into Databricks. Teams can perform in-place masking, generate synthetic test data, and apply conditional masking rules using SQL filters for precise control. This removes the need to wait for exports or engineering capacity, helping teams speed up testing cycles.
Does DATPROF support Databricks Delta Table history?
Yes. DATPROF automatically handles Delta Table versioning, which is important because historical Delta Table versions may still contain unmasked sensitive data. By preventing access to unmasked historical versions, DATPROF helps reduce a major compliance risk.
Can I generate synthetic test data in Databricks?
Yes. In addition to in-place masking, DATPROF supports generating synthetic test data directly within the Databricks lakehouse environment. This is especially useful for teams that need safe test data for analytics, machine learning, integration testing, or development environments without exposing sensitive production data.
