Stop testing with synthetic garbage: How to provision secure test data in minutes

Testing with synthetic data can miss critical issues, while production data introduces privacy risks. Here’s how to avoid both.

1 July 2026

6 min read

By Alex Platt

Whether you’re dealing with a bi-annual platform update or a fortnightly sprint release, modern delivery pipelines demand speed and efficiency, and the pressure to validate, accept, and deploy is intense. To meet tight deadlines, that pressure can push teams into dangerous compromises and untested assumptions about their testing.

To validate that a new release will work once deployed, you need data that accurately reflects your production environment. To hit these demanding timeframes, delivery teams are regularly forced into a false dichotomy: either spend time they don’t have manually synthesising data for thorough test coverage, or take the easy route and clone production data into lower test and development environments. Both options carry critical weaknesses: the first quietly undermines the quality of your release, the second its security.

The trap of synthetic assumptions

When teams synthesise test data by hand, they open the door to testing bias.

When testers generate test data by hand, they inevitably shape it around the specific outcomes they’re testing for. Teams strive to make their test scenarios cover a broad range of possibilities, but under tight deadlines a thorough analysis of the existing data is rarely feasible.

Real-world data is rarely perfect and often chaotic. It is frequently unstructured and full of legacy anomalies, null values, duplicates, and the traces of unpredictable user behaviour. When you test exclusively with synthetic data, your tests reflect your assumptions rather than the reality of the data.

The outcome? Edge cases are missed, UAT passes perfectly, and the deployment fractures at go-live when the system encounters the chaotic reality of the production data.

The opposing trap of production clones

The obvious alternative is to clone production data into the lower environments. This captures the real-world edge cases that hand-built data misses, removes tester bias, replicates the live environment, and gives business stakeholders confidence that if it works in test, it’ll work in production.

This route solves the quality problem, but it creates a potentially catastrophic security problem.

Lower environments rarely have the rigorous security infrastructure, strict access controls, and auditing of production. Cloning production data into them significantly widens your attack surface, exposing highly sensitive data to malicious activity, lapses in human judgement (particularly under deadline stress), and poor data handling.

Legislation applies across all environments

Test environments aren’t exempt from regulatory requirements. Under New Zealand’s Privacy Act 2020, how an organisation handles personal data in a lower environment is subject to the same scrutiny by the Privacy Commissioner as its production systems. Information Privacy Principle 5 (IPP 5) requires an agency that holds personal information to protect it, with security safeguards that are reasonable in the circumstances, against loss and against access, use, modification, or disclosure that isn’t authorised.

If your production environment is locked behind advanced firewalls and role-based access, but your UAT environment containing the same private data is fully accessible to third-party vendors or offshore developers, you’re at risk of systematically failing IPP 5.

Furthermore, IPP 10 (Limits on use of personal information) and IPP 11 (Limits on disclosure of personal information) strictly limit the purposes for which an agency may use the personal information it holds and the circumstances in which it may disclose it. Using customers’ unobfuscated personal information, collected to provide a service, to run test scenarios, or sharing it with third parties for development and testing, will generally breach these principles.

When convenience compromises security

The consequences of compromising on testing data are severe and wide-reaching, and the industry is full of examples.

In 2016, a development partner of the Australian Red Cross Blood Service used production data in a UAT environment for testing. During that work, a database backup file containing data on approximately 550,000 prospective blood donors, including their contact details and answers to a blood donation eligibility questionnaire, was saved to a publicly accessible part of the UAT server.

Because a production clone was used instead of obfuscated data to validate application quality, highly sensitive personal information was exposed to the public, resulting in immense reputational damage and intense regulatory scrutiny.

Additional examples of test environments containing production data being breached include:

Oxfam Australia (2021) – Oxfam’s media release
Telstra (2024) – Telstra’s release
Optus (2022) – Queensland Government case study

Secure production-like clone

You don’t have to choose between biased synthetic test data and the regulatory risk of production data. The solution lies in decoupling production data from its sensitivity. This is where automated data masking platforms like DataMasque come into play.

DataMasque integrates with your Extract, Transform, Load (ETL) process: instead of production data being fed straight into your lower environments, it is anonymised automatically as part of the process. The result is anonymised, production-like data landing in your test environments, giving you confidence that the chaos of real production data won’t expose cracks at go-live.
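To make the idea concrete, here is a minimal Python sketch of masking inside an ETL step, so that unmasked rows never reach the test database. It is not DataMasque’s implementation or API: the `customers` table, the masking rules, and the `MASKING_KEY` placeholder are all hypothetical, and in-memory SQLite databases stand in for the production and test systems.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical placeholder key: in practice, load it from a secrets manager.
MASKING_KEY = b"replace-with-a-secret-key"

SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"


def token(value: str) -> str:
    """Deterministic, non-reversible token (truncated HMAC-SHA256 of the value)."""
    return hmac.new(MASKING_KEY, value.encode(), hashlib.sha256).hexdigest()[:10]


def extract(src: sqlite3.Connection) -> list:
    """Read raw rows from the (hypothetical) production source."""
    return src.execute("SELECT id, name, email FROM customers ORDER BY id").fetchall()


def transform(rows: list) -> list:
    """Mask personal fields in flight; nulls stay null so null handling is still tested."""
    masked = []
    for cust_id, name, email in rows:
        t = token(str(cust_id))
        masked.append((
            cust_id,  # surrogate key left unchanged in this sketch
            None if name is None else f"Customer {t}",
            None if email is None else f"user-{t}@example.invalid",
        ))
    return masked


def load(dst: sqlite3.Connection, rows: list) -> None:
    """Write only masked rows to the test environment."""
    dst.execute(SCHEMA)
    dst.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    dst.commit()


def run_pipeline(src: sqlite3.Connection, dst: sqlite3.Connection) -> None:
    load(dst, transform(extract(src)))


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    prod.execute(SCHEMA)
    prod.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Aroha Smith", "aroha@example.com"), (2, "Ben Lee", None)],
    )
    test = sqlite3.connect(":memory:")
    run_pipeline(prod, test)
    for row in test.execute("SELECT * FROM customers"):
        print(row)
```

The design point is where the masking sits: because `transform` runs between `extract` and `load`, the test database only ever receives masked rows, while the real data’s messiness (here, a missing email) carries through.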

The key to a meaningful, production-like clone

The true value of an intelligently masked dataset is not just that it’s secure, but that it remains functionally identical to production. For thorough testing of complex IT systems, data architecture is key. DataMasque preserves referential integrity; primary and foreign keys (e.g. customer numbers) are masked deterministically, meaning complex table joins, historical relationships, and downstream BI reporting models remain intact across the different environments.
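Deterministic masking is what keeps joins working. The sketch below, again a hypothetical illustration rather than DataMasque’s actual algorithm, maps each customer number through a keyed HMAC so the same input always yields the same masked value, whichever table it appears in. The table layouts and `MASKING_KEY` are placeholders; a real tool would also need to guarantee uniqueness and, where required, preserve the original format, which a truncated hash alone does not.

```python
import hashlib
import hmac

# Hypothetical placeholder key; the same key must be used for every table in a run.
MASKING_KEY = b"replace-with-a-secret-key"


def mask_key(customer_no: str) -> str:
    """Same input + same key -> same masked value, in every table that uses it."""
    digest = hmac.new(MASKING_KEY, customer_no.encode(), hashlib.sha256).hexdigest()
    return "CUST-" + digest[:16]  # 64 bits of the MAC: collisions very unlikely, not impossible


def mask_column(rows: list[dict], column: str) -> list[dict]:
    """Return copies of rows with one key column masked."""
    return [{**row, column: mask_key(row[column])} for row in rows]


def orders_with_names(customers: list[dict], orders: list[dict]) -> list[tuple]:
    """Join orders to customers on customer_no (foreign key -> primary key)."""
    by_no = {c["customer_no"]: c for c in customers}
    return [(o["order_id"], by_no[o["customer_no"]]["name"]) for o in orders]


# Names are left unmasked here only so the join result is easy to read.
customers = [
    {"customer_no": "C10001", "name": "Aroha Smith"},
    {"customer_no": "C10002", "name": "Ben Lee"},
]
orders = [
    {"order_id": 1, "customer_no": "C10001"},
    {"order_id": 2, "customer_no": "C10002"},
    {"order_id": 3, "customer_no": "C10001"},
]

masked_customers = mask_column(customers, "customer_no")
masked_orders = mask_column(orders, "customer_no")

# The keys are different, but every order still finds its customer after masking.
print(orders_with_names(masked_customers, masked_orders))
# → [(1, 'Aroha Smith'), (2, 'Ben Lee'), (3, 'Aroha Smith')]
```

Because the mapping depends only on the input and the key, masking each table independently, or in separate pipeline runs with the same key, still produces matching foreign keys.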

Additionally, it retains semantic consistency. Where a date of birth is altered to protect the individual, the derived age remains valid, so the system can still exercise business logic such as an over-18 validation rule. Names are replaced with realistic alternatives, formal IDs such as passport numbers retain their standard format, and financial figures are obfuscated while preserving their statistical distribution for accurate testing.
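The semantic rules can be sketched the same way. Below, `mask_dob` moves a date of birth to a different day that gives the same age on a chosen reference date, so an over-18 rule evaluated on that date gives the same answer, and `mask_id` swaps letters for letters and digits for digits so a passport-style ID keeps its shape. Both are illustrative assumptions, not DataMasque’s methods: the reference date, the 365-day search window, the `MASKING_KEY` placeholder, and the sample ID are hypothetical, and `mask_id` is a simple keyed substitution rather than a standardised format-preserving encryption scheme. Age is preserved only as of the reference date; as time passes, a masked record may reach a birthday on a different day than the real person.

```python
import hashlib
import hmac
import string
from datetime import date, timedelta

# Hypothetical placeholder key; load from a secrets manager in practice.
MASKING_KEY = b"replace-with-a-secret-key"


def _mac(value: str) -> bytes:
    return hmac.new(MASKING_KEY, value.encode(), hashlib.sha256).digest()


def age_on(dob: date, on: date) -> int:
    """Completed years between dob and `on`."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def mask_dob(dob: date, as_of: date, record_key: str) -> date:
    """Pick a different birth date that gives the same age on `as_of`.

    Seeding with record_key (e.g. a masked customer ID) stops everyone who
    shares a birthday from being mapped to the same masked date.
    """
    target_age = age_on(dob, as_of)
    candidates = [
        dob + timedelta(days=k)
        for k in range(-365, 366)
        if k != 0 and age_on(dob + timedelta(days=k), as_of) == target_age
    ]
    seed = int.from_bytes(_mac(record_key + "|" + dob.isoformat())[:8], "big")
    return candidates[seed % len(candidates)]


def mask_id(value: str) -> str:
    """Keyed character substitution: letters stay letters, digits stay digits,
    separators and length are unchanged. Not reversible, not a standard FPE scheme."""
    mac = _mac(value)
    out = []
    for i, ch in enumerate(value):
        b = mac[i % len(mac)]  # slight modulo bias is acceptable for a sketch
        if ch in string.digits:
            out.append(string.digits[b % 10])
        elif ch in string.ascii_uppercase:
            out.append(string.ascii_uppercase[b % 26])
        elif ch in string.ascii_lowercase:
            out.append(string.ascii_lowercase[b % 26])
        else:
            out.append(ch)
    return "".join(out)


as_of = date(2026, 7, 1)
dob = date(2008, 7, 1)  # turns 18 exactly on the reference date
masked = mask_dob(dob, as_of, record_key="CUST-example")
print(masked, age_on(masked, as_of) >= 18)  # the over-18 check still passes
print(mask_id("LA123456"))  # two letters then six digits, like the input
```

The candidate list is never empty: every age corresponds to a run of at least 365 consecutive birth dates, so a window of 365 days either side of the original always contains other dates with the same age.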

Prioritising confidence through data integrity

Effective quality assurance relies on robust, defensible practices. A sound QA strategy requires data that mirrors the complexity of production without exposing sensitive information. Synthetic data often lacks the nuance required for thorough validation, while raw production data introduces unacceptable security risks.

By building automated data masking into your QA processes, for example within your ETL pipelines, you can maintain data integrity while supporting compliance. This approach lets your quality assurance teams test against realistic, production-like datasets with confidence.

Moving to a secure, masked data strategy lets organisations get past the traditional trade-off between speed and security, so that release sign-offs are underpinned by evidence from realistic data rather than assumptions.

Want to learn how to safely leverage high-value data for AI acceleration?

Join us for an exclusive executive roundtable hosted by Assurity and DataMasque, featuring special guest Dawie Olivier, CITO at FirstCape. This closed-door session will explore how to unlock safe data access for AI testing and model evaluation without regulatory exposure.

Event details

Thursday, 27 August, 12:00 p.m. – 2:30 p.m. at Dockside Wellington
Wednesday, 9 September, 12:00 p.m. – 2:00 p.m. at the SO/ Auckland Hotel

Looking to secure your test data? Let’s build a secure test data framework tailored to your organisation. Contact our team today.
