Do samples taken from a synthetic microdata population replicate the relationship between samples taken from an original population?

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Assessment of disclosure risk in sample surveys by data controllers who don’t have access to the population data is constrained by verifiability challenges. A sample unique may not be a population unique. Statistics generated at the sample level may not carry over to the population level. Privacy models such as k-anonymity may simply not make sense when applied to sample data (or may only make sense for some scenarios). This study aims to understand whether samples generated from a synthetic population present the same relationship, in terms of risk and utility, to the synthetic population as samples generated from the original population present to the original population. Note that this is a very different question from the more general question about the utility of synthetic data, which compares the synthetic and original data. Here we are comparing two relationships. This opens the possibility of using synthetic data to test and set parameters for models of risk assessment that are then applied to real data.
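
The distinction between sample uniques and population uniques, and the idea of comparing two sample-to-population relationships, can be illustrated with a minimal sketch. It is not the method used in the study. The quasi-identifiers, the toy generator `make_population`, the population and sample sizes, and the stand-in "synthetic" population (simply a second independent draw from the same toy generator) are all assumptions made for illustration.

```python
import random
from collections import Counter


def unique_keys(records):
    """Return the set of key combinations that occur exactly once."""
    counts = Counter(records)
    return {key for key, n in counts.items() if n == 1}


def share_of_sample_uniques_in_population_uniques(population, sample):
    """Fraction of sample-unique keys that are also unique in the population."""
    sample_uniques = unique_keys(sample)
    if not sample_uniques:
        return 0.0
    population_uniques = unique_keys(population)
    return len(sample_uniques & population_uniques) / len(sample_uniques)


def make_population(rng, size):
    # Hypothetical quasi-identifiers: (age band, region, occupation code).
    return [
        (rng.randrange(10), rng.randrange(20), rng.randrange(30))
        for _ in range(size)
    ]


rng = random.Random(0)
original = make_population(rng, 5000)
# Assumption: a second draw from the same toy generator stands in for a
# synthetic population; a real study would fit a synthesizer to the original.
synthetic = make_population(rng, 5000)

ratios = {}
for name, population in [("original", original), ("synthetic", synthetic)]:
    sample = rng.sample(population, 500)  # simple random sample, no replacement
    ratios[name] = share_of_sample_uniques_in_population_uniques(population, sample)
    print(f"{name}: share of sample uniques that are population uniques = {ratios[name]:.3f}")
```

If the two printed shares are close, the sample-to-population relationship for this one risk measure is similar in both populations, which is the kind of comparison the abstract describes.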
Original language: English
Title of host publication: UNECE Expert Meeting on Statistical Data Confidentiality 2023, 26-28 September 2023, Wiesbaden
Publication status: Accepted/In press - 1 Sept 2023
