Comparing the Utility and Disclosure Risk of Synthetic Data with Samples of Microdata

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Most statistical agencies release randomly selected samples of Census microdata, usually with sample fractions under 10% and with other forms of statistical disclosure control (SDC) applied. An alternative to SDC is data synthesis, which has been attracting growing interest, yet there is no clear consensus on how to measure the associated utility and disclosure risk of the data. The ability to produce synthetic Census microdata, where the utility and associated risks are clearly understood, could mean that more timely and wider-ranging access to microdata would be possible. This paper follows on from previous work by the authors which mapped synthetic Census data on a risk-utility (R-U) map. The paper presents a framework to measure the utility and disclosure risk of synthetic data by comparing it to samples of the original data of varying sample fractions, thereby identifying the sample fraction which has equivalent utility and risk to the synthetic data. Three commonly used data synthesis packages are compared with some interesting results. Further work is needed in several directions but the methodology looks very promising.
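
The comparison idea in the abstract can be illustrated with a minimal sketch: score random samples of the original data at several sample fractions, then find the fraction whose score matches the score of the synthetic data. Everything here is an assumption made for illustration and is not taken from the paper: the toy utility measure (one minus the total variation distance between category frequencies), the linear interpolation, and the function names `freq_utility`, `sample_utility_curve`, and `equivalent_fraction`. The paper's own utility and risk measures may differ.

```python
import random
from collections import Counter


def freq_utility(original, candidate):
    """Toy utility (an assumption, not the paper's metric): 1 minus the total
    variation distance between the category shares of the two datasets."""
    p, q = Counter(original), Counter(candidate)
    n_p, n_q = len(original), len(candidate)
    tvd = 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))
    return 1.0 - tvd


def sample_utility_curve(original, fractions, draws=20, seed=0):
    """Mean toy utility of simple random samples at each sample fraction."""
    rng = random.Random(seed)
    curve = []
    for f in fractions:
        size = max(1, round(f * len(original)))
        scores = [
            freq_utility(original, rng.sample(original, size))
            for _ in range(draws)
        ]
        curve.append(sum(scores) / draws)
    return curve


def equivalent_fraction(fractions, curve, target):
    """Linearly interpolate the sample fraction at which the curve equals
    `target` (for example, the score of a synthetic dataset).
    Returns None when the target lies outside the range the curve covers."""
    points = list(zip(fractions, curve))
    for (f0, u0), (f1, u1) in zip(points, points[1:]):
        if min(u0, u1) <= target <= max(u0, u1):
            if u1 == u0:
                return f0
            return f0 + (target - u0) * (f1 - f0) / (u1 - u0)
    return None
```

Under these assumptions, one would compute `freq_utility(original, synthetic)` for a synthetic dataset and pass it as `target`. The same interpolation could in principle be applied to a disclosure risk score, giving a second equivalent fraction on the risk axis of an R-U map.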

Original language: English
Title of host publication: Privacy in Statistical Databases
Subtitle of host publication: International Conference, PSD 2022, Paris, France, September 21–23, 2022, Proceedings
Editors: Josep Domingo-Ferrer, Maryline Laurent
Place of Publication: Cham, Switzerland
Publisher: Springer Nature
Pages: 234-249
Number of pages: 16
ISBN (Electronic): 9783031139451
ISBN (Print): 9783031139444
Publication status: Published - 14 Sep 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13463 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • data synthesis
  • data utility
  • disclosure risk
  • microdata

Research Beacons, Institutes and Platforms

  • Cathie Marsh Institute
