The Application of Genetic Algorithms to Data Synthesis: A Comparison of Three Crossover Methods

Yingrui Chen, Mark Elliot, Duncan Smith

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Data synthesis is a data confidentiality method that is applied to microdata to prevent leakage of sensitive information about respondents. Instead of publishing real data, data synthesis produces an artificial dataset that does not contain the real records of respondents. In particular, this offers significant protection against reidentification attacks. However, effective data synthesis requires retaining the key statistical properties of the original data and respecting its multiple utilities. In previous work, we demonstrated the value of matrix genetic algorithms (GAs) in data synthesis [4]. The current paper compares three crossover methods within a matrix GA: parallelised (two-point) crossover, matrix crossover, and parametric uniform crossover. The crossover methods are applied to three different datasets and are compared on the basis of how well they reproduce the relationships between variables in the original datasets.
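
To make the three operators named in the abstract concrete, the following is a minimal sketch in Python with NumPy. It rests on several assumptions that are not taken from the paper: each individual is a candidate dataset encoded as a 2-D integer array (rows are records, columns are categorical variables); "parallelised" is read as running an independent two-point crossover on every column; "matrix crossover" is read as swapping one rectangular block; and the swap probability `p_swap` is a hypothetical parameter name. The paper's own definitions may differ in detail.

```python
import numpy as np


def two_point_crossover_columns(a, b, rng):
    """Assumed 'parallelised' form: an independent two-point crossover per column."""
    n_rows, n_cols = a.shape
    child_a, child_b = a.copy(), b.copy()
    for j in range(n_cols):
        # Two distinct cut points in 0..n_rows; the rows between them are exchanged.
        lo, hi = np.sort(rng.choice(n_rows + 1, size=2, replace=False))
        child_a[lo:hi, j] = b[lo:hi, j]
        child_b[lo:hi, j] = a[lo:hi, j]
    return child_a, child_b


def matrix_crossover(a, b, rng):
    """Assumed form: exchange one randomly placed rectangular block."""
    n_rows, n_cols = a.shape
    r0, r1 = np.sort(rng.choice(n_rows + 1, size=2, replace=False))
    c0, c1 = np.sort(rng.choice(n_cols + 1, size=2, replace=False))
    child_a, child_b = a.copy(), b.copy()
    child_a[r0:r1, c0:c1] = b[r0:r1, c0:c1]
    child_b[r0:r1, c0:c1] = a[r0:r1, c0:c1]
    return child_a, child_b


def parametric_uniform_crossover(a, b, rng, p_swap=0.5):
    """Exchange each cell independently with probability p_swap (hypothetical name)."""
    mask = rng.random(a.shape) < p_swap
    child_a = np.where(mask, b, a)
    child_b = np.where(mask, a, b)
    return child_a, child_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy parents: 6 records, 4 variables, category codes 0..2.
    parent_a = rng.integers(0, 3, size=(6, 4))
    parent_b = rng.integers(0, 3, size=(6, 4))
    for op in (two_point_crossover_columns, matrix_crossover,
               parametric_uniform_crossover):
        child_a, child_b = op(parent_a, parent_b, rng)
        print(op.__name__, child_a.shape, child_b.shape)
```

All three sketches only exchange cells between the two parents at matching positions, so every cell value in a child comes from one of its parents. They differ in how the exchanged cells are grouped: contiguous runs within each column, a single rectangle, or independently chosen cells.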

Original language: English
Title of host publication: Privacy in Statistical Databases
Publisher: Springer Nature
Publication status: Published - 2018

Keywords

  • Genetic algorithms
  • Data synthesis
  • Data privacy

Research Beacons, Institutes and Platforms

  • Cathie Marsh Institute
