Data privacy using an evolutionary algorithm for invariant PRAM matrices

Jordi Marés, Natalie Shlomo

Research output: Contribution to journal › Article › peer-review

Abstract

Dissemination of data with sensitive information has an implicit risk of unauthorized disclosure. Several masking methods have been developed in order to protect the data without the loss of too much information. One such method is the Post Randomization Method (PRAM) based on perturbations of a categorical variable according to a Markov probability transition matrix. The method has the drawback that it is difficult to find an optimal transition matrix to perform perturbations and maximize data utility. An evolutionary algorithm which generates an optimal probability transition matrix is proposed. Optimality is with respect to a pre-defined fitness function dependent on the aspects of the data that need to be preserved following perturbation. The algorithm embeds two properties: the invariance of the transition matrix to preserve marginal totals in expectation, and the control of diagonal probabilities which determine the amount of perturbation. Experimental results using a real data set are presented in order to illustrate and empirically evaluate the application of this algorithm. © 2014 Elsevier Ireland Ltd. All rights reserved.
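
The two properties named in the abstract can be illustrated with a small sketch. It is not the evolutionary algorithm proposed in the article. It assumes one simple, well-known way to build an invariant matrix: mixing the identity with the marginal distribution, with a hypothetical parameter `alpha` controlling the diagonal. The function names `invariant_matrix`, `is_invariant` and `apply_pram` are illustrative only.

```python
import random


def invariant_matrix(counts, alpha):
    """Assumed construction: P = alpha*I + (1 - alpha) * 1*pi^T, pi = counts/n.

    Each row sums to 1 and counts*P equals counts, so marginal totals are
    preserved in expectation. A larger alpha puts more mass on the diagonal,
    which means less perturbation.
    """
    n = sum(counts)
    pi = [c / n for c in counts]
    k = len(counts)
    return [[alpha * (i == j) + (1 - alpha) * pi[j] for j in range(k)]
            for i in range(k)]


def is_invariant(counts, P, tol=1e-9):
    """Check the invariance condition: sum_i counts[i] * P[i][j] == counts[j]."""
    k = len(counts)
    expected = [sum(counts[i] * P[i][j] for i in range(k)) for j in range(k)]
    return all(abs(e - c) <= tol for e, c in zip(expected, counts))


def apply_pram(values, categories, P, rng):
    """Perturb each record independently using the row of P for its category."""
    index = {c: i for i, c in enumerate(categories)}
    return [rng.choices(categories, weights=P[index[v]])[0] for v in values]


if __name__ == "__main__":
    categories = ["a", "b", "c"]           # toy categorical variable
    counts = [50, 30, 20]                  # its marginal totals
    P = invariant_matrix(counts, alpha=0.8)
    data = [c for c, m in zip(categories, counts) for _ in range(m)]
    masked = apply_pram(data, categories, P, random.Random(0))
    print(is_invariant(counts, P), [masked.count(c) for c in categories])
```

Invariance holds only in expectation, so the perturbed counts in any single run will generally differ slightly from the originals. The article's contribution, searching for a matrix that is optimal under a chosen fitness function, is outside the scope of this sketch.
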
Original language: English
Pages (from-to): 1-13
Number of pages: 12
Journal: Computational Statistics and Data Analysis
Volume: 79
Publication status: Published - 2014

Keywords

  • Data utility
  • Disclosure risk
  • Fitness function
  • Genetic operators
  • Probability transition matrices
