Preserving edits when perturbing microdata for statistical disclosure control

Natalie Shlomo, Ton De Waal

Research output: Contribution to journal › Article › peer-review

Abstract

To protect individuals in microdata from the disclosure risk of re-identification, a general perturbative method called PRAM (the Post-Randomization Method) is sometimes used for masking records. This method adds "noise" to categorical variables by changing values of categories for a small number of records according to a prescribed probability matrix and a stochastic process based on the outcome of a random multinomial draw. Changing values of categorical variables, however, will cause fully edited and logical records in microdata to start failing edit constraints (i.e., logical rules) resulting in data of low utility. Also, an inconsistent record will target the record as having been perturbed for disclosure control and attempts can be made to unmask the data. Therefore, the perturbation process must take into account per-record micro edit constraints through post-editing which will ensure that perturbed microdata satisfy all edits. In addition, file-level macro edit constraints, which take the form of information loss measures, are also defined in order to ensure that the overall utility of the data will not be badly compromised given an acceptable level of disclosure risk. This paper will discuss methods for perturbing microdata using PRAM while minimizing micro and macro edit failures. © 2005 IOS Press and the authors. All rights reserved.
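
As an illustration of the mechanism the abstract describes, the following is a minimal sketch and not the authors' implementation. It perturbs one categorical variable by drawing each released value from the row of a transition probability matrix that corresponds to the original value, then applies a deliberately simple post-edit. The variable names, the example edit rule (a person aged 0-15 cannot be married), and the revert-on-failure strategy are all assumptions made for illustration; the paper's own post-editing methods may differ.

```python
import random


def pram(values, categories, transition, rng):
    """Draw a released category for each value.

    Row i of `transition` holds the probabilities of releasing each
    category when the original category is categories[i].
    """
    index = {c: i for i, c in enumerate(categories)}
    return [
        rng.choices(categories, weights=transition[index[v]], k=1)[0]
        for v in values
    ]


def passes_edit(record):
    # Hypothetical micro edit: a person aged 0-15 cannot be married.
    return not (record["age_group"] == "0-15" and record["marital"] == "married")


def pram_with_post_edit(records, var, categories, transition, seed=0):
    """Perturb `var` with PRAM; revert any record that fails the edit.

    Reverting to the original value is an assumed, simplest-possible
    post-edit, chosen only to keep the sketch short.
    """
    rng = random.Random(seed)
    released = []
    for rec in records:
        new = dict(rec)
        new[var] = pram([rec[var]], categories, transition, rng)[0]
        if not passes_edit(new):
            new[var] = rec[var]
        released.append(new)
    return released


if __name__ == "__main__":
    categories = ["single", "married", "widowed"]
    # Mostly-diagonal matrix: only a small share of records change value.
    transition = [
        [0.90, 0.05, 0.05],
        [0.05, 0.90, 0.05],
        [0.05, 0.05, 0.90],
    ]
    records = [
        {"age_group": "0-15", "marital": "single"},
        {"age_group": "16+", "marital": "married"},
    ]
    for rec in pram_with_post_edit(records, "marital", categories, transition):
        print(rec)
```

Reverting failed records keeps every released record consistent, but it also lowers the effective perturbation rate for records near an edit boundary, which is one reason a real application would weigh such a rule against the file-level information loss and disclosure risk measures mentioned above.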
Original language: English
Pages (from-to): 173-185
Number of pages: 12
Journal: Statistical Journal of the United Nations Economic Commission for Europe
Volume: 22
Issue number: 2
Publication status: Published - 2005

Keywords

  • Disclosure risk
  • Imputation
  • Information loss
  • Microdata
  • Post-editing
  • Post-randomization method
  • Statistical disclosure control
