Re-identification in the absence of common variables for matching

Research output: Contribution to journal › Article › peer-review

Abstract

A basic concern in statistical disclosure limitation is the re-identification of individuals in anonymized microdata. Linking against a second dataset that contains identifying information can result in a breach of confidentiality. Almost all linkage approaches are based on comparing the values of variables that are common to both datasets. It is tempting to think that if datasets contain no common variables, then there can be no risk of re-identification. However, linkage has been attempted between such datasets via the extraction of structural information using ordered weighted averaging (OWA) operators. Although this approach has been shown to perform better than randomly pairing records, it is debatable whether it demonstrates a practically significant disclosure risk.
This paper reviews some of the main aspects of statistical disclosure limitation. It then goes on to show that a relatively simple, supervised Bayesian approach can consistently outperform OWA linkage. Furthermore, the Bayesian approach demonstrates a significant risk of re-identification for the types of data considered in the OWA record linkage literature.
Original language: English
Journal: International Statistical Review
Early online date: 2 Dec 2019
Publication status: Published - 2019
