Abstract
The requirement to anonymise datasets released for secondary analysis must be balanced against the need for their analysis to yield efficient and consistent parameter estimates. This paper proposes integrating the processes of anonymisation and data analysis. In the first stage, random noise with known distributional properties is added to some or all variables in a released (already pseudonymised) dataset, in a setting where the values of some identifying and sensitive variables for data subjects of interest are also available to an external ‘attacker’ who wishes to identify those subjects in order to interrogate their records in the dataset. The second, analysis, stage accounts for the added noise when producing the required parameter estimates. Where the characteristics of the noise are made available to the analyst by the data provider, we propose a new method that allows a valid analysis. This is formally a measurement error model, and we describe a Bayesian MCMC algorithm that recovers consistent estimates of the true model parameters. A novel method for handling categorical data is presented, and the paper shows how an appropriate noise distribution can be determined.
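The two-stage scheme can be illustrated with a minimal sketch: Gaussian noise of known variance is added to a continuous variable before release, and the analyst then corrects the attenuation this induces in a regression slope. The sketch uses a simple method-of-moments correction rather than the Bayesian MCMC algorithm described in the paper, and all variable names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1 (data provider): additive-noise anonymisation ---
# True confidential variable x and an outcome y generated from it.
n = 10_000
x = rng.normal(loc=50.0, scale=10.0, size=n)          # var(x) = 100
beta0, beta1, sigma_eps = 2.0, 0.5, 5.0
y = beta0 + beta1 * x + rng.normal(0.0, sigma_eps, n)

# Noise with a *known* standard deviation is added to x before release;
# this value is published to the analyst along with the data.
sigma_noise = 8.0
x_released = x + rng.normal(0.0, sigma_noise, n)

# --- Stage 2 (analyst): account for the known noise variance ---
# Naive regression of y on the noisy x is attenuated towards zero.
beta1_naive = np.cov(y, x_released)[0, 1] / np.var(x_released, ddof=1)

# Classical measurement-error (method-of-moments) correction:
# var(x) = var(x_released) - sigma_noise^2, so rescale the slope.
var_x_hat = np.var(x_released, ddof=1) - sigma_noise**2
beta1_corrected = np.cov(y, x_released)[0, 1] / var_x_hat

print(f"true slope      : {beta1:.3f}")
print(f"naive estimate  : {beta1_naive:.3f}")
print(f"corrected (MoM) : {beta1_corrected:.3f}")
```

Running the sketch, the naive slope is noticeably attenuated (roughly 0.3 for these hypothetical settings) while the corrected estimate recovers the true value of 0.5, which is the effect the paper's measurement error formulation exploits in a fully model-based way.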
Original language | English |
---|---|
Pages (from-to) | 89–115 |
Journal | Journal of Official Statistics |
Volume | 36 |
Issue number | 1 |
DOIs | |
Publication status | Published - 17 Mar 2020 |
Keywords
- additive noise
- anonymisation
- measurement error
- record linkage