Abstract
Many organizations require detailed individual-level information, much of which has been collected under guarantees of confidentiality. However, simple anonymization procedures, such as removing names and addresses, are insufficient to ensure that confidentiality. The records belonging to certain individuals have a high probability of being identified because their contents, or attributes, are unusual, and they therefore have the potential to be recognized spontaneously; such records are referred to as special uniques. Consider, for example, a sixteen-year-old widow in a population survey. The confidentiality of a given dataset cannot be assured until all special unique records are identified and either disguised or removed. However, to the knowledge of the authors, no exhaustive automated analysis of this nature has been conducted, owing to the demanding levels of computation and data storage required. This paper introduces a new algorithm that locates 'Risky' records in discrete data by first identifying all unique attribute sets (up to a user-specified maximum size) and then grading the 'Risk' of each record according to the number and distribution of unique attribute sets within that record. Empirical tests indicate that the algorithm is highly effective at picking out 'Risky' records from large samples of data.
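The two steps named in the abstract can be illustrated with a brute-force sketch. This is not the authors' algorithm, whose details are not given here. It assumes that only minimal unique attribute sets are kept (those with no unique proper subset) and uses a hypothetical scoring rule in which smaller unique sets contribute more. The function names, the toy records, and the weighting are all invented for illustration.

```python
from collections import Counter
from itertools import combinations


def unique_attribute_sets(records, max_size):
    """Map each record index to the attribute sets on which it is unique."""
    n_attrs = len(records[0])
    found = {i: [] for i in range(len(records))}
    for size in range(1, max_size + 1):
        for attrs in combinations(range(n_attrs), size):
            # Count how often each value combination occurs on these attributes.
            counts = Counter(tuple(r[a] for a in attrs) for r in records)
            for i, r in enumerate(records):
                if counts[tuple(r[a] for a in attrs)] != 1:
                    continue
                # Assumption: keep only minimal sets (no unique proper subset).
                if not any(set(prev) < set(attrs) for prev in found[i]):
                    found[i].append(attrs)
    return found


def risk_scores(found, max_size):
    """Hypothetical grading: each unique set adds 2 ** (max_size - set size)."""
    return {
        i: sum(2 ** (max_size - len(s)) for s in sets)
        for i, sets in found.items()
    }


# Invented toy data: (age band, marital status, sex).
records = [
    ("16-19", "widowed", "F"),
    ("16-19", "single", "F"),
    ("16-19", "single", "M"),
    ("16-19", "single", "F"),
    ("16-19", "single", "M"),
    ("60+", "widowed", "F"),
    ("60+", "widowed", "M"),
    ("60+", "widowed", "F"),
    ("60+", "widowed", "M"),
]

found = unique_attribute_sets(records, max_size=2)
scores = risk_scores(found, max_size=2)
riskiest = max(scores, key=scores.get)
```

In this toy data no single attribute value is unique, but the first record is the only one combining the youngest age band with widowed status, so it is the only record that receives a non-zero score. The exhaustive enumeration over attribute subsets is exactly the cost the abstract describes as demanding, which is why `max_size` is capped.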
Original language | English |
---|---|
Pages (from-to) | 493-509 |
Number of pages | 16 |
Journal | International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems |
Volume | 10 |
Issue number | 5 |
Publication status | Published - Oct 2002 |
Keywords
- Algorithms
- Classification
- Data mining
- Statistical disclosure
Fingerprint
Dive into the research topics of 'A computational algorithm for handling the special uniques problem'. Together they form a unique fingerprint.
Impacts
- Impact on the Statistical Confidentiality Practices of Data Stewardship Organisations
Elliot, M. (Participant), Purdam, K. (Participant), Mackey, E. (Participant), Smith, D. (Participant) & (Participant)
Impact: Economic impacts, Societal impacts, Legal impacts