Abstract
Teaching datasets are a pivotal component of the data discovery pipeline. These datasets often serve as the initial point of interaction for data users, allowing them to explore the contents of a dataset and assess its relevance to their needs. However, there are instances where their viability is limited, particularly where source data is only accessible within restricted settings, such as trusted research environments (TREs). In response to this challenge, this paper proposes the production of synthetic datasets tailored for specific teaching purposes by utilising already cleared (and published) analyses as the basis for the synthesis. Unlike generic synthetic datasets, the datasets created are designed solely to reproduce the specific analyses. Crucially, the datasets can be generated without access to the original data. Two experiments with census data demonstrate the viability of the method, and a live use case is described. Issues arising, such as marginal disclosure risk, are then discussed.
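As a rough illustration of the idea in the abstract, the sketch below builds a small synthetic dataset whose cross-tabulation matches a published table, using only that table as input. It is not the authors' algorithm: it is a minimal (1+1) evolutionary search written under stated assumptions. The target counts, the variable names (`TARGET`, `AGES`, `TENURES`) and the functions `fitness`, `mutate` and `synthesise` are all hypothetical and invented for this example.

```python
import random
from collections import Counter

# Hypothetical "published" cross-tabulation (age band by housing tenure).
# These counts are invented for illustration; they come from no real source.
TARGET = {
    ("young", "rent"): 30, ("young", "own"): 10,
    ("old", "rent"): 15, ("old", "own"): 45,
}
AGES = ["young", "old"]
TENURES = ["rent", "own"]


def fitness(records):
    """Total absolute difference between synthetic and target cell counts."""
    counts = Counter(records)
    return sum(abs(counts[cell] - n) for cell, n in TARGET.items())


def mutate(records, rng):
    """Return a copy of the dataset with one randomly chosen record redrawn."""
    child = list(records)
    i = rng.randrange(len(child))
    child[i] = (rng.choice(AGES), rng.choice(TENURES))
    return child


def synthesise(seed=0, max_steps=20000):
    """(1+1) evolutionary search: keep a mutated child if it is no worse."""
    rng = random.Random(seed)
    n = sum(TARGET.values())  # dataset size is taken from the published table
    current = [(rng.choice(AGES), rng.choice(TENURES)) for _ in range(n)]
    score = fitness(current)
    for _ in range(max_steps):
        if score == 0:  # the synthetic table reproduces the target exactly
            break
        child = mutate(current, rng)
        child_score = fitness(child)
        if child_score <= score:
            current, score = child, child_score
    return current, score


synthetic, error = synthesise()
print(error, Counter(synthetic))
```

The search only ever sees the published table, never any record-level source data. A synthetic dataset that reproduces published counts exactly also reproduces any small cells in them, which is one way the marginal disclosure risk mentioned in the abstract can arise.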
Original language | English |
---|---|
Title of host publication | Privacy in Statistical Databases conference 2024 |
Publication status | Accepted/In press - 1 Jul 2024 |
Keywords
- Data Synthesis
- Evolutionary Algorithms
- Data Utility
- Disclosure Risk
Fingerprint
Dive into the research topics of 'The production of bespoke synthetic teaching datasets without access to the original data'. Together they form a unique fingerprint.
Projects
- 1 Active
- National Centre for Research Methods 2020-2024
Elliot, M. (PI) & Woodward, S. (CoI)
1/01/20 → 31/12/25
Project: Research