To Link or Synthesize? An Approach to Data Quality Comparison

Duncan Smith, Mark Elliot, Joseph W Sakshaug

Research output: Contribution to journal › Article › peer-review


Linking administrative data to produce more informative data for subsequent analysis has become an increasingly common practice.
However, there might be concomitant risks of disclosing sensitive information about individuals. One practice that reduces these risks
is data synthesis. In data synthesis, the data are used to fit a model from which synthetic data are then generated. The synthetic data
are then released to end users. In some scenarios, an end user might have the option of using linked data or accepting
synthesized data. However, both linkage and synthesis are susceptible to errors that could limit their usefulness. Here, we investigate the
problem of comparing the quality of linked data to synthesized data and demonstrate through simulations how the problem might be
approached. These comparisons are important when considering how an end user can be supplied with the highest quality data, and
in situations where one must consider risk / utility trade-offs.
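
As a rough illustration of the kind of comparison described above, the following Python sketch simulates both routes on toy data and measures how far each one moves a simple regression slope from its value on the error-free data. It is not the authors' simulation design. The data-generating model, the 20% false-match rate, the normal-linear synthesizer, the choice of slope error as the quality measure, and all function names (ols_slope, link_with_errors, synthesize) are assumptions made purely for illustration.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def link_with_errors(xs, ys, error_rate, rng):
    """Mimic imperfect linkage: permute y among a random subset of records."""
    n = len(xs)
    wrong = rng.sample(range(n), int(error_rate * n))
    shuffled = wrong[:]
    rng.shuffle(shuffled)
    linked_ys = list(ys)
    for src, dst in zip(wrong, shuffled):
        linked_ys[dst] = ys[src]  # record dst receives another record's y
    return list(xs), linked_ys


def synthesize(xs, ys, n, rng):
    """Fit x ~ Normal and y | x ~ linear + Normal noise, then sample n records."""
    mx, sx = statistics.fmean(xs), statistics.stdev(xs)
    slope = ols_slope(xs, ys)
    intercept = statistics.fmean(ys) - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s_res = statistics.stdev(residuals)
    syn_x = [rng.gauss(mx, sx) for _ in range(n)]
    syn_y = [intercept + slope * x + rng.gauss(0, s_res) for x in syn_x]
    return syn_x, syn_y


rng = random.Random(0)
n = 2000

# Hypothetical error-free data: y depends linearly on x (assumed model).
true_x = [rng.gauss(0, 1) for _ in range(n)]
true_y = [2.0 * x + rng.gauss(0, 1) for x in true_x]

# Route 1: linked data with an assumed 20% false-match rate.
linked_x, linked_y = link_with_errors(true_x, true_y, error_rate=0.2, rng=rng)

# Route 2: synthetic data drawn from a model fitted to the error-free data.
syn_x, syn_y = synthesize(true_x, true_y, n, rng)

# Quality measure (assumed): absolute error in the estimated slope.
target = ols_slope(true_x, true_y)
linked_error = abs(ols_slope(linked_x, linked_y) - target)
synthetic_error = abs(ols_slope(syn_x, syn_y) - target)
print(f"linked: {linked_error:.3f}  synthetic: {synthetic_error:.3f}")
```

In this toy setting the synthesizer's model matches the data-generating process, so synthesis is favored by construction. A misspecified synthesis model or a lower false-match rate could reverse the ranking, which is why a comparison of this kind has to be made case by case.
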
Original language: English
Journal: Journal of Data and Information Quality
Publication status: Published - 2023

