Simulation and the Reality Gap: Moments in a Prehistory of Synthetic Data

James Steinhoff*, Sam Hind

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper sketches a prehistory of synthetic data in the development of simulation technologies. Synthetic data is connected to simulation by the technical problem of the reality gap: the gap between the synthetic data a model is trained on and the real-world data it is deployed on. The reality gap is presented as a novelty both generated and solved by synthetic data. We demonstrate that the reality gap has plagued simulation technologies since their inception in the mid-20th century. We contend that the reality gap is not something synthetic data can solve. To illustrate this, we examine three episodes in the prehistory of synthetic data. These episodes are representative of three distinct regimes of simulation: (a) the statistical regime, (b) the discrete-event regime, and (c) the visual-interactive regime. Each regime reveals a reality gap, from before the advent of digital computers to the present. Synthetic data, like simulation, requires data about a given domain in order to model it. It requires the real-world data which it purports to dispense with. The reality gap is thus an epistemological issue as well as a technical one. We argue that it is also a political economic issue: it complicates existing means of producing data, adding new layers of mediation and labor. Synthetic data thus indicates the emergence of an alternative stack for the production of AI systems. This suggests that the political economy of AI must take account of the proliferation of new technical means for creating data.
Original language: English
Pages (from-to): 1-14
Journal: Big Data & Society
Volume: 12
Issue number: 1
Early online date: 18 Mar 2025
Publication status: Published - 18 Apr 2025