Simulation and the Reality Gap: Moments in a Prehistory of Synthetic Data

James Steinhoff, Sam Hind

Research output: Working paper / Preprint

Abstract

Synthetic data is cast by its proponents as the cure for nearly all problems associated with machine learning, from labour costs to privacy and bias. However, generating synthetic data runs into a fundamental technical issue: the “reality gap”, in which machine learning models trained on synthetic data fail when deployed on conventional data. In the context of contemporary machine learning, the reality gap is often described as radically novel, a phenomenon without historical comparison. This paper challenges the supposition that the reality gap is novel by articulating a “prehistory” of synthetic data in the development of simulation technologies. Simulation is one of the primary methods for synthesizing data. This prehistory illustrates how the reality gap has plagued physicists, computer scientists, and engineers since before the birth of modern computing. We examine three episodes, representative of three regimes of simulation: a) a statistical regime, exemplified by the Monte Carlo method; b) a discrete-event regime, exemplified by United Steel’s General Simulation Program; and c) a visual-interactive regime, exemplified by Fortran-based graphical simulations. In each of these episodes, setting up a simulation that is meant to avoid the need for data collection itself presupposes data. We suggest that as the ambitions of contemporary synthetic data producers increase, the data required to get simulations going, or to close the reality gap, are likely to increase in quantity, detail, and granularity.
Original language: English
Publisher: MediArXiv
Pages: 1-23
Number of pages: 24
Publication status: Submitted, 21 Mar 2024

Keywords

  • synthetic data
  • simulation
  • machine learning
  • reality gap
  • generalizability
  • model
