Human factors in research software engineering: an exploration of individuals, communities, and systemic challenges

  • Yochannah Yehudi

Student thesis: Doctor of Engineering

Abstract

Computers can facilitate research and smooth the handling of large datasets, to the degree that seventy percent of UK researchers believe it would be impractical to perform their work without them (Hettrick, 2018). Creating computer code is a socio-technical exercise, and “technical” code skills alone do not make a useful, reliable, and well-maintained computational research tool (Gunawardena et al., 2022). We investigate not the hardware or computer code used for research, but the people who create and use this code, from the angle of a computational scientist working in the areas of biological open source software engineering and open source community building. We conduct three qualitative and mixed-method studies, ranging in scale from individual data users up to communities with tens of thousands of users or more. Study 1 investigates “subjective data models”: individual perceptions of biological data models. It concludes that computational biologists and non-computational wet-lab researchers do not hold different subjective data models. Study 2 looks at the challenges an individual researcher encounters around data access, sharing, and redistribution for humanitarian research purposes in a global health crisis. Its results highlight the need for systemic reform in data sharing behaviours, staffing, and technical infrastructure. This reform is needed at all scales, whether looking at individual research institutes, funding bodies, national healthcare systems, or international governmental policy. Study 3 examines computer code creation at a community level, and the sustainability and health of open source communities. We discover that indicator systems can reveal events within an open source community, but the meaning of these events often needs additional context for correct interpretation. Throughout all studies, we investigate the human factors and outliers that may not typically be studied in computer science.
A repeating theme in all three studies is that context is essential for correct interpretation of data. An individual might change their subjective data model based on the context their data are needed for (Study 1); data are commonly shared without metadata, but are challenging to interpret when column meanings, calculation methods, and geopolitical context are missing (Study 2); and open source indicators of community health events can be interpreted in multiple ways unless additional context is available (Study 3). This thesis uses a journal format to present all three studies.
Date of Award: 1 Aug 2024
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Carole Goble (Supervisor) & Caroline Jay (Supervisor)

Keywords

  • covid-19
  • open community
  • open data
  • open source
  • online communities
  • open science
