Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search

Caroline Jay, Simon Harper, Ian Dunlop, Sam Smith, Shoaib Ahmed Sufi, Carole Goble, Iain Buchan

    Research output: Contribution to journal › Article › peer-review



    Background: Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis and, in turn, to the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these “experts.” Such interfaces hark back to a time when searches needed to be accurate the first time because there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research.

    Objective: The cross-disciplinary nature of data science means that no assumptions can be made regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with the search needs of the “Google generation” than with those of their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive.

    Methods: Two user interfaces are evaluated for the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is “Google-like,” enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multioption user interface.

    Results: Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect of interface, indicating that answers found through the Web search interface were more likely to be correct (F(1,19)=37.3, P<.001), along with a main effect of task (F(3,57)=6.3, P<.001). Further, participants completed the task significantly faster using the Web search interface (F(1,19)=18.0, P<.001). There was also a main effect of task (F(2,38)=4.1, P=.025, Greenhouse-Geisser correction applied). Overall, participants were asked to rate learnability, ease of use, and satisfaction. Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI [0.6-2.4]), ease of use
    Original language: English
    Issue number: 1
    Publication status: Published - 14 Jan 2016

    Research Beacons, Institutes and Platforms

    • Dementia@Manchester

