Saving Time and Costs for Evidence-based Public Health Interventions: Text Mining Tool RobotAnalyst

Impact: Health and wellbeing, Economic, Technological


Evidence-based public health (EBPH) reviews are central to public health policy, practice, and guidance. EBPH reviews require dynamic and multidimensional views of relevant information from the literature, without relying on a priori research questions. The large and growing number of published studies therefore makes identifying relevant studies without bias both complex and time-consuming. Because crucial information can be difficult to locate and understand given the complex nature of EBPH problems, the multiple causes and interrelations between interventions, diseases, populations, and outcomes can remain hidden. The global economic impact of preventable ill health (WHO) will continue to increase at an alarming rate, and improved awareness of diseases at different levels (societal, financial, clinical, psychological, etc.) is much needed. Thus, methods that provide cost-effective approaches to understanding interconnections between topics and better coverage of EBPH contribute towards mitigating the cost of public health.

To address these limitations, the Supporting Evidence-based Public Health Interventions using Text Mining project, led by the National Centre for Text Mining (NaCTeM) at the University of Manchester, combined text mining and machine learning to produce novel search methods and screening tools for public health reviews. Text mining methods can automatically discover knowledge from unstructured data, and machine learning can support the prioritisation and ranking of the extracted information into meaningful topics. The combination of the two can minimise the impact of publication bias in reviews and extract more accurate and pertinent information from the literature, thus meeting policy and practice timescales. The increased cost efficiency contributes towards transforming EBPH and influencing the development of guidelines at a national and international level via NICE.

The project collaborated with Machine Learning and Data Analytics (MaLDA) at the University of Liverpool and the National Institute for Health and Care Excellence (NICE), and was funded by the Medical Research Council (MRC) and the Biotechnology and Biological Sciences Research Council (BBSRC). Follow-up funding has been awarded by the Alan Turing Institute (the UK’s national institute for data science and artificial intelligence), the National Institute for Health Research (NIHR), the Engineering and Physical Sciences Research Council (EPSRC), and JISC.

One specific body of the work has been fundamental research into, and subsequent development of, a tool that can minimise the human workload involved in the study identification phase of systematic reviews. RobotAnalyst, the culmination of this work, is a web-based software tool that contains several research innovations, including document prioritisation, topic detection, and descriptive document clustering, to improve the prioritisation accuracy of the screening process.

Combining text mining and machine learning algorithms for organising articles by their content, RobotAnalyst is equipped with active learning prioritisation, which other text-mining tools for systematic reviews do not offer. It substantially decreases human workload, by between 40% and 85%, and reduces the risk of bias, whilst increasing the consistency of findings. As of July 2020, RobotAnalyst is used by over 200 teams, of which at least 80 are non-academic, across 25 countries, mainly within sectors working in evidence-based medicine. It has to date helped NICE, the Observatory Evidence Service (OES) at Public Health Wales, and many other hospitals, national public health organisations, and policymakers to undertake systematic reviews, make better evidence-based decisions, cut costs, and improve the efficiency and robustness of key policy decisions. Using the GBP13,000 per review metric across the 80 non-academic teams, this is currently benefitting clinical (non-academic) guideline activity valued in the region of GBP1,040,000. Moreover, given the national and international importance of EBPH reviewing, the project has developed a 'multistrand pathways to impact' document to engage with a variety of key EBPH stakeholders both in the UK and internationally.

This research also contributes to improving the UK’s competitive position in a digital market through better language technology products and services. The project brings together a mixture of unsupervised techniques that are reusable and retargetable in supporting and enabling language technology-based access (via semantic search). These advanced search and screening techniques will therefore be applicable in almost any other domain, such as energy, security, national libraries, and institutional repositories.

Impact date: 2016 to 2020
Category of impact: Health and wellbeing, Economic, Technological
Impact level: Adoption

Research Beacons, Institutes and Platforms

  • Biotechnology
  • Digital Futures
  • Institute for Data Science and AI
  • Manchester Institute of Biotechnology