Mining the History of Medicine: Semantically Enhanced Search System for Historical Medical Archives

Impact: Society and culture, Health and wellbeing, Awareness and understanding


Historical text archives are a rich and diverse source of information, and large-scale digitisation efforts are making them increasingly accessible. However, researchers can find it difficult to explore and search such large volumes of data efficiently without being overwhelmed. Standard keyword-based search systems treat documents as collections of unrelated words, ignore their structure and meaning, and often return many irrelevant documents. Text mining allows search systems to offer functionality such as automatic suggestion of synonyms for user-entered query terms, exploration of the different concepts mentioned within search results, and isolation of documents in which concepts are related in specific ways. However, applying text mining methods to historical text can be challenging, because its vocabulary, terminology, language structure and style differ from, and have evolved relative to, those of more modern text.

Mining the History of Medicine, a project led by the Manchester-based National Centre for Text Mining (NaCTeM) and the Centre for the History of Science, Technology and Medicine (CHSTM), demonstrated the potential of text mining methods to help researchers from multiple disciplines discover and extract information automatically from historical medical archives.

The project has developed resources and tools to support sophisticated text mining applications: a unique temporal resource of medical terminology that records the variation and semantic shift of medical concepts over the course of the 19th and 20th centuries, and customised tools that can extract terms, named entities, relations and events from historical medical archives. These tools are now available as web services and as components of NaCTeM’s interoperable text mining environments (U-Compare and Argo). This ensures that they can be reused and flexibly integrated with other tools to create various types of applications suited to the needs of different researchers.

As a concrete application, the resources and tools have been used to create the History of Medicine (HOM) semantic search system. By integrating the temporal terminological resource, the system helps researchers broaden and deepen their work to ask ‘big’ questions that span long periods, without losing sensitivity to changes in terminology and meaning. Historians of medicine have used the HOM search system over two large-scale medical resources, the British Medical Journal (BMJ) and the London-area Medical Officer of Health (MOH) reports, and have shown that it makes the contents of both archives much easier to explore and search efficiently. The semantic search system retrieves relevant literature from different periods directly, so that answers to research questions can be located quickly.
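The idea of a search that stays sensitive to changing terminology can be illustrated with a small sketch of time-aware query expansion. It assumes a lexicon in which each surface form of a concept carries the span of years in which it is attested; the `TermVariant` structure, the `expand_query` and `matches` functions and the year ranges are all hypothetical toy values invented for illustration, not the HOM terminological resource or its interface.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TermVariant:
    concept: str  # normalised concept label (hypothetical)
    form: str     # surface form as it appears in the text
    start: int    # first year the form is attested (toy value)
    end: int      # last year the form is attested (toy value)


# Hand-made toy entries for illustration only; NOT taken from the HOM resource.
LEXICON = [
    TermVariant("tuberculosis", "consumption", 1800, 1920),
    TermVariant("tuberculosis", "phthisis", 1800, 1930),
    TermVariant("tuberculosis", "tuberculosis", 1880, 2000),
]


def expand_query(term: str, year_from: int, year_to: int) -> set[str]:
    """Return every surface form of the concept(s) matching `term` whose
    attested span overlaps the period [year_from, year_to]."""
    term = term.lower()
    # Map the user's term to concepts, whether it is a variant or the label.
    concepts = {v.concept for v in LEXICON if term in (v.form, v.concept)}
    return {
        v.form
        for v in LEXICON
        if v.concept in concepts and v.start <= year_to and v.end >= year_from
    }


def matches(document: str, forms: set[str]) -> bool:
    """True if the document contains any of the given word forms."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return bool(words & forms)
```

For example, `expand_query("phthisis", 1850, 1870)` yields `{"consumption", "phthisis"}`, whereas widening the period to 1850 to 1950 also brings in `"tuberculosis"`. A real system would draw these variants and periods from a curated resource rather than a hard-coded list.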

The development of such tools opens up increasingly efficient and manageable ways to reveal, explore and discuss long-term, large-scale historical transformations related to medicine, health and British society. This has led to major initiatives such as Europe PubMed Central being able to text-mine full papers lawfully, and to increased levels of text mining within bodies such as the British Library and institutional repositories.
Impact level: Benefit

Research Beacons, Institutes and Platforms

  • Digital Futures
  • Institute for Data Science and AI
  • Biotechnology