Narrative
Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible thanks to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data efficiently without being overwhelmed. Standard keyword-based search systems treat documents as collections of unrelated words, ignore their structure and meaning, and often return many irrelevant documents. Text mining analysis allows search systems to incorporate functionality such as automatic suggestion of synonyms for user-entered query terms, exploration of the different concepts mentioned within search results, and isolation of documents in which concepts are related in specific ways. However, applying text mining methods to historical text can be challenging, owing to differences and changes in vocabulary, terminology, language structure and style compared with more modern text.
Mining the History of Medicine, a project led by the Manchester-based National Centre for Text Mining (NaCTeM) and the Centre for the History of Science, Technology and Medicine (CHSTM), demonstrated the potential of text mining methods to help researchers from multiple disciplines discover and extract information automatically from medical historical archives.
The project has developed resources and tools to support sophisticated text mining applications: a unique temporal resource of medical terminology that records variation and semantic shift in medical concepts over the course of the 19th and 20th centuries, and customised tools that can extract terms, named entities, relations and events from medical historical archives. These tools are now available as web services and as components of NaCTeM’s interoperable text mining environments (U-Compare and Argo). This ensures that they can be reused and flexibly integrated with other tools to create various types of applications suited to the needs of different researchers.
As a concrete application, the resources and tools have been used to create the History of Medicine (HOM) semantic search system. By integrating the temporal terminological resource, the system enables researchers to broaden and deepen their work and to ask ‘big’ questions that span long periods, without losing sensitivity to changes in terminology and meaning. Historians of medicine have applied the HOM search system to two large-scale medical resources, the British Medical Journal (BMJ) and the London-area Medical Officer of Health (MOH) reports, and have demonstrated that it makes exploring and searching the contents of both archives much easier and more efficient. The semantic search system allows researchers to retrieve relevant literature from different periods directly and to locate answers to their research questions quickly.
The development of such tools opens up increasingly efficient and manageable ways to reveal, explore and discuss long-term, large-scale historical transformations related to medicine, health and British society. This has led to major initiatives such as Europe PubMed Central being able to text-mine full papers lawfully, and to increased levels of text mining within bodies such as the British Library and institutional repositories.
Category of impact | Society and culture, Health and wellbeing, Awareness and understanding |
---|---|
Impact level | Benefit |
Research Beacons, Institutes and Platforms
- Digital Futures
- Institute for Data Science and AI
- Biotechnology
Related content
Research output
- Text mining the history of medicine
  Research output: Contribution to journal › Article › peer-review
- Customised OCR correction for historical medical text
  Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
- Using text mining techniques to extract phenotypic information from the PhenoCHF corpus
  Research output: Contribution to journal › Article › peer-review
- Semantically enhanced search system for historical medical archives
  Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
- A Cross-lingual Similarity Measure for Detecting Biomedical Term Translations
  Research output: Contribution to journal › Article › peer-review
- Comparable Study of Event Extraction in Newswire and Biomedical Domains
  Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
- Annotating and Detecting Phenotypic Information for Chronic Obstructive Pulmonary Disease
  Research output: Contribution to journal › Article › peer-review
- Adaptable, High Recall, Event Extraction System with Minimal Configuration
  Research output: Contribution to journal › Article › peer-review
Projects
- Mining the History of Medicine
  Project: Research