Information Extraction from Pharmaceutical Literature

Student thesis: PhD


With the constantly growing amount of biomedical literature, methods for automatically distilling information from unstructured data, collectively known as information extraction, have become indispensable. Whilst most biomedical information extraction efforts in the last decade have focussed on the identification of gene products and interactions between them, the biomedical text mining community has recently extended their scope to capture associations between biomedical and chemical entities with the aim of supporting applications in drug discovery. This thesis is the first comprehensive study focussing on information extraction from pharmaceutical chemistry literature. In this research, we describe our work on (1) recognising names of chemical compounds and drugs, facilitated by the incorporation of domain knowledge; (2) exploring different coreference resolution paradigms in order to recognise co-referring expressions given a full-text article; and (3) defining drug-target interactions as events and distilling them from pharmaceutical chemistry literature using event extraction methods.
Date of Award: 1 Aug 2014
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Sophia Ananiadou


Keywords
  • Event extraction
  • Chemical coreference resolution
  • Biomedical text mining
  • Information extraction
  • Chemical named entity recognition
