HYPHEN: A flexible, hybrid method to map phenotype concept mentions to terminological resources

Paul Thompson, Sophia Ananiadou

Research output: Contribution to journal › Article › peer-review


Narrative clinical records and biomedical articles are rich sources of information about phenotypes, i.e., markers that distinguish individuals with specific medical conditions from the general population. Phenotypes help clinicians to provide personalised treatments. However, locating information about them within huge document repositories is difficult, since each phenotypic concept can be mentioned in many ways. Normalisation methods automatically map divergent phrases to unique concepts in domain-specific terminologies, so that all mentions of a concept of interest can be located and linked. We have developed a hybrid normalisation method (HYPHEN) to handle concept mentions with wide-ranging characteristics across different text types. HYPHEN integrates various normalisation techniques that handle surface-level variations (e.g., differences in word order, word forms or acronyms/abbreviations) and lexical-level variations (where terms have similar meanings, but potentially unrelated forms). HYPHEN achieves robust performance on both biomedical academic text and narrative clinical records, and can significantly outperform related methods.
Original language: English
Pages (from-to): 91-121
Issue number: 1
Early online date: 31 May 2018
Publication status: Published - 2018

