Quantifying Risk Factors in Medical Reports with a Context-Aware Linear Model

Piotr Przybyla, Austin Brockmeier, Sophia Ananiadou

Research output: Contribution to journal › Article › peer-review


OBJECTIVE: We seek to quantify the mortality risk associated with mentions of medical concepts in textual electronic health records (EHRs). Recognizing mentions of named entities of relevant types (eg, conditions, symptoms, laboratory tests or behaviors) in text is a well-researched task. However, determining the level of risk associated with them is partly dependent on the textual context in which they appear, which may describe severity, temporal aspects, quantity, etc.

METHODS: To take into account that a given word appearing in the context of different risk factors (medical concepts) can make different contributions toward risk level, we propose a multitask approach, called context-aware linear modeling, which can be applied using appropriately regularized linear regression. To improve the performance for risk factors unseen in training data (eg, rare diseases), we take into account their distributional similarity to other concepts.

RESULTS: The evaluation is based on a corpus of 531 reports from EHRs with 99 376 risk factors rated manually by experts. While context-aware linear modeling significantly outperforms single-task models, taking into account concept similarity further improves performance, reaching the level of human annotators' agreements.

CONCLUSION: Our results show that automatic quantification of risk factors in EHRs can achieve performance comparable to human assessment, and taking into account the multitask structure of the problem and the ability to handle rare concepts is crucial for its accuracy.
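The multitask idea in the abstract, where the same context word may contribute differently depending on the risk factor it accompanies, can be illustrated with a small sketch. This is not the authors' implementation: it assumes one common way of casting such a model as regularized linear regression, namely giving every context word both a shared weight and a concept-specific weight and shrinking all weights with an L2 penalty. The feature names, the toy ratings, the hyperparameters, and the functions `features`, `predict`, and `fit` are all hypothetical, and the distributional-similarity step for unseen concepts is not modeled here.

```python
from collections import defaultdict


def features(concept, context_words):
    """Map one risk-factor mention to sparse binary features (assumed design)."""
    feats = {("bias",): 1.0, ("concept", concept): 1.0}
    for word in context_words:
        feats[("word", word)] = 1.0           # weight shared by all concepts
        feats[("pair", concept, word)] = 1.0  # weight specific to this concept
    return feats


def predict(weights, concept, context_words):
    """Linear prediction; features never seen in training contribute zero."""
    feats = features(concept, context_words)
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def fit(examples, l2=0.01, learning_rate=0.1, epochs=500):
    """Ridge-style regression fitted by batch gradient descent.

    Minimizes mean squared error plus l2 * (sum of squared weights).
    """
    weights = defaultdict(float)
    n = len(examples)
    for _ in range(epochs):
        grad = defaultdict(float)
        for concept, words, rating in examples:
            error = predict(weights, concept, words) - rating
            for name, value in features(concept, words).items():
                grad[name] += 2.0 * error * value / n
        for name in set(grad) | set(weights):
            step = grad[name] + 2.0 * l2 * weights[name]
            weights[name] -= learning_rate * step
    return dict(weights)


# Invented toy data: (risk factor, context words, risk rating).
# The rating scale is made up for illustration only.
examples = [
    ("pain", ["severe"], 3.0),
    ("pain", ["mild"], 1.0),
    ("cough", ["severe"], 2.0),
    ("cough", ["mild"], 1.0),
    ("smoking", ["heavy"], 3.0),
    ("smoking", ["former"], 1.0),
]

model = fit(examples)
print(predict(model, "pain", ["severe"]))   # concept-specific and shared weights
print(predict(model, "fever", ["severe"]))  # unseen concept: shared weights only
```

In this sketch a concept absent from the training data, such as the hypothetical "fever", is scored using only the bias and the shared word weights. The abstract describes a different remedy for unseen risk factors, based on their distributional similarity to known concepts.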

Original language: English
Pages (from-to): 537-546
Number of pages: 10
Journal: Journal of the American Medical Informatics Association
Issue number: 6
Early online date: 6 Mar 2019
Publication status: Published - 1 Jun 2019


Keywords

  • electronic health records
  • machine learning
  • multitask learning
  • natural language processing
  • risk assessment

Research Beacons, Institutes and Platforms

  • Manchester Institute of Biotechnology


