Taming EHR data: Using semantic similarity to reduce dimensionality

Leila Kalankesh, James Weatherall, Thamer Ba-Dhfari, Iain Buchan, Andy Brass

    Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

    Abstract

    Medical care data is a valuable resource that can be used for many purposes, including managing and planning for future health needs as well as clinical research. However, the heterogeneity and complexity of medical data can be an obstacle to applying data mining techniques, so much of the potential value of this data goes untapped. In this paper we develop a methodology that reduces the dimensionality of primary care data in order to make it more amenable to visualisation, mining and clustering. The methodology combines ontology-based semantic similarity with principal component analysis (PCA) to map the data into an appropriate and informative low-dimensional space. Throughout the study, we had access to anonymised patient data from primary care in Salford, UK. The results show that diagnosis codes in primary care data can be used to map patients into an informative low-dimensional space, which in turn supports further data exploration and medical hypothesis formulation. © 2013 IMIA and IOS Press.
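
The pipeline described in the abstract (ontology-based similarity between diagnosis codes, followed by PCA) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the toy is-a hierarchy and its code names are hypothetical, the Wu-Palmer-style code similarity and the best-match average between patients are stand-ins for whatever measures the authors used, and PCA is done via NumPy's SVD on the patient-by-patient similarity matrix.

```python
import numpy as np

# Hypothetical is-a hierarchy (child -> parent); these are not real clinical codes.
PARENT = {
    "disease": None,
    "cardio": "disease",
    "respiratory": "disease",
    "hypertension": "cardio",
    "angina": "cardio",
    "asthma": "respiratory",
    "copd": "respiratory",
}

def ancestors(code):
    """Return the path from `code` up to the root, starting with `code` itself."""
    path = []
    while code is not None:
        path.append(code)
        code = PARENT[code]
    return path

def code_similarity(a, b):
    """Wu-Palmer-style similarity: 2 * depth(lca) / (depth(a) + depth(b)), root depth = 1."""
    path_a, path_b = ancestors(a), ancestors(b)
    shared = set(path_b)
    lca = next(c for c in path_a if c in shared)  # deepest common ancestor
    return 2 * len(ancestors(lca)) / (len(path_a) + len(path_b))

def patient_similarity(codes_p, codes_q):
    """Symmetric best-match average between two sets of diagnosis codes (assumed measure)."""
    best_p = [max(code_similarity(a, b) for b in codes_q) for a in codes_p]
    best_q = [max(code_similarity(a, b) for a in codes_p) for b in codes_q]
    return (sum(best_p) + sum(best_q)) / (len(best_p) + len(best_q))

def pca_project(features, n_components=2):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical patients, each represented by a set of diagnosis codes.
patients = {
    "p1": {"hypertension", "angina"},
    "p2": {"hypertension"},
    "p3": {"asthma"},
    "p4": {"asthma", "copd"},
}
names = sorted(patients)

# Each patient is described by its similarity to every other patient.
similarity = np.array(
    [[patient_similarity(patients[p], patients[q]) for q in names] for p in names]
)
coords = pca_project(similarity)

for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:+.3f}, {y:+.3f})")
```

In this toy data the first principal component separates the two cardiovascular patients from the two respiratory ones; the sign of each component is arbitrary, so only the grouping is meaningful.
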
    Original language: English
    Title of host publication: Studies in Health Technology and Informatics (Stud. Health Technol. Informatics)
    Pages: 52-56
    Number of pages: 4
    Volume: 192
    Publication status: Published - 2013
    Event: 14th World Congress on Medical and Health Informatics, MEDINFO 2013 - Copenhagen
    Duration: 1 Jul 2013 → …

    Conference

    Conference: 14th World Congress on Medical and Health Informatics, MEDINFO 2013
    City: Copenhagen
    Period: 1/07/13 → …

    Keywords

    • Data Mining
    • Electronic Health Records
    • Primary Health Care
    • Principal Component Analysis
    • Semantics
