Investigating semantic similarity measures across the gene ontology: The relationship between sequence and annotation

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Motivation: Many bioinformatics data resources hold data not only as sequences but also as annotation. In most cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper, we investigate the use of ontological annotation to measure the similarity in knowledge content, or 'semantic similarity', between entries in a data resource. Such measures allow a bioinformatician to compare annotation in a manner analogous to the comparisons performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford biologists a new tool in their repertoire of analyses.

    Results: We present the results of experiments that investigate the validity of semantic similarity by comparing it with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases.
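To make the idea of comparing annotation concrete, here is a minimal sketch of one way to score two ontology terms. It assumes an information-content measure in the style of Resnik: two terms are as similar as their most informative shared ancestor. The abstract does not say which measure the paper uses, so that choice is an assumption. The toy ontology, the term names, and the annotation counts are all hypothetical and are not taken from the Gene Ontology or from the paper.

```python
import math

# Hypothetical toy ontology: each term maps to its parent terms (is-a links).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical number of database entries annotated directly with each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 4,
    "dna_binding": 3,
    "rna_binding": 1,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_probability(term):
    """Fraction of all annotations that use the term or one of its descendants."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return covered / total


def information_content(term):
    """Rarely used terms carry more information than frequently used ones."""
    return -math.log(term_probability(term))


def semantic_similarity(term_a, term_b):
    """Information content of the most informative ancestor shared by both terms."""
    shared = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in shared)
```

In this toy setup, `semantic_similarity("dna_binding", "rna_binding")` is driven by the shared parent `binding`, whereas `semantic_similarity("dna_binding", "catalysis")` falls back to `root`, which carries no information. Scoring whole database entries would additionally require combining the scores of their annotated terms, for example by averaging, which this sketch leaves out.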
    Original language: English
    Pages (from-to): 1275-1283
    Number of pages: 8
    Journal: Bioinformatics
    Volume: 19
    Issue number: 10
    Publication status: Published - 1 Jul 2003

    Keywords

    • Gene ontology
