Analysing the evolution of the NCI thesaurus

Rafael S. Gonçalves, Bijan Parsia, Uli Sattler

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

    Abstract

    The National Cancer Institute (NCI) Thesaurus (NCIt) is a biomedical ontology which has been developed for over a decade. Nearly every month from 2003 through 2011, the NCI has published an updated version of the NCIt to the Web as an OWL ontology (as well as in other formats). We collected all 88 OWL versions of the NCIt available and conducted a cross-sectional study on this corpus to investigate and characterize the evolution of the NCIt. In particular, we gathered and analysed various axiom and entity statistics, and carried out a reasoner performance test over the corpus. Additionally, we extracted two complete sets of pairwise, consecutive diffs: the first set was generated by a purely syntactic difference analysis (based on OWL's notion of "structural equivalence"); for the second set, we also checked whether the additions or removals changed the set of entailments between versions. We discovered a high level of "merely syntactic" removals and additions. We develop a categorization of such changes based on a heuristic inference of the impact of the change. As a result, not only do we get a rich, purely analytic characterization of the change history of the NCIt, but also we generate a realistic test corpus for incremental classification. © 2011 IEEE.
    Original language: English
    Title of host publication: Proceedings - IEEE Symposium on Computer-Based Medical Systems (Proc. IEEE Symp. Comput.-Based Med. Syst.)
    Pages: 1-6
    Number of pages: 6
    Publication status: Published - 2011
    Event: 24th International Symposium on Computer-Based Medical Systems, CBMS 2011 - Bristol
    Duration: 1 Jul 2011 → …
