Genome scale prediction of protein functional class from sequence using data mining

Ross D. King, Andreas Karwath, Amanda Clare, Luc Dehaspe

    Research output: Chapter in Book/Conference proceeding > Conference contribution > peer-review

    Abstract

    The ability to predict protein function from amino acid sequence is a central research goal of molecular biology. Such a capability would greatly aid the biological interpretation of genomic data and accelerate its medical exploitation. For existing sequenced genomes, function can typically be assigned to only 40-60% of the genes [4,8,12,7]. The new science of functional genomics is dedicated to discovering the function of the remaining genes and to detailing gene function further [10,27,17,6]. Here we present a novel data-mining [24,18] approach to predicting protein functional class from sequence. We demonstrate the effectiveness of this approach on the Mycobacterium tuberculosis [8] genome. Biologically interpretable rules are identified that can predict protein function even in the absence of identifiable sequence homology. These rules predict 65% of the genes with no previously assigned function in Mycobacterium tuberculosis (the bacterium that causes tuberculosis) with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules also give insight into the evolutionary history of the organism.
    Original language: English
    Title of host publication: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Proc. 6th ACM SIGKDD Intern. Conf. Knowl. Disco. Data Mining)
    Editors: R. Ramakrishnan, S. Stolfo, R. Bayardo, I. Parsa
    Pages: 384-389
    Number of pages: 5
    Publication status: Published - 2000
    Event: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000) - Boston, MA
    Duration: 1 Jul 2000 → …

    Conference

    Conference: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000)
    City: Boston, MA
    Period: 1/07/00 → …

    Keywords

    • Biology and genetics
    • Concept learning
    • Data mining
