Conditional likelihood maximisation: A unifying framework for information theoretic feature selection

Gavin Brown, Adam Pocock, Ming-jie Zhao, Mikel Luján

    Research output: Contribution to journal › Article › peer-review

    Abstract

    We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy from the one usual in the feature selection literature: instead of trying to define a criterion, we derive one, directly from a clearly specified objective function, the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information-based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples. © 2012 Gavin Brown, Adam Pocock, Ming-jie Zhao and Mikel Luján.
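
    As an illustration of the kind of iterative heuristic the abstract refers to, the following is a minimal sketch of greedy forward selection with a JMI-style score, in which a candidate feature X_k is scored by the sum of I(X_k, X_j; Y) over the already selected features X_j. It is not the authors' implementation: the function names (`mutual_information`, `joint_variable`, `jmi_select`), the plug-in (empirical frequency) estimator, the restriction to discrete features, and the choice of the first feature by plain relevance I(X_k; Y) are all assumptions made for this example.

```python
import numpy as np


def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in nats for two discrete 1-D arrays."""
    a_idx = np.unique(a, return_inverse=True)[1].ravel()
    b_idx = np.unique(b, return_inverse=True)[1].ravel()
    # Empirical joint distribution from co-occurrence counts.
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1.0)
    joint /= joint.sum()
    # Product of the marginals, same shape as the joint.
    indep = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells, which contribute zero
    return float(np.sum(joint[nz] * np.log(joint[nz] / indep[nz])))


def joint_variable(a, b):
    """Encode the pair (a, b) as one discrete variable (hypothetical helper)."""
    a_idx = np.unique(a, return_inverse=True)[1].ravel()
    b_idx = np.unique(b, return_inverse=True)[1].ravel()
    return a_idx * (b_idx.max() + 1) + b_idx


def jmi_select(X, y, k):
    """Greedily pick k column indices of X using a JMI-style score.

    Assumption: X holds discrete features, one per column, and the first
    feature is chosen by relevance I(X_k; Y) alone.
    """
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        best, best_score = None, -np.inf
        for c in range(n_features):
            if c in selected:
                continue
            # Sum of I(X_c, X_s; Y) over the features selected so far.
            score = sum(
                mutual_information(joint_variable(X[:, c], X[:, s]), y)
                for s in selected
            )
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
    return selected
```

    Calling `jmi_select(X, y, k)` on an integer-coded array `X` of shape (samples, features) returns the chosen column indices in selection order. A feature that merely duplicates an already selected one adds nothing to the joint term, so complementary features tend to be preferred over redundant ones, which is the relevancy/redundancy balance the abstract describes.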
    Original language: English
    Pages (from-to): 27-66
    Number of pages: 40
    Journal: Journal of Machine Learning Research
    Volume: 13
    Publication status: Published - 8 Jan 2012

    Keywords

    • Conditional likelihood
    • Feature selection
    • Mutual information
