Efficient Feature Selection Using Shrinkage Estimators

Konstantinos Sechidis, Laura Azzimonti, Adam Pocock, Giorgio Corani, James Weatherall, Gavin Brown

Research output: Contribution to journal › Article › peer-review


Information theoretic feature selection methods quantify the importance of each feature by estimating mutual information terms that capture relevancy, redundancy and complementarity. These terms are commonly estimated by maximum likelihood, while how to use shrinkage methods instead remains an under-explored area of research. Our work suggests a novel shrinkage method for data-efficient estimation of information theoretic terms. Its small-sample behaviour makes it particularly suitable for estimating discrete distributions with a large number of categories (bins). Using our novel estimators, we derive a framework for generating feature selection criteria that capture any high-order feature interaction for redundancy and complementarity. We perform a thorough empirical study across datasets from diverse sources, using various evaluation measures. Our first finding is that our shrinkage-based methods achieve better results while keeping the same computational cost as the simple maximum likelihood based methods. Furthermore, under our framework we derive efficient novel high-order criteria that outperform state-of-the-art methods in various tasks.
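
To make the contrast between maximum likelihood and shrinkage estimation concrete, the following is a minimal sketch and not the estimator proposed in the paper. It assumes a generic James-Stein-type shrinkage of the joint cell probabilities toward a uniform target, with the shrinkage intensity computed from the data, and then plugs the result into the usual mutual information formula. The function names (ml_probs, shrink_probs, mutual_information) and the toy count table are hypothetical and chosen for illustration only.

```python
import numpy as np

def ml_probs(counts):
    """Maximum likelihood estimate: relative frequencies of the cells."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

def shrink_probs(counts):
    """James-Stein-type shrinkage toward a uniform target (illustrative assumption).

    Requires at least one observation in the count table.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_ml = counts / n
    target = np.full_like(p_ml, 1.0 / p_ml.size)
    denom = (n - 1.0) * np.sum((target - p_ml) ** 2)
    if denom == 0.0:
        # Single observation, or the ML estimate already equals the target.
        lam = 1.0
    else:
        # Data-driven shrinkage intensity, clipped to the valid range [0, 1].
        lam = float(np.clip((1.0 - np.sum(p_ml ** 2)) / denom, 0.0, 1.0))
    return lam * target + (1.0 - lam) * p_ml

def mutual_information(joint):
    """Plug-in mutual information (in nats) from a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal of the row variable
    py = joint.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = joint > 0                        # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

# Toy contingency table with only five samples (made-up data).
counts = np.array([[3, 0], [0, 2]])
mi_ml = mutual_information(ml_probs(counts))
mi_shrunk = mutual_information(shrink_probs(counts))
print(mi_ml, mi_shrunk)
```

With so few samples the maximum likelihood table contains empty cells and reports a strong dependence, whereas the shrunk table assigns every cell some probability and gives a more conservative mutual information value. Both estimators need only a single pass over the count table, which is in line with the abstract's remark that shrinkage keeps the computational cost of maximum likelihood.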
Original language: English
Journal: Machine Learning
Early online date: 9 May 2019
Publication status: E-pub ahead of print - 9 May 2019


  • Feature selection
  • High order feature selection
  • Mutual information
  • Shrinkage estimators

