Abstract
Feature selection is central to modern data science. The 'stability' of a feature selection algorithm refers to the sensitivity of its choices to small changes in the training data; this is, in effect, the robustness of the chosen features. This paper considers the estimation of stability when we expect strong pairwise correlations between features, otherwise known as feature redundancy. We demonstrate that existing measures are inappropriate here, as they systematically underestimate the true stability, giving an overly pessimistic view of a feature set. We propose a new statistical measure which overcomes this issue, and generalises previous work.
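For context, here is a minimal sketch of the kind of variance-based stability estimator the abstract refers to as an "existing measure", computed over repeated runs of a feature selector on perturbed training sets. The specific formula and the `stability` helper are illustrative assumptions, not the measure proposed in this paper; the toy data shows how two perfectly redundant features picked interchangeably drag the estimate down even though the selected information is identical in every run.

```python
import numpy as np

def stability(Z):
    """Variance-based stability estimate for a selection matrix Z.

    Z is an (M, d) binary matrix: M repeated runs of a feature selector
    over perturbed training sets, d features; Z[i, f] = 1 if run i
    selected feature f. Returns a value at most 1, with 1 meaning the
    same subset was chosen in every run.
    """
    Z = np.asarray(Z, dtype=float)
    M, d = Z.shape
    p = Z.mean(axis=0)                  # selection frequency of each feature
    s2 = M / (M - 1) * p * (1 - p)      # unbiased per-feature selection variance
    k_bar = Z.sum(axis=1).mean()        # average subset size across runs
    return 1 - s2.mean() / (k_bar / d * (1 - k_bar / d))

# Features 0 and 1 are redundant copies of the same signal; each run picks
# one of them arbitrarily, plus feature 2. The chosen information is the
# same every time, yet the estimate falls to ~0.33.
Z = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 0]])
print(stability(Z))
```

This is exactly the pessimistic behaviour under redundancy that the abstract describes; the paper's proposed measure is designed to correct it.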
| Original language | English |
|---|---|
| Title of host publication | European Conference on Machine Learning |
| Publication status | Accepted/In press - 7 Jun 2019 |
| Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Wurzburg, Germany; Duration: 16 Sept 2019 → 20 Sept 2019 |
Conference
| Conference | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases |
|---|---|
| Abbreviated title | ECMLPKDD |
| Country/Territory | Germany |
| City | Wurzburg |
| Period | 16/09/19 → 20/09/19 |