Feature Importance Ranking for Deep Learning

Maksymilian Wojtas, Ke Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Feature importance ranking has become a powerful tool for explainable AI. However, its combinatorial optimization nature poses a great challenge for deep learning. In this paper, we propose a novel dual-net architecture, consisting of an operator and a selector, that simultaneously discovers an optimal feature subset of a fixed size and ranks the importance of the features in that subset. During learning, the operator is trained for a supervised learning task on optimal feature subset candidates generated by the selector, which in turn learns to predict the learning performance of the operator on different optimal subset candidates. We develop an alternating learning algorithm that trains the two nets jointly and incorporates a stochastic local search procedure into learning to address the combinatorial optimization challenge. In deployment, the selector generates an optimal feature subset and ranks feature importance, while the operator makes predictions for test data based on the optimal subset. A thorough evaluation on synthetic, benchmark and real data sets suggests that our approach outperforms several state-of-the-art feature importance ranking and supervised feature selection methods.
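
The alternating scheme described in the abstract can be illustrated with a toy sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: both "nets" are replaced by linear models, the data are synthetic, the stochastic local search is reduced to a single random swap plus one random subset per step, and all names (`make_data`, `random_mask`, `swap_one`, `train`) are hypothetical. For simplicity it also ranks all features by selector weight rather than only those in the chosen subset. It is not the authors' algorithm.

```python
import random


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def make_data(n, d, relevant, rng):
    """Toy regression data: the target depends only on the `relevant` columns."""
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [sum(x[j] for j in relevant) for x in xs]
    return xs, ys


def random_mask(d, s, rng):
    """Binary mask of length d with exactly s features switched on."""
    on = set(rng.sample(range(d), s))
    return [1.0 if j in on else 0.0 for j in range(d)]


def swap_one(mask, rng):
    """Local-search move: replace one selected feature with an unselected one."""
    on = [j for j, m in enumerate(mask) if m]
    off = [j for j, m in enumerate(mask) if not m]
    new = list(mask)
    new[rng.choice(on)] = 0.0
    new[rng.choice(off)] = 1.0
    return new


def train(xs, ys, s, steps=2000, lr=0.02, batch_size=16, seed=0):
    rng = random.Random(seed)
    d = len(xs[0])
    w_op = [0.0] * d               # operator: linear model on masked inputs (assumption)
    w_sel, b_sel = [0.0] * d, 0.0  # selector: linear map from mask to predicted loss (assumption)
    best = random_mask(d, s, rng)
    for _ in range(steps):
        # Candidates: current best, a local perturbation of it, and a random subset.
        candidates = [best, swap_one(best, rng), random_mask(d, s, rng)]
        batch = rng.sample(range(len(xs)), batch_size)
        for mask in candidates:
            # Operator step: squared-error gradient on the masked mini-batch.
            loss, grad = 0.0, [0.0] * d
            for i in batch:
                xm = [x * m for x, m in zip(xs[i], mask)]
                err = dot(w_op, xm) - ys[i]
                loss += err * err / batch_size
                for j in range(d):
                    grad[j] += 2.0 * err * xm[j] / batch_size
            w_op = [w - lr * g for w, g in zip(w_op, grad)]
            # Selector step: regress its prediction toward the observed operator loss.
            err_sel = dot(w_sel, mask) + b_sel - loss
            w_sel = [w - lr * err_sel * m for w, m in zip(w_sel, mask)]
            b_sel -= lr * err_sel
        # Keep the candidate the selector currently predicts to have the lowest loss.
        best = min(candidates, key=lambda m: dot(w_sel, m) + b_sel)
    # A lower selector weight means including the feature lowers the predicted loss.
    ranking = sorted(range(d), key=lambda j: w_sel[j])
    selected = [j for j, m in enumerate(best) if m]
    return selected, ranking, w_op


rng = random.Random(1)
xs, ys = make_data(n=400, d=6, relevant=[0, 1], rng=rng)
selected, ranking, w_op = train(xs, ys, s=2)
print("selected subset:", selected)
print("ranking (most to least important):", ranking)
```

In this toy setting only features 0 and 1 carry signal, so a subset that drops either of them incurs a visibly higher operator loss, and the selector's weights separate relevant from irrelevant features. A faithful implementation would use neural networks for both roles and the paper's own search and ranking procedures.
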
Original language: English
Title of host publication: Advances in Neural Information Processing Systems 33
Publication status: Accepted/In press - 25 Sept 2020
Event: 34th Conference on Neural Information Processing Systems - Vancouver, Canada
Duration: 6 Dec 2020 to 12 Dec 2020

Publication series

Name: Advances in Neural Information Processing Systems
Publisher: Morgan Kaufmann Publishers
ISSN (Print): 1049-5258

Conference

Conference: 34th Conference on Neural Information Processing Systems
Abbreviated title: NeurIPS 2020
Country/Territory: Canada
City: Vancouver
Period: 6/12/20 to 12/12/20
