Abstract
We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene association network. A computer program is available that implements the proposed shrinkage estimator.
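To make the idea of a James-Stein-type shrinkage estimator of entropy concrete, the following is a minimal sketch rather than the authors' program. It assumes that observed cell frequencies are shrunk toward a uniform target, that the shrinkage intensity is estimated from the data in closed form, and that mutual information is obtained by shrinking the joint table before marginalizing. None of these details are given in the abstract, and the function names `shrink_freqs`, `entropy_shrink` and `mi_shrink` are hypothetical.

```python
import numpy as np

def shrink_freqs(counts):
    """Shrink empirical cell frequencies toward a uniform target (assumed target)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()                      # sample size
    p = counts.size                       # number of cells
    theta_ml = counts / n                 # maximum-likelihood frequencies
    target = np.full(p, 1.0 / p)          # uniform shrinkage target (assumption)
    denom = (n - 1.0) * np.sum((target - theta_ml) ** 2)
    if n <= 1 or denom == 0.0:
        lam = 1.0                         # degenerate case: fall back to the target
    else:
        # Assumed closed-form intensity: estimated variance over squared distance
        lam = (1.0 - np.sum(theta_ml ** 2)) / denom
        lam = float(np.clip(lam, 0.0, 1.0))
    return lam * target + (1.0 - lam) * theta_ml, lam

def _plugin_entropy(theta):
    """Plug-in Shannon entropy in nats, skipping zero cells."""
    nz = theta > 0
    return float(-np.sum(theta[nz] * np.log(theta[nz])))

def entropy_shrink(counts):
    """Entropy of the shrunk frequencies."""
    theta, _ = shrink_freqs(counts)
    return _plugin_entropy(theta)

def mi_shrink(table):
    """Mutual information from a 2-D count table via shrunk joint frequencies."""
    table = np.asarray(table, dtype=float)
    theta, _ = shrink_freqs(table.ravel())
    joint = theta.reshape(table.shape)
    h_x = _plugin_entropy(joint.sum(axis=1))
    h_y = _plugin_entropy(joint.sum(axis=0))
    return h_x + h_y - _plugin_entropy(joint.ravel())
```

In this sketch, sparse counts pull the estimate toward the uniform distribution, so empty cells receive non-zero probability and the entropy estimate is no smaller than the plain plug-in value. Computing mutual information for every pair of discretized gene expression profiles would then give the pairwise scores from which an association network could be built.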
Original language | English |
---|---|
Pages (from-to) | 1469-1484 |
Journal | Journal of Machine Learning Research |
Volume | 10 |
Publication status | Published - Jul 2009 |
Keywords
- entropy
- shrinkage estimation
- James-Stein estimator
- "small n, large p" setting
- mutual information
- gene association network