Feature selection in omics prediction problems using CAT scores and false nondiscovery rate control

Miika Ahdesmäki, Korbinian Strimmer

    Research output: Contribution to journal › Article › peer-review

    Abstract

    We revisit the problem of feature selection in linear discriminant analysis (LDA), that is, when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative weights of Mahalanobis-transformed predictors are given by correlation-adjusted t-scores (cat scores). Second, for feature selection we propose thresholding cat scores by controlling false nondiscovery rates (FNDR). Third, training of the classifier is based on James–Stein shrinkage estimates of correlations and variances, where regularization parameters are chosen analytically without resampling. Overall, this results in an effective and computationally inexpensive framework for high-dimensional prediction with natural feature selection. The proposed shrinkage discriminant procedures are implemented in the R package “sda” available from the R repository CRAN.
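    For orientation, a minimal usage sketch of the "sda" package mentioned in the abstract is given below. The sda.ranking(), sda() and predict() calls follow the package documentation on CRAN; the simulated data and the fixed cut-off of 20 features are illustrative assumptions only, not the FNDR-based threshold proposed in the paper.

```r
# Illustrative sketch, not code from the paper: assumes the CRAN "sda" package.
library("sda")

set.seed(1)
n <- 40; p <- 500                             # "small n, large p" toy setting
X <- matrix(rnorm(n * p), n, p)               # n samples, p features
L <- factor(rep(c("A", "B"), each = n / 2))   # two classes
X[L == "B", 1:10] <- X[L == "B", 1:10] + 1    # make the first 10 features informative

# Rank features by shrinkage CAT scores; fdr = TRUE also returns local FDR values
ra <- sda.ranking(X, L, fdr = TRUE)

# Keep the top-ranked features (the paper instead thresholds via FNDR control;
# the fixed cut-off of 20 here is purely for illustration)
sel <- ra[1:20, "idx"]

# Fit the shrinkage discriminant classifier on the selected features and predict
fit  <- sda(X[, sel, drop = FALSE], L)
pred <- predict(fit, X[, sel, drop = FALSE])
table(pred$class, L)
```

    In practice the ranking output (local FDR values) would be used to choose the feature cut-off, as described in the paper, rather than a fixed number of features.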
    Original language: English
    Pages (from-to): 503-519
    Number of pages: 17
    Journal: Annals of Applied Statistics
    Volume: 4
    Issue number: 1
    DOIs
    Publication status: Published - Mar 2010

    Keywords

    • Feature selection
    • linear discriminant analysis
    • correlation
    • James-Stein estimator
    • "small n, large p" setting
    • correlation-adjusted t-score
    • false discovery rates
    • higher criticism
