Partial Least Squares (PLS) is an important statistical technique with multiple and diverse applications, used as an effective regression method for correlated or collinear datasets or for datasets that are not full rank for other reasons. A short history of PLS is followed by a review of the publications in which issues with the application of PLS have been discussed. The theoretical basis of PLS is developed from the singular value decomposition of the covariance, so that the strong links between principal components analysis and the various PLS algorithms, and among those algorithms themselves, appear as a natural consequence.

Latent variable selection by crossvalidation, permutation and information criteria is examined. A method for plotting crossvalidation results is proposed that makes latent variable selection less ambiguous than it is with conventional plots. Novel and practical methods are proposed to extend published methods for latent variable selection by both permutation and information criteria from univariate PLS1 models to multivariate PLS2 cases. The numerical method proposed for information criteria is also more general than the recently published algebraic methods for PLS1, as it does not assume any particular form for the PLS regression coefficients. All of these methods have been critically assessed using a number of datasets, selected specifically to represent a diverse set of dimensions and covariance structures.

Methods for simulating multivariate datasets were developed that allow control of correlation and collinearity in both regressors and responses independently. This development also allows control over the variate distributions. Statistical design of experiments was used to generate plans for the simulation that allowed the factors that influence PLS model fit and latent variable selection to be investigated. It was found that all the latent variable selection methods in the simulation tend to overfit, and the feature in the simulation that causes overfitting has been identified.
Date of Award: 1 Aug 2015
- The University of Manchester
Supervisors: Alexander Donev & Michael Tso
- Partial Least Squares